GNU bug report logs -
#28779
tests/workers.scm failure
Previous Next
Reported by: Eric Bavier <bavier <at> cray.com>
Date: Tue, 10 Oct 2017 15:50:02 UTC
Severity: normal
Done: ludo <at> gnu.org (Ludovic Courtès)
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 28779 in the body.
You can then email your comments to 28779 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-guix <at> gnu.org
:
bug#28779
; Package
guix
.
(Tue, 10 Oct 2017 15:50:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Eric Bavier <bavier <at> cray.com>
:
New bug report received and forwarded. Copy sent to
bug-guix <at> gnu.org
.
(Tue, 10 Oct 2017 15:50:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Roughly 1 in 2 runs of tests/workers.scm fails on my system. Output:
========================================================
GNU Guix 0.13.0.3413-984e3-dirty: ./test-suite.log
========================================================
# TOTAL: 1
# PASS: 0
# SKIP: 0
# XFAIL: 0
# FAIL: 1
# XPASS: 0
# ERROR: 0
.. contents:: :depth: 2
FAIL: tests/workers
===================
test-name: enqueue
location: /home/users/bavier/src/guix/tests/workers.scm:26
source:
+ (test-equal
+ "enqueue"
+ 4242
+ (let* ((pool (make-pool))
+ (result 0)
+ (#{1+!}# (let ((lock (make-mutex)))
+ (lambda ()
+ (with-mutex lock (set! result (+ result 1)))))))
+ (let loop ((i 4242))
+ (unless
+ (zero? i)
+ (pool-enqueue! pool #{1+!}#)
+ (loop (- i 1))))
+ (let poll ()
+ (unless
+ (pool-idle? pool)
+ (pk 'busy result)
+ (sleep 1)
+ (poll)))
+ result))
expected-value: 4242
actual-value: 4241
result: FAIL
To me the reason seems to be that the 'pool-idle? procedure indicates whether or not the task queue is empty, not whether all tasks have completed execution, so the poll loop exits before all 1+! updates are finished and the test fails.
Most failures show "actual-value: 4241", but I have also seen "actual-value: 4239" and "actual-value: 4240", which points to a race condition.
On this system '(current-processor-count) => 128'
Eric Bavier, Scientific Libraries, Cray Inc.
Information forwarded
to
bug-guix <at> gnu.org
:
bug#28779
; Package
guix
.
(Thu, 16 Nov 2017 08:30:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 28779 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hi Eric,
Eric Bavier <bavier <at> cray.com> skribis:
> test-name: enqueue
> location: /home/users/bavier/src/guix/tests/workers.scm:26
> source:
> + (test-equal
> + "enqueue"
> + 4242
> + (let* ((pool (make-pool))
> + (result 0)
> + (#{1+!}# (let ((lock (make-mutex)))
> + (lambda ()
> + (with-mutex lock (set! result (+ result 1)))))))
> + (let loop ((i 4242))
> + (unless
> + (zero? i)
> + (pool-enqueue! pool #{1+!}#)
> + (loop (- i 1))))
> + (let poll ()
> + (unless
> + (pool-idle? pool)
> + (pk 'busy result)
> + (sleep 1)
> + (poll)))
> + result))
> expected-value: 4242
> actual-value: 4241
> result: FAIL
>
>
> To me the reason seems to be that the 'pool-idle? procedure indicates whether or not the task queue is empty, not whether all tasks have completed execution, so the poll loop exits before all 1+! updates are finished and the test fails.
Indeed, good catch.
The attached patch is a bit crude but it should fix the problem.
Thoughts?
Thanks,
Ludo’.
[Message part 2 (text/x-patch, inline)]
diff --git a/guix/workers.scm b/guix/workers.scm
index 846f5e50a..0f6f54bab 100644
--- a/guix/workers.scm
+++ b/guix/workers.scm
@@ -45,12 +45,13 @@
;;; Code:
(define-record-type <pool>
- (%make-pool queue mutex condvar workers)
+ (%make-pool queue mutex condvar workers busy)
pool?
(queue pool-queue)
(mutex pool-mutex)
(condvar pool-condition-variable)
- (workers pool-workers))
+ (workers pool-workers)
+ (busy pool-busy))
(define-syntax-rule (without-mutex mutex exp ...)
(dynamic-wind
@@ -62,12 +63,14 @@
(lock-mutex mutex))))
(define* (worker-thunk mutex condvar pop-queue
- #:key (thread-name "guix worker"))
+ #:key idle busy (thread-name "guix worker"))
"Return the thunk executed by worker threads."
(define (loop)
(match (pop-queue)
(#f ;empty queue
- (wait-condition-variable condvar mutex))
+ (idle)
+ (wait-condition-variable condvar mutex)
+ (busy))
((? procedure? proc)
;; Release MUTEX while executing PROC.
(without-mutex mutex
@@ -97,19 +100,24 @@ threads as reported by the operating system."
(let* ((mutex (make-mutex))
(condvar (make-condition-variable))
(queue (make-q))
+ (busy count)
(procs (unfold (cut >= <> count)
(lambda (n)
(worker-thunk mutex condvar
(lambda ()
(and (not (q-empty? queue))
(q-pop! queue)))
+ #:busy (lambda ()
+ (set! busy (+ 1 busy)))
+ #:idle (lambda ()
+ (set! busy (- busy 1)))
#:thread-name thread-name))
1+
0))
(threads (map (lambda (proc)
(call-with-new-thread proc))
procs)))
- (%make-pool queue mutex condvar threads)))
+ (%make-pool queue mutex condvar threads (lambda () busy))))
(define (pool-enqueue! pool thunk)
"Enqueue THUNK for future execution by POOL."
@@ -118,9 +126,11 @@ threads as reported by the operating system."
(signal-condition-variable (pool-condition-variable pool))))
(define (pool-idle? pool)
- "Return true if POOL doesn't have any task in its queue."
+ "Return true if POOL doesn't have any task in its queue and all the workers
+are currently idle (i.e., waiting for a task)."
(with-mutex (pool-mutex pool)
- (q-empty? (pool-queue pool))))
+ (and (q-empty? (pool-queue pool))
+ (zero? ((pool-busy pool))))))
(define-syntax-rule (eventually pool exp ...)
"Run EXP eventually on one of the workers of POOL."
Information forwarded
to
bug-guix <at> gnu.org
:
bug#28779
; Package
guix
.
(Fri, 17 Nov 2017 02:59:03 GMT)
Full text and
rfc822 format available.
Message #11 received at 28779 <at> debbugs.gnu.org (full text, mbox):
Looks good to me.
Thanks,
Eric Bavier, Scientific Libraries, Cray Inc.
________________________________________
From: Ludovic Courtès <ludo <at> gnu.org>
Sent: Thursday, November 16, 2017 02:29
To: Eric Bavier
Cc: 28779 <at> debbugs.gnu.org
Subject: Re: bug#28779: tests/workers.scm failure
Hi Eric,
Eric Bavier <bavier <at> cray.com> skribis:
> test-name: enqueue
> location: /home/users/bavier/src/guix/tests/workers.scm:26
> source:
> + (test-equal
> + "enqueue"
> + 4242
> + (let* ((pool (make-pool))
> + (result 0)
> + (#{1+!}# (let ((lock (make-mutex)))
> + (lambda ()
> + (with-mutex lock (set! result (+ result 1)))))))
> + (let loop ((i 4242))
> + (unless
> + (zero? i)
> + (pool-enqueue! pool #{1+!}#)
> + (loop (- i 1))))
> + (let poll ()
> + (unless
> + (pool-idle? pool)
> + (pk 'busy result)
> + (sleep 1)
> + (poll)))
> + result))
> expected-value: 4242
> actual-value: 4241
> result: FAIL
>
>
> To me the reason seems to be that the 'pool-idle? procedure indicates whether or not the task queue is empty, not whether all tasks have completed execution, so the poll loop exits before all 1+! updates are finished and the test fails.
Indeed, good catch.
The attached patch is a bit crude but it should fix the problem.
Thoughts?
Thanks,
Ludo’.
Reply sent
to
ludo <at> gnu.org (Ludovic Courtès)
:
You have taken responsibility.
(Fri, 17 Nov 2017 10:11:01 GMT)
Full text and
rfc822 format available.
Notification sent
to
Eric Bavier <bavier <at> cray.com>
:
bug acknowledged by developer.
(Fri, 17 Nov 2017 10:11:02 GMT)
Full text and
rfc822 format available.
Message #16 received at 28779-done <at> debbugs.gnu.org (full text, mbox):
Eric Bavier <bavier <at> cray.com> skribis:
> Looks good to me.
Pushed as 232b3d31016439b5600e47d845ffb7c9a4ee4723.
Thanks,
Ludo’.
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Fri, 15 Dec 2017 12:24:06 GMT)
Full text and
rfc822 format available.
This bug report was last modified 6 years and 127 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.