GNU bug report logs - #28779
tests/workers.scm failure

Previous Next

Package: guix;

Reported by: Eric Bavier <bavier <at> cray.com>

Date: Tue, 10 Oct 2017 15:50:02 UTC

Severity: normal

Done: ludo <at> gnu.org (Ludovic Courtès)

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 28779 in the body.
You can then email your comments to 28779 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-guix <at> gnu.org:
bug#28779; Package guix. (Tue, 10 Oct 2017 15:50:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Eric Bavier <bavier <at> cray.com>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Tue, 10 Oct 2017 15:50:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Eric Bavier <bavier <at> cray.com>
To: "bug-guix <at> gnu.org" <bug-guix <at> gnu.org>
Subject: tests/workers.scm failure
Date: Tue, 10 Oct 2017 15:48:42 +0000
Roughly 1 in 2 runs of tests/workers.scm fails on my system.  Output:

========================================================
   GNU Guix 0.13.0.3413-984e3-dirty: ./test-suite.log
========================================================

# TOTAL: 1
# PASS:  0
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: tests/workers
===================

test-name: enqueue
location: /home/users/bavier/src/guix/tests/workers.scm:26
source:
+ (test-equal
+   "enqueue"
+   4242
+   (let* ((pool (make-pool))
+          (result 0)
+          (#{1+!}# (let ((lock (make-mutex)))
+                     (lambda ()
+                       (with-mutex lock (set! result (+ result 1)))))))
+     (let loop ((i 4242))
+       (unless
+         (zero? i)
+         (pool-enqueue! pool #{1+!}#)
+         (loop (- i 1))))
+     (let poll ()
+       (unless
+         (pool-idle? pool)
+         (pk 'busy result)
+         (sleep 1)
+         (poll)))
+     result))
expected-value: 4242
actual-value: 4241
result: FAIL


To me the reason seems to be that the 'pool-idle? procedure indicates whether or not the task queue is empty, not whether all tasks have completed execution, so the poll loop exits before all 1+! updates are finished and the test fails.  

Most failures show "actual-value: 4241", but I have also seen "actual-value: 4239" and "actual-value: 4240", which points to a race condition.

On this system '(current-processor-count) => 128'

Eric Bavier, Scientific Libraries, Cray Inc.




Information forwarded to bug-guix <at> gnu.org:
bug#28779; Package guix. (Thu, 16 Nov 2017 08:30:02 GMT) Full text and rfc822 format available.

Message #8 received at 28779 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Eric Bavier <bavier <at> cray.com>
Cc: 28779 <at> debbugs.gnu.org
Subject: Re: bug#28779: tests/workers.scm failure
Date: Thu, 16 Nov 2017 09:29:38 +0100
[Message part 1 (text/plain, inline)]
Hi Eric,

Eric Bavier <bavier <at> cray.com> skribis:

> test-name: enqueue
> location: /home/users/bavier/src/guix/tests/workers.scm:26
> source:
> + (test-equal
> +   "enqueue"
> +   4242
> +   (let* ((pool (make-pool))
> +          (result 0)
> +          (#{1+!}# (let ((lock (make-mutex)))
> +                     (lambda ()
> +                       (with-mutex lock (set! result (+ result 1)))))))
> +     (let loop ((i 4242))
> +       (unless
> +         (zero? i)
> +         (pool-enqueue! pool #{1+!}#)
> +         (loop (- i 1))))
> +     (let poll ()
> +       (unless
> +         (pool-idle? pool)
> +         (pk 'busy result)
> +         (sleep 1)
> +         (poll)))
> +     result))
> expected-value: 4242
> actual-value: 4241
> result: FAIL
>
>
> To me the reason seems to be that the 'pool-idle? procedure indicates whether or not the task queue is empty, not whether all tasks have completed execution, so the poll loop exits before all 1+! updates are finished and the test fails.  

Indeed, good catch.

The attached patch is a bit crude but it should fix the problem.

Thoughts?

Thanks,
Ludo’.

[Message part 2 (text/x-patch, inline)]
diff --git a/guix/workers.scm b/guix/workers.scm
index 846f5e50a..0f6f54bab 100644
--- a/guix/workers.scm
+++ b/guix/workers.scm
@@ -45,12 +45,13 @@
 ;;; Code:
 
 (define-record-type <pool>
-  (%make-pool queue mutex condvar workers)
+  (%make-pool queue mutex condvar workers busy)
   pool?
   (queue    pool-queue)
   (mutex    pool-mutex)
   (condvar  pool-condition-variable)
-  (workers  pool-workers))
+  (workers  pool-workers)
+  (busy     pool-busy))
 
 (define-syntax-rule (without-mutex mutex exp ...)
   (dynamic-wind
@@ -62,12 +63,14 @@
       (lock-mutex mutex))))
 
 (define* (worker-thunk mutex condvar pop-queue
-                       #:key (thread-name "guix worker"))
+                       #:key idle busy (thread-name "guix worker"))
   "Return the thunk executed by worker threads."
   (define (loop)
     (match (pop-queue)
       (#f                                         ;empty queue
-       (wait-condition-variable condvar mutex))
+       (idle)
+       (wait-condition-variable condvar mutex)
+       (busy))
       ((? procedure? proc)
        ;; Release MUTEX while executing PROC.
        (without-mutex mutex
@@ -97,19 +100,24 @@ threads as reported by the operating system."
   (let* ((mutex   (make-mutex))
          (condvar (make-condition-variable))
          (queue   (make-q))
+         (busy    count)
          (procs   (unfold (cut >= <> count)
                           (lambda (n)
                             (worker-thunk mutex condvar
                                           (lambda ()
                                             (and (not (q-empty? queue))
                                                  (q-pop! queue)))
+                                          #:busy (lambda ()
+                                                   (set! busy (+ 1 busy)))
+                                          #:idle (lambda ()
+                                                   (set! busy (- busy 1)))
                                           #:thread-name thread-name))
                           1+
                           0))
          (threads (map (lambda (proc)
                          (call-with-new-thread proc))
                        procs)))
-    (%make-pool queue mutex condvar threads)))
+    (%make-pool queue mutex condvar threads (lambda () busy))))
 
 (define (pool-enqueue! pool thunk)
   "Enqueue THUNK for future execution by POOL."
@@ -118,9 +126,11 @@ threads as reported by the operating system."
     (signal-condition-variable (pool-condition-variable pool))))
 
 (define (pool-idle? pool)
-  "Return true if POOL doesn't have any task in its queue."
+  "Return true if POOL doesn't have any task in its queue and all the workers
+are currently idle (i.e., waiting for a task)."
   (with-mutex (pool-mutex pool)
-    (q-empty? (pool-queue pool))))
+    (and (q-empty? (pool-queue pool))
+         (zero? ((pool-busy pool))))))
 
 (define-syntax-rule (eventually pool exp ...)
   "Run EXP eventually on one of the workers of POOL."

Information forwarded to bug-guix <at> gnu.org:
bug#28779; Package guix. (Fri, 17 Nov 2017 02:59:03 GMT) Full text and rfc822 format available.

Message #11 received at 28779 <at> debbugs.gnu.org (full text, mbox):

From: Eric Bavier <bavier <at> cray.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: "28779 <at> debbugs.gnu.org" <28779 <at> debbugs.gnu.org>
Subject: Re: bug#28779: tests/workers.scm failure
Date: Fri, 17 Nov 2017 02:58:12 +0000
Looks good to me.

Thanks,

Eric Bavier, Scientific Libraries, Cray Inc.

________________________________________
From: Ludovic Courtès <ludo <at> gnu.org>
Sent: Thursday, November 16, 2017 02:29
To: Eric Bavier
Cc: 28779 <at> debbugs.gnu.org
Subject: Re: bug#28779: tests/workers.scm failure

Hi Eric,

Eric Bavier <bavier <at> cray.com> skribis:

> test-name: enqueue
> location: /home/users/bavier/src/guix/tests/workers.scm:26
> source:
> + (test-equal
> +   "enqueue"
> +   4242
> +   (let* ((pool (make-pool))
> +          (result 0)
> +          (#{1+!}# (let ((lock (make-mutex)))
> +                     (lambda ()
> +                       (with-mutex lock (set! result (+ result 1)))))))
> +     (let loop ((i 4242))
> +       (unless
> +         (zero? i)
> +         (pool-enqueue! pool #{1+!}#)
> +         (loop (- i 1))))
> +     (let poll ()
> +       (unless
> +         (pool-idle? pool)
> +         (pk 'busy result)
> +         (sleep 1)
> +         (poll)))
> +     result))
> expected-value: 4242
> actual-value: 4241
> result: FAIL
>
>
> To me the reason seems to be that the 'pool-idle? procedure indicates whether or not the task queue is empty, not whether all tasks have completed execution, so the poll loop exits before all 1+! updates are finished and the test fails.

Indeed, good catch.

The attached patch is a bit crude but it should fix the problem.

Thoughts?

Thanks,
Ludo’.





Reply sent to ludo <at> gnu.org (Ludovic Courtès):
You have taken responsibility. (Fri, 17 Nov 2017 10:11:01 GMT) Full text and rfc822 format available.

Notification sent to Eric Bavier <bavier <at> cray.com>:
bug acknowledged by developer. (Fri, 17 Nov 2017 10:11:02 GMT) Full text and rfc822 format available.

Message #16 received at 28779-done <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Eric Bavier <bavier <at> cray.com>
Cc: 28779-done <at> debbugs.gnu.org
Subject: Re: bug#28779: tests/workers.scm failure
Date: Fri, 17 Nov 2017 11:10:18 +0100
Eric Bavier <bavier <at> cray.com> skribis:

> Looks good to me.

Pushed as 232b3d31016439b5600e47d845ffb7c9a4ee4723.

Thanks,
Ludo’.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 15 Dec 2017 12:24:06 GMT) Full text and rfc822 format available.

This bug report was last modified 6 years and 127 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.