Package: guile;
Reported by: Михаил Бахтерев <mob <at> k.imm.uran.ru>
Date: Thu, 20 Sep 2018 18:08:01 UTC
Severity: normal
Done: Mark H Weaver <mhw <at> netris.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 32786 in the body.
You can then email your comments to 32786 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
View this report as an mbox folder, status mbox, maintainer mbox
bug-guile <at> gnu.org
:bug#32786
; Package guile
.
(Thu, 20 Sep 2018 18:08:02 GMT) Full text and rfc822 format available.Михаил Бахтерев <mob <at> k.imm.uran.ru>
:bug-guile <at> gnu.org
.
(Thu, 20 Sep 2018 18:08:02 GMT) Full text and rfc822 format available.Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
From: Михаил Бахтерев <mob <at> k.imm.uran.ru> To: bug-guile <at> gnu.org Subject: (system* ...) call somehow interferes with atomic-box on armv7l Date: Thu, 20 Sep 2018 22:13:06 +0500
[Message part 1 (text/plain, inline)]
Greetings. I attach the code sample, which expected behaviour is: 1. to periodically restart thread, which executes «sleep 1s» with system* call; 2. and then to check if it should repeat the execution. The flag, which controls that re-execution is managed by another thread through atomic-box primitive. The code does not run as expected: the sleep executing sleep starts once, and change seems to remain invisible to main thread. The code works as expected, when: 1. (sleep 1) is used instead (system* "sleep" "1s"); 2. running the code on x86_64 cpus. The use of affinity mask taskset 0x01 guile test.scm does not help. - MB. Respectfully P.S. Additional information: $ guile --version guile (GNU Guile) 2.2.4 Copyright (C) 2018 Free Software Foundation, Inc. License LGPLv3+: GNU LGPL 3 or later <http://gnu.org/licenses/lgpl.html>. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. $ sh config.guess armv7l-unknown-linux-gnueabihf $ pacman -Qi guile Name : guile Version : 2.2.4-1 Description : Portable, embeddable Scheme implementation written in C Architecture : armv7h URL : https://www.gnu.org/software/guile/ Licenses : GPL Groups : None Provides : None Depends On : gmp libltdl ncurses texinfo libunistring gc libffi Optional Deps : None Required By : make Optional For : gnutls Conflicts With : None Replaces : None Installed Size : 40.83 MiB Packager : Arch Linux ARM Build System <builder+xu9 <at> archlinuxarm.org> Build Date : Tue 03 Jul 2018 04:47:53 PM +05 Install Date : Mon 09 Jul 2018 01:02:48 PM +05 Install Reason : Installed as a dependency for another package Install Script : No Validated By : Signature
[test.scm (application/vnd.lotus-screencam, attachment)]
[cpuinfo (text/plain, attachment)]
bug-guile <at> gnu.org
:bug#32786
; Package guile
.
(Fri, 21 Sep 2018 21:27:01 GMT) Full text and rfc822 format available.Message #8 received at 32786 <at> debbugs.gnu.org (full text, mbox):
From: Mark H Weaver <mhw <at> netris.org> To: Михаил Бахтерев <mob <at> k.imm.uran.ru> Cc: 32786 <at> debbugs.gnu.org Subject: Re: bug#32786: (system* ...) call somehow interferes with atomic-box on armv7l Date: Fri, 21 Sep 2018 17:26:27 -0400
Hi, Михаил Бахтерев <mob <at> k.imm.uran.ru> writes: > I attach the code sample, which expected behaviour is: > > 1. to periodically restart thread, which executes «sleep 1s» with > system* call; > > 2. and then to check if it should repeat the execution. > > The flag, which controls that re-execution is managed by another thread > through atomic-box primitive. > > The code does not run as expected: the sleep executing sleep starts > once, and change seems to remain invisible to main thread. Can you describe in more detail the behavior you are seeing? What messages do you see on stderr from your 'dump' calls when it gets stuck? What state is the atomic box left in? I see a couple of problems with the code below: * Two threads write concurrently to the same port, namely (current-error-port). You should use thread synchronization to ensure that operations to a given port are serialized. * There's a race condition when the atomic box is in the #:accepted state. If the main thread finds the box in that state, it changes the state to #:need-to-sleep, which will apparently cause the other thread to sleep again. I'm not sure if these problems could explain what you're seeing. Regards, Mark > The code works as expected, when: > > 1. (sleep 1) is used instead (system* "sleep" "1s"); > 2. running the code on x86_64 cpus. > > The use of affinity mask > > taskset 0x01 guile test.scm > > does not help. > > - MB. Respectfully > > P.S. Additional information: > > $ guile --version > guile (GNU Guile) 2.2.4 > Copyright (C) 2018 Free Software Foundation, Inc. > > License LGPLv3+: GNU LGPL 3 or later <http://gnu.org/licenses/lgpl.html>. > This is free software: you are free to change and redistribute it. > There is NO WARRANTY, to the extent permitted by law. > > $ sh config.guess > armv7l-unknown-linux-gnueabihf > > $ pacman -Qi guile > Name : guile > Version : 2.2.4-1 > Description : Portable, embeddable Scheme implementation written in C > Architecture : armv7h > URL : https://www.gnu.org/software/guile/ > Licenses : GPL > Groups : None > Provides : None > Depends On : gmp libltdl ncurses texinfo libunistring gc libffi > Optional Deps : None > Required By : make > Optional For : gnutls > Conflicts With : None > Replaces : None > Installed Size : 40.83 MiB > Packager : Arch Linux ARM Build System <builder+xu9 <at> archlinuxarm.org> > Build Date : Tue 03 Jul 2018 04:47:53 PM +05 > Install Date : Mon 09 Jul 2018 01:02:48 PM +05 > Install Reason : Installed as a dependency for another package > Install Script : No > Validated By : Signature > > (use-modules (ice-9 atomic) > (ice-9 threads)) > > (define dump > (let ((p (current-error-port))) > (lambda (fmt . args) (apply format p fmt args)))) > > (define state (make-atomic-box #:nothing-to-do)) > > (define (expect e v) > (when (not (eq? e v)) > (error "Protocol violation; (expecting . got):" (cons e v)))) > > (define (sleep-loop) > (expect #:need-to-sleep (atomic-box-ref state)) > > (dump "R: Going to (sleep 1)~%") > (atomic-box-set! state #:accepted) > > ; SOMETHING WRONG IS HERE > (system* "sleep" "1s") > ; (sleep 1) > > (let ((v (atomic-box-compare-and-swap! state #:accepted #:nothing-to-do))) > (when (not (eq? #:accepted v)) > (dump "S: Repeating sleep~%") > (sleep-loop))) > > (dump "R: sleep-loop finished~%")) > > (define (main-loop) > (define (run-thread) > (dump "M: new sleep thread~%") > (atomic-box-set! state #:need-to-sleep) > (call-with-new-thread sleep-loop)) > > (dump "M: (sleep 3)~%") > (sleep 3) > > (dump "M: protocol exchange~%") > (let ((st (atomic-box-ref state))) > (case st > ((#:nothing-to-do) (run-thread)) > ((#:accepted) (let ((v (atomic-box-compare-and-swap! state #:accepted #:need-to-sleep))) > (when (not (eq? #:accepted v)) > (expect #:accepted v) > (run-thread)))) > (else (expect #:need-to-sleep st)))) > > (main-loop)) > > (main-loop)
bug-guile <at> gnu.org
:bug#32786
; Package guile
.
(Sat, 22 Sep 2018 05:06:01 GMT) Full text and rfc822 format available.Message #11 received at 32786 <at> debbugs.gnu.org (full text, mbox):
From: Михаил Бахтерев <mob <at> k.imm.uran.ru> To: Mark H Weaver <mhw <at> netris.org> Cc: 32786 <at> debbugs.gnu.org Subject: Re: bug#32786: (system* ...) call somehow interferes with atomic-box on armv7l Date: Sat, 22 Sep 2018 09:39:38 +0500
On Fri, Sep 21, 2018 at 05:26:27PM -0400, Mark H Weaver wrote: > Hi, > > Can you describe in more detail the behavior you are seeing? What > messages do you see on stderr from your 'dump' calls when it gets stuck? > What state is the atomic box left in? > > I see a couple of problems with the code below: > > * Two threads write concurrently to the same port, namely > (current-error-port). You should use thread synchronization to ensure > that operations to a given port are serialized. > > * There's a race condition when the atomic box is in the #:accepted > state. If the main thread finds the box in that state, it changes the > state to #:need-to-sleep, which will apparently cause the other thread > to sleep again. > > I'm not sure if these problems could explain what you're seeing. > > Regards, > Mark > The code is supposed to model the following behaviour. The main thread tracks some events (in the example: the end of (sleep 3)). If one or multiple events occurred some long-lasting external command should be executed. Long-lasting means that multiple events may occur during execution. To perform that the main thread should: 1. either to run new work thread (sleep-loop procedure), which will call (system* ...); 2. or to communicate the fact of events occurrences to the work thread, which is supposed to detect this and just repeat the command. If multiple events have occurred during command execution the command should be re-executed only once. The event occurrence is communicated through simple protocol with three states: #:nothing-to-do < #:accepted < #:need-to-sleep. The main thread pulls state up, and work thread pulls it down. When the main thread detects event, and the state is #:accepted, it means, that work thread accepted the previous job, and executing it now, so the main thread pulls the state up to need-to-sleep, and the work thread re-executes command (sleeps again in the test case), when the previous external command run has finished. This is intended behaviour. I expect the program to print some garbage, showing that work thread is restarted periodically. Something like that (it is result of using (sleep 1) instead of (system* "sleep" "1s")): $ guile test.scm M: (sleep 3) M: protocol exchange M: new sleep thread M: (sleep 3) R: Going to (sleep 1) R: sleep-loop finished M: protocol exchange M: new sleep thread M: (sleep 3) : (sleep 3)R: Going to (sleep 1) R: sleep-loop finished M: protocol exchange M: new sleep thread M: (sleep 3) R: Going to (sleep 1) R: sleep-loop finished M: protocol exchange M: new sleep thread M: (sleep 3) : (sleep 3)R: Going to (sleep 1) R: sleep-loop finished But what i get with (system* "sleep" "1s") is this: $ guile test.scm M: (sleep 3) M: protocol exchange M: new sleep thread M: (sleep 3) R: Going to (sleep 1) R: sleep-loop finished M: protocol exchange M: (sleep 3) M: protocol exchange M: (sleep 3) M: protocol exchange M: (sleep 3) So, the state of the atomic-box is stuck in #:need-to-sleep somehow. - MB. Respectfully
bug-guile <at> gnu.org
:bug#32786
; Package guile
.
(Sat, 22 Sep 2018 05:06:02 GMT) Full text and rfc822 format available.Message #14 received at 32786 <at> debbugs.gnu.org (full text, mbox):
From: Михаил Бахтерев <mob <at> k.imm.uran.ru> To: Mark H Weaver <mhw <at> netris.org> Cc: 32786 <at> debbugs.gnu.org Subject: Re: bug#32786: (system* ...) call somehow interferes with atomic-box on armv7l Date: Sat, 22 Sep 2018 09:50:09 +0500
By the way, if i change (system* "sleep" "1s") to the (system* "true"), nothing changes. The log of execution is the same: $ guile test.scm M: (sleep 3) M: protocol exchange M: new sleep thread M: (sleep 3) R: Going to (sleep 1) R: sleep-loop finished M: protocol exchange M: (sleep 3) M: protocol exchange M: (sleep 3) But 3 seconds are more than enough to finish the work thread and pull the state down to the #:nothing-to-do, which should cause the main thread re-run work thread. - MB. Respectfully On Sat, Sep 22, 2018 at 09:39:38AM +0500, Михаил Бахтерев wrote: > On Fri, Sep 21, 2018 at 05:26:27PM -0400, Mark H Weaver wrote: > > Hi, > > > > Can you describe in more detail the behavior you are seeing? What > > messages do you see on stderr from your 'dump' calls when it gets stuck? > > What state is the atomic box left in? > > > > I see a couple of problems with the code below: > > > > * Two threads write concurrently to the same port, namely > > (current-error-port). You should use thread synchronization to ensure > > that operations to a given port are serialized. > > > > * There's a race condition when the atomic box is in the #:accepted > > state. If the main thread finds the box in that state, it changes the > > state to #:need-to-sleep, which will apparently cause the other thread > > to sleep again. > > > > I'm not sure if these problems could explain what you're seeing. > > > > Regards, > > Mark > > > > The code is supposed to model the following behaviour. > > The main thread tracks some events (in the example: the end of (sleep > 3)). If one or multiple events occurred some long-lasting external > command should be executed. Long-lasting means that multiple events may > occur during execution. > > To perform that the main thread should: > > 1. either to run new work thread (sleep-loop procedure), which will > call (system* ...); > > 2. or to communicate the fact of events occurrences to the work > thread, which is supposed to detect this and just repeat the command. > > If multiple events have occurred during command execution the command > should be re-executed only once. > > The event occurrence is communicated through simple protocol with three > states: #:nothing-to-do < #:accepted < #:need-to-sleep. The main thread > pulls state up, and work thread pulls it down. When the main thread > detects event, and the state is #:accepted, it means, that work thread > accepted the previous job, and executing it now, so the main thread > pulls the state up to need-to-sleep, and the work thread re-executes > command (sleeps again in the test case), when the previous external > command run has finished. This is intended behaviour. > > I expect the program to print some garbage, showing that work thread is > restarted periodically. Something like that (it is result of using > (sleep 1) instead of (system* "sleep" "1s")): > > $ guile test.scm > M: (sleep 3) > M: protocol exchange > M: new sleep thread > M: (sleep 3) > > R: Going to (sleep 1) > R: sleep-loop finished > M: protocol exchange > M: new sleep thread > M: (sleep 3) > : (sleep 3)R: Going to (sleep 1) > > R: sleep-loop finished > M: protocol exchange > M: new sleep thread > M: (sleep 3) > > R: Going to (sleep 1) > R: sleep-loop finished > M: protocol exchange > M: new sleep thread > M: (sleep 3) > > : (sleep 3)R: Going to (sleep 1) > R: sleep-loop finished > > But what i get with (system* "sleep" "1s") is this: > > $ guile test.scm > M: (sleep 3) > M: protocol exchange > M: new sleep thread > M: (sleep 3) > R: Going to (sleep 1) > R: sleep-loop finished > M: protocol exchange > M: (sleep 3) > M: protocol exchange > M: (sleep 3) > M: protocol exchange > M: (sleep 3) > > So, the state of the atomic-box is stuck in #:need-to-sleep somehow. > > - MB. Respectfully
bug-guile <at> gnu.org
:bug#32786
; Package guile
.
(Thu, 27 Sep 2018 05:50:02 GMT) Full text and rfc822 format available.Message #17 received at 32786 <at> debbugs.gnu.org (full text, mbox):
From: Mark H Weaver <mhw <at> netris.org> To: Михаил Бахтерев <mob <at> k.imm.uran.ru> Cc: 32786 <at> debbugs.gnu.org Subject: Re: bug#32786: (system* ...) call somehow interferes with atomic-box on armv7l Date: Thu, 27 Sep 2018 01:49:23 -0400
[Message part 1 (text/plain, inline)]
Hi, Thanks for the additional details. I was able to reproduce the bug, and I believe I now see the problem. 'atomic-box-compare-and-swap!' is implemented using 'atomic_compare_exchange_weak' (if available), but neglects to handle the case where 'atomic_compare_exchange_weak' may spuriously fail. In that case, the box is left unchanged, although the observed value was equal to the expected value. What's happening here is that the 'atomic-box-compare-and-swap!' in 'sleep-loop' fails spuriously, leaving the box in state #:accepted although it returns #:accepted as its result. When the main loop discovers this, it changes the state to #:need-to-sleep, although the thread has already ended. To confirm this hypothesis, I added a print statement to the main loop showing the state of the box that it observed during the protocol exchange, and indeed it sees #:accepted the first time it checks, and #:need-to-sleep in all later iterations. I've attached a proposed patch that I hope will fix this problem. If you'd be willing to test it, I'd be grateful, but otherwise I'll try to test it in the next week or so. My access to armv7l boxes is somewhat limited. Thanks for this report. Mark
[0001-UNTESTED-Fix-atomic-box-compare-and-swap.patch (text/x-patch, inline)]
From 958d37686a9ac65f48cb2ca32a341cf182c24b5a Mon Sep 17 00:00:00 2001 From: Mark H Weaver <mhw <at> netris.org> Date: Thu, 27 Sep 2018 01:00:11 -0400 Subject: [PATCH] UNTESTED: Fix 'atomic-box-compare-and-swap!'. Fixes <https://bugs.gnu.org/32786>. 'scm_atomic_compare_and_swap_scm' uses 'atomic_compare_exchange_weak' if it's available on the host platform. 'atomic_compare_exchange_weak' may fail spuriously, leaving the atomic object unchanged even when it contained the expected value. 'atomic-box-compare-and-swap!' uses 'scm_atomic_compare_and_swap_scm' without checking its return value, and therefore may return the expected value while leaving the box unchanged. This contradicts the documentation, which explicitly states that "you can know if the swap worked by checking if the return value is 'eq?' to EXPECTED.". * libguile/atomic.c (scm_atomic_box_compare_and_swap_x): If 'scm_atomic_compare_and_swap_scm' returns false and the observed value is equal to 'expected', then try again. * libguile/vm-engine.c (atomic-box-compare-and-swap!): Ditto. --- libguile/atomic.c | 13 +++++++++---- libguile/vm-engine.c | 13 ++++++++----- 2 files changed, 17 insertions(+), 9 deletions(-) diff --git a/libguile/atomic.c b/libguile/atomic.c index 950874030..504643912 100644 --- a/libguile/atomic.c +++ b/libguile/atomic.c @@ -1,4 +1,4 @@ -/* Copyright (C) 2016 Free Software Foundation, Inc. +/* Copyright (C) 2016, 2018 Free Software Foundation, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public License @@ -95,10 +95,15 @@ SCM_DEFINE (scm_atomic_box_compare_and_swap_x, "if the return value is @code{eq?} to @var{expected}.") #define FUNC_NAME s_scm_atomic_box_compare_and_swap_x { + SCM result = expected; + SCM_VALIDATE_ATOMIC_BOX (1, box); - scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (box), - &expected, desired); - return expected; + while (!scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (box), + &result, desired) + && scm_is_eq (result, expected)) + { + } + return result; } #undef FUNC_NAME diff --git a/libguile/vm-engine.c b/libguile/vm-engine.c index 19ff3e498..9650e33eb 100644 --- a/libguile/vm-engine.c +++ b/libguile/vm-engine.c @@ -3868,16 +3868,19 @@ VM_NAME (scm_i_thread *thread, struct scm_vm *vp, { scm_t_uint16 dst, box; scm_t_uint32 expected, desired; - SCM scm_box, scm_expected; + SCM scm_box, scm_expected, scm_result; UNPACK_12_12 (op, dst, box); UNPACK_24 (ip[1], expected); UNPACK_24 (ip[2], desired); scm_box = SP_REF (box); VM_VALIDATE_ATOMIC_BOX (scm_box, "atomic-box-compare-and-swap!"); - scm_expected = SP_REF (expected); - scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (scm_box), - &scm_expected, SP_REF (desired)); - SP_SET (dst, scm_expected); + scm_result = scm_expected = SP_REF (expected); + while (!scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (scm_box), + &scm_result, SP_REF (desired)) + && scm_is_eq (scm_result, scm_expected)) + { + } + SP_SET (dst, scm_result); NEXT (3); } -- 2.19.0
bug-guile <at> gnu.org
:bug#32786
; Package guile
.
(Thu, 27 Sep 2018 06:19:02 GMT) Full text and rfc822 format available.Message #20 received at 32786 <at> debbugs.gnu.org (full text, mbox):
From: Михаил Бахтерев <mob <at> k.imm.uran.ru> To: Mark H Weaver <mhw <at> netris.org> Cc: 32786 <at> debbugs.gnu.org Subject: Re: bug#32786: (system* ...) call somehow interferes with atomic-box on armv7l Date: Thu, 27 Sep 2018 11:18:16 +0500
Thanks for clarification! I'll be able to test the patch in a couple of days. - MB. Respectfully On Thu, Sep 27, 2018 at 01:49:23AM -0400, Mark H Weaver wrote: > Hi, > > Thanks for the additional details. I was able to reproduce the bug, and > I believe I now see the problem. > > 'atomic-box-compare-and-swap!' is implemented using > 'atomic_compare_exchange_weak' (if available), but neglects to handle > the case where 'atomic_compare_exchange_weak' may spuriously fail. In > that case, the box is left unchanged, although the observed value was > equal to the expected value. > > What's happening here is that the 'atomic-box-compare-and-swap!' in > 'sleep-loop' fails spuriously, leaving the box in state #:accepted > although it returns #:accepted as its result. When the main loop > discovers this, it changes the state to #:need-to-sleep, although the > thread has already ended. > > To confirm this hypothesis, I added a print statement to the main loop > showing the state of the box that it observed during the protocol > exchange, and indeed it sees #:accepted the first time it checks, and > #:need-to-sleep in all later iterations. > > I've attached a proposed patch that I hope will fix this problem. If > you'd be willing to test it, I'd be grateful, but otherwise I'll try to > test it in the next week or so. My access to armv7l boxes is somewhat > limited. > > Thanks for this report. > > Mark > >
bug-guile <at> gnu.org
:bug#32786
; Package guile
.
(Fri, 05 Oct 2018 19:45:02 GMT) Full text and rfc822 format available.Message #23 received at 32786 <at> debbugs.gnu.org (full text, mbox):
From: Михаил Бахтерев <mob <at> k.imm.uran.ru> To: Mark H Weaver <mhw <at> netris.org> Cc: 32786 <at> debbugs.gnu.org Subject: Re: bug#32786: (system* ...) call somehow interferes with atomic-box on armv7l Date: Sat, 6 Oct 2018 00:43:57 +0500
Hi. I am sorry for delay. I've tested the patch (i've applied it to the v2.2.4 source code). The test sample works now as expected with both test «payloads»: (sleep N-seconds) and (system* "any" "command" "here") calls. Thanks for fixing it. - MB. Respectfully On Thu, Sep 27, 2018 at 01:49:23AM -0400, Mark H Weaver wrote: > Hi, > > Thanks for the additional details. I was able to reproduce the bug, and > I believe I now see the problem. > > 'atomic-box-compare-and-swap!' is implemented using > 'atomic_compare_exchange_weak' (if available), but neglects to handle > the case where 'atomic_compare_exchange_weak' may spuriously fail. In > that case, the box is left unchanged, although the observed value was > equal to the expected value. > > What's happening here is that the 'atomic-box-compare-and-swap!' in > 'sleep-loop' fails spuriously, leaving the box in state #:accepted > although it returns #:accepted as its result. When the main loop > discovers this, it changes the state to #:need-to-sleep, although the > thread has already ended. > > To confirm this hypothesis, I added a print statement to the main loop > showing the state of the box that it observed during the protocol > exchange, and indeed it sees #:accepted the first time it checks, and > #:need-to-sleep in all later iterations. > > I've attached a proposed patch that I hope will fix this problem. If > you'd be willing to test it, I'd be grateful, but otherwise I'll try to > test it in the next week or so. My access to armv7l boxes is somewhat > limited. > > Thanks for this report. > > Mark > > > From 958d37686a9ac65f48cb2ca32a341cf182c24b5a Mon Sep 17 00:00:00 2001 > From: Mark H Weaver <mhw <at> netris.org> > Date: Thu, 27 Sep 2018 01:00:11 -0400 > Subject: [PATCH] UNTESTED: Fix 'atomic-box-compare-and-swap!'. > > Fixes <https://bugs.gnu.org/32786>. > > 'scm_atomic_compare_and_swap_scm' uses 'atomic_compare_exchange_weak' if > it's available on the host platform. 'atomic_compare_exchange_weak' may > fail spuriously, leaving the atomic object unchanged even when it > contained the expected value. 'atomic-box-compare-and-swap!' uses > 'scm_atomic_compare_and_swap_scm' without checking its return value, and > therefore may return the expected value while leaving the box unchanged. > This contradicts the documentation, which explicitly states that "you > can know if the swap worked by checking if the return value is 'eq?' to > EXPECTED.". > > * libguile/atomic.c (scm_atomic_box_compare_and_swap_x): If > 'scm_atomic_compare_and_swap_scm' returns false and the observed value > is equal to 'expected', then try again. > * libguile/vm-engine.c (atomic-box-compare-and-swap!): Ditto. > --- > libguile/atomic.c | 13 +++++++++---- > libguile/vm-engine.c | 13 ++++++++----- > 2 files changed, 17 insertions(+), 9 deletions(-) > > diff --git a/libguile/atomic.c b/libguile/atomic.c > index 950874030..504643912 100644 > --- a/libguile/atomic.c > +++ b/libguile/atomic.c > @@ -1,4 +1,4 @@ > -/* Copyright (C) 2016 Free Software Foundation, Inc. > +/* Copyright (C) 2016, 2018 Free Software Foundation, Inc. > * > * This library is free software; you can redistribute it and/or > * modify it under the terms of the GNU Lesser General Public License > @@ -95,10 +95,15 @@ SCM_DEFINE (scm_atomic_box_compare_and_swap_x, > "if the return value is @code{eq?} to @var{expected}.") > #define FUNC_NAME s_scm_atomic_box_compare_and_swap_x > { > + SCM result = expected; > + > SCM_VALIDATE_ATOMIC_BOX (1, box); > - scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (box), > - &expected, desired); > - return expected; > + while (!scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (box), > + &result, desired) > + && scm_is_eq (result, expected)) > + { > + } > + return result; > } > #undef FUNC_NAME > > diff --git a/libguile/vm-engine.c b/libguile/vm-engine.c > index 19ff3e498..9650e33eb 100644 > --- a/libguile/vm-engine.c > +++ b/libguile/vm-engine.c > @@ -3868,16 +3868,19 @@ VM_NAME (scm_i_thread *thread, struct scm_vm *vp, > { > scm_t_uint16 dst, box; > scm_t_uint32 expected, desired; > - SCM scm_box, scm_expected; > + SCM scm_box, scm_expected, scm_result; > UNPACK_12_12 (op, dst, box); > UNPACK_24 (ip[1], expected); > UNPACK_24 (ip[2], desired); > scm_box = SP_REF (box); > VM_VALIDATE_ATOMIC_BOX (scm_box, "atomic-box-compare-and-swap!"); > - scm_expected = SP_REF (expected); > - scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (scm_box), > - &scm_expected, SP_REF (desired)); > - SP_SET (dst, scm_expected); > + scm_result = scm_expected = SP_REF (expected); > + while (!scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (scm_box), > + &scm_result, SP_REF (desired)) > + && scm_is_eq (scm_result, scm_expected)) > + { > + } > + SP_SET (dst, scm_result); > NEXT (3); > } > > -- > 2.19.0 >
Mark H Weaver <mhw <at> netris.org>
:Михаил Бахтерев <mob <at> k.imm.uran.ru>
:Message #28 received at 32786-done <at> debbugs.gnu.org (full text, mbox):
From: Mark H Weaver <mhw <at> netris.org> To: Михаил Бахтерев <mob <at> k.imm.uran.ru> Cc: 32786-done <at> debbugs.gnu.org Subject: Re: bug#32786: (system* ...) call somehow interferes with atomic-box on armv7l Date: Fri, 05 Oct 2018 19:22:13 -0400
Hi, Михаил Бахтерев <mob <at> k.imm.uran.ru> writes: > I've tested the patch (i've applied it to the v2.2.4 source code). The > test sample works now as expected with both test «payloads»: (sleep > N-seconds) and (system* "any" "command" "here") calls. Thanks for letting us know the results of your testing, this is very helpful. I just pushed a similar fix (same code, improved comments) to the stable-2.2 branch, commit 2d09e0513fc11e2305077ba3653f6e4c2f266ddb, which will be in the upcoming guile-2.2.5 release. Thanks, Mark > > Thanks for fixing it. > > - MB. Respectfully > > On Thu, Sep 27, 2018 at 01:49:23AM -0400, Mark H Weaver wrote: >> Hi, >> >> Thanks for the additional details. I was able to reproduce the bug, and >> I believe I now see the problem. >> >> 'atomic-box-compare-and-swap!' is implemented using >> 'atomic_compare_exchange_weak' (if available), but neglects to handle >> the case where 'atomic_compare_exchange_weak' may spuriously fail. In >> that case, the box is left unchanged, although the observed value was >> equal to the expected value. >> >> What's happening here is that the 'atomic-box-compare-and-swap!' in >> 'sleep-loop' fails spuriously, leaving the box in state #:accepted >> although it returns #:accepted as its result. When the main loop >> discovers this, it changes the state to #:need-to-sleep, although the >> thread has already ended. >> >> To confirm this hypothesis, I added a print statement to the main loop >> showing the state of the box that it observed during the protocol >> exchange, and indeed it sees #:accepted the first time it checks, and >> #:need-to-sleep in all later iterations. >> >> I've attached a proposed patch that I hope will fix this problem. If >> you'd be willing to test it, I'd be grateful, but otherwise I'll try to >> test it in the next week or so. My access to armv7l boxes is somewhat >> limited. >> >> Thanks for this report. >> >> Mark >> >> > >> From 958d37686a9ac65f48cb2ca32a341cf182c24b5a Mon Sep 17 00:00:00 2001 >> From: Mark H Weaver <mhw <at> netris.org> >> Date: Thu, 27 Sep 2018 01:00:11 -0400 >> Subject: [PATCH] UNTESTED: Fix 'atomic-box-compare-and-swap!'. >> >> Fixes <https://bugs.gnu.org/32786>. >> >> 'scm_atomic_compare_and_swap_scm' uses 'atomic_compare_exchange_weak' if >> it's available on the host platform. 'atomic_compare_exchange_weak' may >> fail spuriously, leaving the atomic object unchanged even when it >> contained the expected value. 'atomic-box-compare-and-swap!' uses >> 'scm_atomic_compare_and_swap_scm' without checking its return value, and >> therefore may return the expected value while leaving the box unchanged. >> This contradicts the documentation, which explicitly states that "you >> can know if the swap worked by checking if the return value is 'eq?' to >> EXPECTED.". >> >> * libguile/atomic.c (scm_atomic_box_compare_and_swap_x): If >> 'scm_atomic_compare_and_swap_scm' returns false and the observed value >> is equal to 'expected', then try again. >> * libguile/vm-engine.c (atomic-box-compare-and-swap!): Ditto. >> --- >> libguile/atomic.c | 13 +++++++++---- >> libguile/vm-engine.c | 13 ++++++++----- >> 2 files changed, 17 insertions(+), 9 deletions(-) >> >> diff --git a/libguile/atomic.c b/libguile/atomic.c >> index 950874030..504643912 100644 >> --- a/libguile/atomic.c >> +++ b/libguile/atomic.c >> @@ -1,4 +1,4 @@ >> -/* Copyright (C) 2016 Free Software Foundation, Inc. >> +/* Copyright (C) 2016, 2018 Free Software Foundation, Inc. >> * >> * This library is free software; you can redistribute it and/or >> * modify it under the terms of the GNU Lesser General Public License >> @@ -95,10 +95,15 @@ SCM_DEFINE (scm_atomic_box_compare_and_swap_x, >> "if the return value is @code{eq?} to @var{expected}.") >> #define FUNC_NAME s_scm_atomic_box_compare_and_swap_x >> { >> + SCM result = expected; >> + >> SCM_VALIDATE_ATOMIC_BOX (1, box); >> - scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (box), >> - &expected, desired); >> - return expected; >> + while (!scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (box), >> + &result, desired) >> + && scm_is_eq (result, expected)) >> + { >> + } >> + return result; >> } >> #undef FUNC_NAME >> >> diff --git a/libguile/vm-engine.c b/libguile/vm-engine.c >> index 19ff3e498..9650e33eb 100644 >> --- a/libguile/vm-engine.c >> +++ b/libguile/vm-engine.c >> @@ -3868,16 +3868,19 @@ VM_NAME (scm_i_thread *thread, struct scm_vm *vp, >> { >> scm_t_uint16 dst, box; >> scm_t_uint32 expected, desired; >> - SCM scm_box, scm_expected; >> + SCM scm_box, scm_expected, scm_result; >> UNPACK_12_12 (op, dst, box); >> UNPACK_24 (ip[1], expected); >> UNPACK_24 (ip[2], desired); >> scm_box = SP_REF (box); >> VM_VALIDATE_ATOMIC_BOX (scm_box, "atomic-box-compare-and-swap!"); >> - scm_expected = SP_REF (expected); >> - scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (scm_box), >> - &scm_expected, SP_REF (desired)); >> - SP_SET (dst, scm_expected); >> + scm_result = scm_expected = SP_REF (expected); >> + while (!scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (scm_box), >> + &scm_result, SP_REF (desired)) >> + && scm_is_eq (scm_result, scm_expected)) >> + { >> + } >> + SP_SET (dst, scm_result); >> NEXT (3); >> } >> >> -- >> 2.19.0 >>
Debbugs Internal Request <help-debbugs <at> gnu.org>
to internal_control <at> debbugs.gnu.org
.
(Sat, 03 Nov 2018 11:24:05 GMT) Full text and rfc822 format available.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.