GNU bug report logs - #29368
'list-runtime-roots' terminates with non-zero exit code


Package: guix;

Reported by: Martin Castillo <castilma <at> uni-bremen.de>

Date: Mon, 20 Nov 2017 21:05:01 UTC

Severity: normal

Done: ludo <at> gnu.org (Ludovic Courtès)

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 29368 in the body.
You can then email your comments to 29368 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-guix <at> gnu.org:
bug#29368; Package guix. (Mon, 20 Nov 2017 21:05:02 GMT)

Acknowledgement sent to Martin Castillo <castilma <at> uni-bremen.de>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Mon, 20 Nov 2017 21:05:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Martin Castillo <castilma <at> uni-bremen.de>
To: bug-guix <at> gnu.org
Subject: Unreliable failing tests / segfaulting guile
Date: Mon, 20 Nov 2017 22:03:52 +0100
The test suite fails on 84bd92438. I was running [env] $ make check -j
3. (I accidentally overwrote the log file, sorry.)
The kernel logged:
.guile-real[10735]: segfault at 7b ip 000000000057885d sp
00007ffdfb13b570 error 4 in .guile-real[400000+38a000]

The message appeared when the tests were at about
tests/guix-environment.sh (about 30 tests after tests/nar.scm, the only
failing test). But as I had given -j 3, it may be that nar.scm was still
going.

However, rerunning it without -j 3 and then again with -j 3 worked
without any failure.

That's a bit strange, so I repeated make check -j 3 and now
tests/derivations.scm failed. (test-suite-2.log)

And again, while tests/guix-environment.sh or so was running, the kernel
logged:
.guile-real[24288]: segfault at 7b ip 000000000057885d sp
00007ffef8557490 error 4 in .guile-real[400000+38a000]


Martin
[test-suite-2.log (text/x-log, attachment)]
[signature.asc (application/pgp-signature, attachment)]

Message #8 received at 29368 <at> debbugs.gnu.org:

From: ludo <at> gnu.org (Ludovic Courtès)
To: Martin Castillo <castilma <at> uni-bremen.de>
Cc: 29368 <at> debbugs.gnu.org
Subject: Re: bug#29368: Unreliable failing tests / segfaulting guile
Date: Mon, 20 Nov 2017 23:00:10 +0100
Hello,

Martin Castillo <castilma <at> uni-bremen.de> skribis:

> test-name: derivation-prerequisites-to-build when outputs already present
> location: /home/mcd/guix/tests/derivations.scm:790
> source:
> + (test-assert
> +   "derivation-prerequisites-to-build when outputs already present"
> +   (let* ((builder '(begin (mkdir %output) #t))
> +          (input-drv
> +            (build-expression->derivation
> +              %store
> +              "input"
> +              builder))
> +          (input-path
> +            (derivation-output-path
> +              (assoc-ref (derivation-outputs input-drv) "out")))
> +          (drv (build-expression->derivation
> +                 %store
> +                 "something"
> +                 builder
> +                 #:inputs
> +                 `(("i" ,input-drv))))
> +          (output (derivation->output-path drv)))
> +     (when (valid-path? %store input-path)
> +           (delete-paths %store (list input-path)))
> +     (when (valid-path? %store output)
> +           (delete-paths %store (list output)))
> +     (and (equal?
> +            (map derivation-input-path
> +                 (derivation-prerequisites-to-build %store drv))
> +            (list (derivation-file-name input-drv)))
> +          (build-derivations %store (list drv))
> +          (delete-paths %store (list input-path))
> +          (not (valid-path? %store input-path))
> +          (null? (derivation-prerequisites-to-build %store drv)))))
> finding garbage collector roots...
> actual-value: #f
> actual-error:
> + (srfi-34
> +   #<condition &nix-protocol-error [message: "program `/home/mcd/guix/nix/scripts/list-runtime-roots' failed with exit code 1" status: 1] 380d4b0>)
> result: FAIL

It seems to be the Guile running ‘list-runtime-roots’ that’s
segfaulting.  Could you try running it manually to see what happens?
(The expected behavior is to write a list of store file names on
standard output.)

TIA,
Ludo’.




Message #11 received at 29368 <at> debbugs.gnu.org:

From: Martin Castillo <castilma <at> uni-bremen.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 29368 <at> debbugs.gnu.org
Subject: Re: bug#29368: Unreliable failing tests / segfaulting guile
Date: Wed, 22 Nov 2017 02:43:23 +0100
Hi,
> It seems to be the Guile running ‘list-runtime-roots’ that’s
> segfaulting.  Could you try running it manually to see what happens?
> (The expected behavior is to write a list of store file names on
> standard output.)

When running it in a loop, I see the expected output. However, after a
while I get the segfault!

;; sorry, I'm new to scheme, what would be a better way? without
;; reloading the file again and again?
scheme@(guile-user)> (while #t (load-from-path "list-runtime-roots"))
... ;; skipping repeated expected output
srfi/srfi-1.scm:592:17: In procedure map1:
srfi/srfi-1.scm:592:17: In procedure fport_read: Kein passender Prozess
gefunden ;; No matching process found

Entering a new prompt.  Type `,bt' for a backtrace or `,q' to continue.

scheme@(guile-user) [1]> ,bt
         102 (primitive-load-path "list-runtime-roots")
In list-runtime-roots:
   145:47101 (_)
In srfi/srfi-1.scm:
   679:15100 (append-map _ _ . _)
   592:29 99 (map1 _)
   592:29 98 (map1 _)
   592:29 97 (map1 _)
   592:29 96 (map1 _)
   592:29 95 (map1 _)
   592:29 94 (map1 _)
   592:29 93 (map1 _)
   592:29 92 (map1 _)
   592:29 91 (map1 _)
   592:29 90 (map1 _)
   592:29 89 (map1 _)
   592:29 88 (map1 _)
   592:29 87 (map1 _)
   592:29 86 (map1 _)
   592:29 85 (map1 _)
   592:29 84 (map1 _)
   592:29 83 (map1 _)
   592:29 82 (map1 _)
   592:29 81 (map1 _)
   592:29 80 (map1 _)
   592:29 79 (map1 _)
   592:29 78 (map1 _)
   592:29 77 (map1 _)
   592:29 76 (map1 _)
   592:29 75 (map1 _)
   592:29 74 (map1 _)
   592:29 73 (map1 _)
   592:29 72 (map1 _)
   592:29 71 (map1 _)
   592:29 70 (map1 _)
   592:29 69 (map1 _)
   592:29 68 (map1 _)
   592:29 67 (map1 _)
   592:29 66 (map1 _)
   592:29 65 (map1 _)
   592:29 64 (map1 _)
   592:29 63 (map1 _)
   592:29 62 (map1 _)
   592:29 61 (map1 _)
   592:29 60 (map1 _)
   592:29 59 (map1 _)
   592:29 58 (map1 _)
   592:29 57 (map1 _)
   592:29 56 (map1 _)
   592:29 55 (map1 _)
   592:29 54 (map1 _)
   592:29 53 (map1 _)
   592:29 52 (map1 _)
   592:29 51 (map1 _)
   592:29 50 (map1 _)
   592:29 49 (map1 _)
   592:29 48 (map1 _)
   592:29 47 (map1 _)
   592:29 46 (map1 _)
   592:29 45 (map1 _)
   592:29 44 (map1 _)
   592:29 43 (map1 _)
   592:29 42 (map1 _)
   592:29 41 (map1 _)
   592:29 40 (map1 _)
   592:29 39 (map1 _)
   592:29 38 (map1 _)
   592:29 37 (map1 _)
   592:29 36 (map1 _)
   592:29 35 (map1 _)
   592:29 34 (map1 _)
   592:29 33 (map1 _)
   592:29 32 (map1 _)
   592:29 31 (map1 _)
   592:29 30 (map1 _)
   592:29 29 (map1 _)
   592:29 28 (map1 _)
   592:29 27 (map1 _)
   592:29 26 (map1 _)
   592:29 25 (map1 _)
   592:29 24 (map1 _)
   592:29 23 (map1 _)
   592:29 22 (map1 _)
   592:29 21 (map1 _)
   592:29 20 (map1 _)
   592:29 19 (map1 _)
   592:29 18 (map1 _)
   592:29 17 (map1 _)
   592:29 16 (map1 _)
   592:29 15 (map1 _)
   592:29 14 (map1 _)
   592:29 13 (map1 _)
   592:29 12 (map1 _)
   592:29 11 (map1 _)
   592:29 10 (map1 _)
   592:29  9 (map1 _)
   592:29  8 (map1 _)
   592:29  7 (map1 _)
   592:29  6 (map1 _)
   592:29  5 (map1 _)
   592:29  4 (map1 _)
   592:29  3 (map1 _)
   592:29  2 (map1 _)
   592:29  1 (map1 _)
   592:17  0 (map1 ("8947"))

I'm running this in qemu with 2 cores and 2.4GB ram.

Martin





Message #14 received at 29368 <at> debbugs.gnu.org:

From: ludo <at> gnu.org (Ludovic Courtès)
To: Martin Castillo <castilma <at> uni-bremen.de>
Cc: 29368 <at> debbugs.gnu.org
Subject: Re: bug#29368: Unreliable failing tests / segfaulting guile
Date: Fri, 24 Nov 2017 18:02:14 +0100
Hi,

ludo <at> gnu.org (Ludovic Courtès) skribis:

> It seems to be the Guile running ‘list-runtime-roots’ that’s
> segfaulting.

Actually it didn’t segfault: it exited normally, but with exit code 1.
That shouldn’t happen.

Does “make check -j3” or so still trigger the problem?  If so, could you
try collecting more details with:

  strace -f -o log -s 234 make check -j3

?

TIA,
Ludo’.




Changed bug title to "'list-runtime-roots' terminates with non-zero exit code" from "Unreliable failing tests / segfaulting guile". Request was from ludo <at> gnu.org (Ludovic Courtès) to control <at> debbugs.gnu.org. (Fri, 24 Nov 2017 21:03:03 GMT)

Message #19 received at 29368 <at> debbugs.gnu.org:

From: Martin Castillo <castilma <at> uni-bremen.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 29368 <at> debbugs.gnu.org
Subject: Re: bug#29368: Unreliable failing tests / segfaulting guile
Date: Sun, 26 Nov 2017 14:06:42 +0100
Hi,

On 24.11.2017 18:02, Ludovic Courtès wrote:
> Actually it didn’t segfault: it exited normally, but with exit code 1.
> That shouldn’t happen.
> 
> Does “make check -j3” or so still trigger the problem?  If so, could you
> try collecting more details with:
> 
>   strace -f -o log -s 234 make check -j3
> 
> ?
> 
> TIA,
> Ludo’.
> 

In the meantime I updated the guix checkout, and since my
.config/guix/latest points to the guix checkout, I checked out 84bd92438
in another directory and ran:

  guix environment guix --ad-hoc strace
  ./bootstrap && ./configure --localstatedir=/var
  ./pre-inst-env strace -f -o log.fifo -s 234 make check -j3

make check -j3 doesn't always trigger it, but it did in the following two
cases.

Your command creates a lot of output, so I piped strace's output to
bzip2. That took very long, so I aborted and retried, this time piping
the output to the host system to compress it there. That still took very
long once it reached the guix-daemon test. On top of that, I got a
filesystem error which resulted in a read-only remount, so I decided to
interrupt the second try, too.
You can access the logs at
https://seafile.zfn.uni-bremen.de/d/7990941e630141309a58/


Martin




Reply sent to ludo <at> gnu.org (Ludovic Courtès):
You have taken responsibility. (Sun, 26 Nov 2017 15:00:04 GMT)

Notification sent to Martin Castillo <castilma <at> uni-bremen.de>:
bug acknowledged by developer. (Sun, 26 Nov 2017 15:00:04 GMT)

Message #24 received at 29368-done <at> debbugs.gnu.org:

From: ludo <at> gnu.org (Ludovic Courtès)
To: Martin Castillo <castilma <at> uni-bremen.de>
Cc: 29368-done <at> debbugs.gnu.org
Subject: Re: bug#29368: Unreliable failing tests / segfaulting guile
Date: Sun, 26 Nov 2017 15:59:42 +0100
Hello,

Martin Castillo <castilma <at> uni-bremen.de> skribis:

> In the meantime I updated the guix checkout, and since my
> .config/guix/latest points to the guix checkout, I checked out 84bd92438
> in another directory and ran:
>
>   guix environment guix --ad-hoc strace
>   ./bootstrap && ./configure --localstatedir=/var
>   ./pre-inst-env strace -f -o log.fifo -s 234 make check -j3
>
> make check -j3 doesn't always trigger it, but it did in the following two
> cases.
>
> Your command creates a lot of output, so I piped strace's output to
> bzip2. That took very long, so I aborted and retried, this time piping
> the output to the host system to compress it there. That still took very
> long once it reached the guix-daemon test. On top of that, I got a
> filesystem error which resulted in a read-only remount, so I decided to
> interrupt the second try, too.
> You can access the logs at
> https://seafile.zfn.uni-bremen.de/d/7990941e630141309a58/

Thanks a lot for the logs.

With the script below I extracted the lines corresponding to a non-zero
exit of ‘list-runtime-roots’, and then looked at what happened
immediately before.  It typically looks like this:

--8<---------------cut here---------------start------------->8---
15542 write(2, "Backtrace:\nIn srfi/srfi-1.scm:\n   592:29 19 (map1 _)\n   592:29 18 (map1 _)\n   592:29 17 (map1 _)\n   592:29 16 (map1 _)\n   592:29 15 (map1 _)\n   592:29 14 (map1 _)\n   592:29 13 (map1 _)\n   592:29 12 (map1 _)\n   592:29 11 (map1 _)\n   59"..., 852) = 852
15542 exit_group(1 <unfinished ...>
--8<---------------cut here---------------end--------------->8---

So most likely it corresponds to the error you reported:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (while #t (load-from-path "list-runtime-roots"))
... ;; skipping repeated expected output
srfi/srfi-1.scm:592:17: In procedure map1:
srfi/srfi-1.scm:592:17: In procedure fport_read: Kein passender Prozess
gefunden ;; No matching process found
--8<---------------cut here---------------end--------------->8---

Indeed, if we open, say, /proc/XYZ/cmdline and then process XYZ dies,
reading from that file descriptor yields ESRCH (the error above).

Commit 9b0713012905f3997d6fad201dba7c3d93b38b13 fixes that.  I’ll update
the ‘guix’ package snapshot soon so we get this fix.
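
The race described above can be guarded against along the following lines. This is only a minimal sketch of the general technique, not the contents of the commit mentioned above: the procedure names `call-unless-process-vanished` and `process-command-line` are hypothetical, and the choice to treat only ENOENT and ESRCH as "the process is gone" is an assumption.

```scheme
(use-modules (ice-9 rdelim))

;; Hypothetical helper: call THUNK and return its result.  If THUNK fails
;; with ENOENT or ESRCH (assumed here to mean the process exited while we
;; were looking at it), return DEFAULT.  Any other error is re-thrown.
(define (call-unless-process-vanished thunk default)
  (catch 'system-error
    thunk
    (lambda args
      (if (memv (system-error-errno args) (list ENOENT ESRCH))
          default
          (apply throw args)))))

;; Hypothetical helper: return the raw contents of /proc/PID/cmdline as a
;; string, or #f if the process vanished before or during the read.
;; PID is a string such as "8947".
(define (process-command-line pid)
  (call-unless-process-vanished
   (lambda ()
     (call-with-input-file (string-append "/proc/" pid "/cmdline")
       read-string))
   #f))
```

A caller walking over /proc could then use `(process-command-line (number->string (getpid)))` and simply skip entries for which the result is `#f`, instead of letting the whole script exit with a non-zero code.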

Thank you!

Ludo’.

[t.scm (text/plain, inline)]
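;; Scan an strace log and print the 'exit_group' line of each process that
;; executed 'list-runtime-roots'.
(use-modules (ice-9 rdelim)
             (ice-9 match)
             (ice-9 regex))

;; Matches "PID exec..." lines whose arguments mention 'list-runtime-roots'.
(define exec-rx
  (make-regexp "^([[:digit:]]+) exec.*list-runtime-roots"))

;; Matches "PID exit_group(...)" lines.
(define exit-rx
  (make-regexp "^([[:digit:]]+) exit_group"))

(call-with-input-file "/tmp/log"
  (lambda (port)
    ;; PID is #f until an exec of 'list-runtime-roots' is seen; after that
    ;; it is the ID of the process whose exit we are waiting for.
    (let loop ((pid #f))
      (match (read-line port)
        ((? eof-object?)
         #f)
        (line
         (if pid
             (match (regexp-exec exit-rx line)
               (#f
                (loop pid))
               (result
                (if (= pid (string->number (match:substring result 1)))
                    (begin
                      ;; Print the PID, the exit line, and its line number.
                      (pk 'exit pid line (port-line port))
                      (loop #f))
                    (loop pid))))
             (match (regexp-exec exec-rx line)
               (#f
                (loop pid))
               (result
                (loop (string->number (match:substring result 1)))))))))))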

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 25 Dec 2017 12:24:07 GMT)

This bug report was last modified 6 years and 117 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.