GNU bug report logs - #17591
GC_REALLOC in 'get-bytevector-n' leads to corrupt bytevector

Previous Next

Package: guile;

Reported by: ludo <at> gnu.org (Ludovic Courtès)

Date: Sun, 25 May 2014 21:00:02 UTC

Severity: normal

Merged with 17466

Done: ludo <at> gnu.org (Ludovic Courtès)

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 17591 in the body.
You can then email your comments to 17591 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-guix <at> gnu.org:
bug#17591; Package guix. (Sun, 25 May 2014 21:00:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to ludo <at> gnu.org (Ludovic Courtès):
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Sun, 25 May 2014 21:00:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: bug-guix <at> gnu.org
Subject: Offload hook fails to send files
Date: Sun, 25 May 2014 22:59:11 +0200
We’ve been observing things like this (as of v0.6-257-g2508358):

--8<---------------cut here---------------start------------->8---
load on machine 'hydra.gnunet.org' is 0.52 (normalized: 0.17333333333333334)
@ build-started /gnu/store/wcgr1ddm3d6c166pmf5r17yfw2566w0d-libstdc++-4.8.3.drv - x86_64-linux /var/log/guix/drvs/wc//gr1ddm3d6c166pmf5r17yfw2566w0d-libstdc++-4.8.3.drv.bz2
sending 2 store files to 'hydra.gnunet.org'...
Backtrace:
In ice-9/boot-9.scm:
 157: 15 [catch #t #<catch-closure 1e3e2e0> ...]
In unknown file:
   ?: 14 [apply-smob/1 #<catch-closure 1e3e2e0>]
In ice-9/boot-9.scm:
  63: 13 [call-with-prompt prompt0 ...]
In ice-9/eval.scm:
 432: 12 [eval # #]
In ice-9/boot-9.scm:
2320: 11 [save-module-excursion #<procedure 1e70c80 at ice-9/boot-9.scm:3961:3 ()>]
3966: 10 [#<procedure 1e70c80 at ice-9/boot-9.scm:3961:3 ()>]
1645: 9 [%start-stack load-stack ...]
1650: 8 [#<procedure 1e6f000 ()>]
In unknown file:
   ?: 7 [primitive-load "/usr/local/bin/guix"]
In guix/ui.scm:
 630: 6 [run-guix-command archive "--import"]
In ice-9/boot-9.scm:
 157: 5 [catch srfi-34 #<procedure 2c9f920 at guix/ui.scm:198:2 ()> ...]
 157: 4 [catch system-error ...]
In guix/scripts/archive.scm:
 327: 3 [#<procedure 2ca0cf0 at guix/scripts/archive.scm:312:2 ()>]
In guix/store.scm:
 768: 2 [import-paths #<build-daemon 256.14 2c9b780> #<input: file 0>]
 404: 1 [process-stderr #<build-daemon 256.14 2c9b780> #<input: file 0>]
In unknown file:
   ?: 0 [bytevector-length #<eof>]

ERROR: In procedure bytevector-length:
ERROR: In procedure scm_c_bytevector_length: Wrong type argument in position 1 (expecting bytevector): #<eof>
builder for `/gnu/store/qbvyn4yq7m92mpp7wvwpx8gnk32863vw-binutils-2.24.drv' failed to produce output path `/gnu/store/k1hdhfx3b2lkcwkch70fcj998jg7qs7y-binutils-2.24'
--8<---------------cut here---------------end--------------->8---

If we try later the very same thing, it typically succeeds:

--8<---------------cut here---------------start------------->8---
@ build-started /gnu/store/wcgr1ddm3d6c166pmf5r17yfw2566w0d-libstdc++-4.8.3.drv - x86_64-linux /var/log/guix/drvs/wc//gr1ddm3d6c166pmf5r17yfw2566w0d-libstdc++-4.8.3.drv.bz2
sending 2 store files to 'hydra.gnunet.org'...
importing path `/gnu/store/5wxd03bpxbw6g4ik9nvq5242p2h1f9zw-binutils-2.24-guile-builder'
importing path `/gnu/store/qbvyn4yq7m92mpp7wvwpx8gnk32863vw-binutils-2.24.drv'
offloading '/gnu/store/qbvyn4yq7m92mpp7wvwpx8gnk32863vw-binutils-2.24.drv' to 'hydra.gnunet.org'...
@ build-remote /gnu/store/qbvyn4yq7m92mpp7wvwpx8gnk32863vw-binutils-2.24.drv hydra.gnunet.org
--8<---------------cut here---------------end--------------->8---

In the first case, ‘send-files’ appears to be failing early, before the
first store item has been transmitted, because the corresponding
“importing path” message hasn’t been printed yet (it’s printed once the
contents of the item being imported have been read–see local-store.cc.)

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#17591; Package guix. (Sun, 25 May 2014 21:29:01 GMT) Full text and rfc822 format available.

Message #8 received at 17591 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: 17591 <at> debbugs.gnu.org
Subject: Re: bug#17591: Offload hook fails to send files
Date: Sun, 25 May 2014 23:28:27 +0200
This also manifests like this:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix build hello --no-substitutes
The following derivations will be built:
   /gnu/store/xi7lgm7vyhzhl1z5dy1mba1yki27xkbm-hello-2.9.drv
   /gnu/store/xan82azzisn7b69c0y9856wz2diw3ry1-grep-2.18.drv
hello from "/home/ludo/soft/etc/guix/machines.scm" (PID 1918)
process 1918 acquired build slot '/var/guix/offload/hydra.gnunet.org/0'
load on machine 'hydra.gnunet.org' is 0.2 (normalized: 0.06666666666666667)
@ build-started /gnu/store/xan82azzisn7b69c0y9856wz2diw3ry1-grep-2.18.drv - x86_64-linux /var/log/guix/drvs/xa//n82azzisn7b69c0y9856wz2diw3ry1-grep-2.18.drv.bz2
sending 2 store files to 'hydra.gnunet.org'...
offloading '/gnu/store/xan82azzisn7b69c0y9856wz2diw3ry1-grep-2.18.drv' to 'hydra.gnunet.org'...
@ build-remote /gnu/store/xan82azzisn7b69c0y9856wz2diw3ry1-grep-2.18.drv hydra.gnunet.org
guix build: error: open-file: No such file or directory: "/gnu/store/xan82azzisn7b69c0y9856wz2diw3ry1-grep-2.18.drv"
derivation '/gnu/store/xan82azzisn7b69c0y9856wz2diw3ry1-grep-2.18.drv' offloaded to 'hydra.gnunet.org' failed with exit code 1
--8<---------------cut here---------------end--------------->8---

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#17591; Package guix. (Tue, 27 May 2014 15:18:01 GMT) Full text and rfc822 format available.

Message #11 received at 17591 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: 17591 <at> debbugs.gnu.org
Subject: Re: bug#17591: Offload hook fails to send files
Date: Tue, 27 May 2014 17:16:56 +0200
I modified the offload hook and (guix build utils) so ‘cat’ is used
instead of ‘xz’.

When tracing sshd on the target machine and looking at the ‘guix archive
--import’ process in a failure case we see:

--8<---------------cut here---------------start------------->8---
21927 connect(10, {sa_family=AF_FILE, path="/usr/local/var/guix/daemon-socket/socket"}, 42) = 0
21927 write(10, "cxin\0\0\0\0", 8)      = 8
21927 read(10, "oixd\0\0\0\0", 8)       = 8
21927 read(10, "\16\1\0\0\0\0\0\0", 8)  = 8
21927 write(10, "\f\1\0\0\0\0\0\0", 8)  = 8
21927 write(10, "\1\0\0\0\0\0\0\0", 8)  = 8
21927 read(10, "stla\0\0\0\0", 8)       = 8
21927 write(10, "\33\0\0\0\0\0\0\0", 8) = 8
21927 read(10, "atad\0\0\0\0", 8)       = 8
21927 read(10, "\0\200\0\0\0\0\0\0", 8) = 8
21927 read(0, "\1\0\0\0\0\0\0\0\r\0\0\0\0\0\0\0nix-archive-1\0\0\0\1\0\0\0\0\0\0\0(\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0type\0\0\0\0\7\0\0\0\0\0\0\0regular\0\10\0\0\0\0\0\0\0contents\255\10\0\0\0\0\0\0(begin (use-modules (guix build gnu-build-system) (guix build utils)) (define %output (getenv \"out\")) (define %outputs (map (lambd"..., 4096) = 4096
21927 read(0, "C1B08ED62008395B0764CD9B55E80DA0A2B61C698DC27EA98E68BB576ACFC2B91B4D7283E7D960948D049D6E3C4CB1F489B460A120A4BB6C04A843FD3A67454136DE61CF68A927871EFFA9141BD372A748593C703E0301F039A9E674C50301BFC385BABE5B154250E7D57B82DB31F1E1AC696F870D"..., 4096) = 4096
21927 read(0,  <unfinished ...>

[...]

21927 <... read resumed> "ile-builder\0\0\0\0\0;\0\0\0\0\0\0\0/gnu/store/ip4w122y4wlkw9n1nb6g2sbz4ql0m8dy-patch-2.7.1.drv\0\0\0\0\0009\0\0\0\0\0\0\0/gnu/store/j6gq2p03mxjlmhmamg2mxixdf13djd60-grep-2.18.drv\0\0\0\0\0\0\0008\0\0\0\0\0\0\0/gnu/store/k3azrhdbjrdhjc3vlx154069imcql8jn-bash-4.3.drv;\0\0\0\0\0\0\0/g"..., 4096) = 2880
21926 <... write resumed> )             = 2880
21927 read(0,  <unfinished ...>
21926 read(0, "", 65536)                = 0
21926 close(0)                          = 0
21926 close(1 <unfinished ...>
21927 <... read resumed> "", 4096)      = 0
21926 <... close resumed> )             = 0
21926 close(2)                          = 0
21927 write(10, "@+\0\0\0\0\0\0", 8 <unfinished ...>
21926 exit_group(0)                     = ?
21927 <... write resumed> )             = 8
21927 write(10, "\1\0\0\0\0\0\0\0\0000\213\1\0\0\0\0nix-archive-1\0\0\0\1\0\0\0\0\0\0\0 0\213\1\0\0\0\0\4\0\0\0\0\0\0\0type\0\0\0\0\7\0\0\0\0\0\0\0 <at> 0\213\1\0\0\0\0\10\0\0\0\0\0\0\0contents\255\10\0\0\0\0\0\0`0\213\1\0\0\0\0use-modules (guix build \2000\213\1\0\0\0\0d-system) (guix build ut\2400\213\1\0\0\0\0efine %output (getenv \"o\3000\213\1\0\0\0\0efine %outputs (map (lam\3400"..., 11072 <unfinished ...>
21925 <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 21926
--8<---------------cut here---------------end--------------->8---

It reads 11072 bytes (as expected) from stdin, but by the time it writes
them on the socket (the last write(10, ...) call), those 11072 bytes got
corrupted.

The above snippet corresponds to this part of store.scm:

--8<---------------cut here---------------start------------->8---
          ((= k %stderr-read)
           ;; Read a byte stream from USER-PORT.
           (let* ((max-len (read-int p))
                  (data    (get-bytevector-n user-port max-len))
                  (len     (bytevector-length data)))
             (write-int len p)
             (put-bytevector p data)
             (write-padding len p)
             #f))
--8<---------------cut here---------------end--------------->8---

where ‘data’ is the corrupted bytevector.

The good news is that I can reproduce it like this:

--8<---------------cut here---------------start------------->8---
$ guix archive --export $(guix build -d coreutils) > t.nar
$ while guix archive --import <t.nar ; do : ; done
importing path `/gnu/store/5nhsz368f88bbgkjjwzz5k24nnnrk544-coreutils-8.22.drv'
importing path `/gnu/store/5nhsz368f88bbgkjjwzz5k24nnnrk544-coreutils-8.22.drv'
importing path `/gnu/store/5nhsz368f88bbgkjjwzz5k24nnnrk544-coreutils-8.22.drv'
[... wait ...]
Backtrace:
In ice-9/boot-9.scm:
 157: 15 [catch #t #<catch-closure 2062ce0> ...]
In unknown file:
   ?: 14 [apply-smob/1 #<catch-closure 2062ce0>]
In ice-9/boot-9.scm:
  63: 13 [call-with-prompt prompt0 ...]
In ice-9/eval.scm:
 432: 12 [eval # #]
In ice-9/boot-9.scm:
2401: 11 [save-module-excursion #<procedure 207f940 at ice-9/boot-9.scm:4045:3 ()>]
4050: 10 [#<procedure 207f940 at ice-9/boot-9.scm:4045:3 ()>]
1724: 9 [%start-stack load-stack ...]
1729: 8 [#<procedure 2096ea0 ()>]
In unknown file:
   ?: 7 [primitive-load "/home/ludo/soft/bin/guix"]
In guix/ui.scm:
 631: 6 [run-guix-command archive "--import"]
In ice-9/boot-9.scm:
 157: 5 [catch srfi-34 #<procedure 3018b00 at guix/ui.scm:198:2 ()> ...]
 157: 4 [catch system-error ...]
In guix/scripts/archive.scm:
 327: 3 [#<procedure 3019a80 at guix/scripts/archive.scm:312:2 ()>]
In guix/store.scm:
 766: 2 [import-paths #<build-daemon 256.14 2ff7000> #<input: file 0>]
 400: 1 [process-stderr #<build-daemon 256.14 2ff7000> #<input: file 0>]
In unknown file:
   ?: 0 [bytevector-length #<eof>]

ERROR: In procedure bytevector-length:
ERROR: In procedure scm_c_bytevector_length: Wrong type argument in position 1 (expecting bytevector): #<eof>
--8<---------------cut here---------------end--------------->8---

To be continued...

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#17591; Package guix. (Tue, 27 May 2014 20:57:02 GMT) Full text and rfc822 format available.

Message #14 received at 17591 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: 17591 <at> debbugs.gnu.org
Subject: Re: bug#17591: Offload hook fails to send files
Date: Tue, 27 May 2014 22:56:17 +0200
ludo <at> gnu.org (Ludovic Courtès) skribis:

> 21927 write(10, "\1\0\0\0\0\0\0\0\0000\213\1\0\0\0\0nix-archive-1\0\0\0\1\0\0\0\0\0\0\0 0\213\1\0\0\0\0\4\0\0\0\0\0\0\0type\0\0\0\0\7\0\0\0\0\0\0\0 <at> 0\213\1\0\0\0\0\10\0\0\0\0\0\0\0contents\255\10\0\0\0\0\0\0`0\213\1\0\0\0\0use-modules (guix build \2000\213\1\0\0\0\0d-system) (guix build ut\2400\213\1\0\0\0\0efine %output (getenv \"o\3000\213\1\0\0\0\0efine %outputs (map (lam\3400"..., 11072 <unfinished ...>
> 21925 <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 21926
>
> It reads 11072 bytes (as expected) from stdin, but by the time it writes
> them on the socket (the last write(10, ...) call), those 11072 bytes got
> corrupted.
>
> The above snippet corresponds to this part of store.scm:
>
>           ((= k %stderr-read)
>            ;; Read a byte stream from USER-PORT.
>            (let* ((max-len (read-int p))
>                   (data    (get-bytevector-n user-port max-len))
>                   (len     (bytevector-length data)))
>              (write-int len p)
>              (put-bytevector p data)
>              (write-padding len p)
>              #f))
>
> where ‘data’ is the corrupted bytevector.
>
> The good news is that I can reproduce it like this:
>
> $ guix archive --export $(guix build -d coreutils) > t.nar
> $ while guix archive --import <t.nar ; do : ; done

The problem comes from the ‘get-bytevector-n’ call above, and
specifically the ‘scm_c_shrink_bytevector’ and ‘GC_REALLOC’ calls it
entails.  If we change that to, say, ‘scm_make_bytevector’ (in
libguile), then the problem goes away.

So that looks like a GC_REALLOC bug (in libgc 7.4.0 and 7.2d at least),
but unfortunately I’ve been unable to reduce it further (neither in
Scheme nor in C.)

On the Guix side, commit 5895f24 works around the problem.  Now we need
a workaround in libguile, and ideally a proper bug report for libgc.

(I’ll close this bug when we confirm that it solves the problem on
Hydra.)

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#17591; Package guix. (Wed, 28 May 2014 15:14:02 GMT) Full text and rfc822 format available.

Message #17 received at 17591 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: 17591 <at> debbugs.gnu.org
Subject: Re: bug#17591: Offload hook fails to send files
Date: Wed, 28 May 2014 17:13:36 +0200
[Message part 1 (text/plain, inline)]
ludo <at> gnu.org (Ludovic Courtès) skribis:

> $ guix archive --export $(guix build -d coreutils) > t.nar
> $ while guix archive --import <t.nar ; do : ; done

Slightly reduced test case: after creating t.nar as above, run the code
below in a loop until it stops (with a pre-5895f24 Guix!):

--8<---------------cut here---------------start------------->8---
(use-modules (guix)
             (gnu packages base)
             (rnrs io ports))

(with-store store
  (let* (;; (drv   (package-derivation store coreutils))
         ;; (dump  (call-with-bytevector-output-port
         ;;         (lambda (port)
         ;;           (export-paths store (list (derivation-file-name drv))
         ;;                         port))))
         (dump  (call-with-input-file "t.nar" get-bytevector-all)))
    (unless (pair?
             (pk 'import (import-paths store (open-bytevector-input-port dump))))
      (format #t
              "attach GDB to ~a, call GC_default_print_heap_obj_proc,etc.~%"
              (getpid))
      (sleep 1000)
      (exit 1))))
--8<---------------cut here---------------end--------------->8---

Also useful is to modify libguile like this:

[Message part 2 (text/x-patch, inline)]
diff --git a/libguile/bytevectors.c b/libguile/bytevectors.c
index be8b654..9a8902f 100644
--- a/libguile/bytevectors.c
+++ b/libguile/bytevectors.c
@@ -315,10 +315,14 @@ scm_c_shrink_bytevector (SCM bv, size_t c_new_len)
   SCM_BYTEVECTOR_SET_LENGTH (bv, c_new_len);
 
   if (SCM_BYTEVECTOR_CONTIGUOUS_P (bv))
+    {
+      fprintf (stderr, "%s %zi -> %zi\n", __func__, c_len, c_new_len);
       new_bv = PTR2SCM (scm_gc_realloc (SCM2PTR (bv),
 					c_len + SCM_BYTEVECTOR_HEADER_BYTES,
 					c_new_len + SCM_BYTEVECTOR_HEADER_BYTES,
 					SCM_GC_BYTEVECTOR));
+      printf ("new_bv = %p, bv = %p\n", new_bv, bv);
+    }
   else
     {
       signed char *c_bv;
[Message part 3 (text/plain, inline)]
Here I see uninteresting things like:

--8<---------------cut here---------------start------------->8---
scm_c_shrink_bytevector 32768 -> 6960
new_bv = 0x1e96000, bv = 0x1e9a000
importing path `/gnu/store/5nhsz368f88bbgkjjwzz5k24nnnrk544-coreutils-8.22.drv'

;;; (import ("/gnu/store/5nhsz368f88bbgkjjwzz5k24nnnrk544-coreutils-8.22.drv"))
scm_c_shrink_bytevector 32768 -> 6960
new_bv = 0x1efb000, bv = 0x1eff000

;;; (import ())
attach GDB to 10124
--8<---------------cut here---------------end--------------->8---

Then I can call GC_default_print_heap_obj_proc(0x1efb000) from GDB,
which displays:

--8<---------------cut here---------------start------------->8---
object at 0x1efb000 of appr. 6992 bytes (atomic)
--8<---------------cut here---------------end--------------->8---

IOW, everything looks alright, except that the contents of new_bv really
are corrupt.

Ludo’.

bug reassigned from package 'guix' to 'guile'. Request was from ludo <at> gnu.org (Ludovic Courtès) to control <at> debbugs.gnu.org. (Wed, 28 May 2014 15:15:02 GMT) Full text and rfc822 format available.

Changed bug title to 'GC_REALLOC in 'get-bytevector-n' leads to corrupt bytevector' from 'Offload hook fails to send files' Request was from ludo <at> gnu.org (Ludovic Courtès) to control <at> debbugs.gnu.org. (Wed, 28 May 2014 15:18:01 GMT) Full text and rfc822 format available.

Merged 17466 17591. Request was from ludo <at> gnu.org (Ludovic Courtès) to control <at> debbugs.gnu.org. (Wed, 28 May 2014 15:25:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 26 Jun 2014 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 9 years and 300 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.