GNU bug report logs - #41702
`guix environment` performance issues

Previous Next

Package: guix;

Reported by: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>

Date: Thu, 4 Jun 2020 08:24:01 UTC

Severity: important

Merged with 48005

To reply to this bug, email your comments to 41702 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-guix <at> gnu.org:
bug#41702; Package guix. (Thu, 04 Jun 2020 08:24:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Lars-Dominik Braun <ldb <at> leibniz-psychology.org>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Thu, 04 Jun 2020 08:24:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
To: bug-guix <at> gnu.org
Subject: `guix environment` performance issues
Date: Thu, 4 Jun 2020 10:23:16 +0200
[Message part 1 (text/plain, inline)]
Hi,

I’ve noticed that `guix environment` can be very very slow for some packages.
Whereas usually a call like

	time guix environment --ad-hoc <package> -- true

takes 600ms to 1.5s, it takes 8.4s for the package r-learnr on my Ryzen 5 3600
with NVMe SSD, 32G of RAM and a warm cache. This seems to accumulate when
trying to install multiple unrelated “slow” packages into the same environment,
i.e. jupyter (2.9s) + r-rlearnr (8.4s) + r-stm (2.9s) = 12.1s. What these
packages seem to have in common is a deep chain of propagated inputs.

And in fact, if I create artificial test packages where test-999 propagates
test-998, test-998→test-997 and so on the command above takes 1m 9s. A single
package propagating 999 packages takes 4.5s.

Instrumenting guix-environment’s with-store call with statprof I see a lot of
time is spent in the garbage collector:

Total time: 24.672604202 seconds (19.431122691 seconds in GC)
./pre-inst-env guix environment --ad-hoc r-learnr -- true  25,18s user 0,24s system 308% cpu 8,248 total

More specifically in an anonymous function and reap-pipes, which is a gc hook,
I believe:

%     cumulative   self
time   seconds    seconds   calls   procedure
 33.41     14.49      8.24                            anon #xbb8480
 27.95      6.90      6.90                            ice-9/popen.scm:145:0:reap-pipes
  4.37      1.08      1.08                            anon #xbbdcd8
  3.28      0.86      0.81                            ice-9/vlist.scm:539:0:vhash-assq
  2.40      2.37      0.59                            guix/grafts.scm:202:22

Not sure if that helps tracking the issue down though.

Cheers,
Lars

PS: Here’s `guix describe`:
Generation 1	Jun 03 2020 10:38:17	(current)
  guix 9ab2790
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: 9ab2790e2acf872c0f2cf41ec180de80d1eee3f3
  zpid bab33c1
    repository URL: https://github.com/leibniz-psychology/guix-zpid.git
    branch: master
    commit: bab33c19e0f0304d7aa82751653eb85d8cad8c0a

[signature.asc (application/pgp-signature, inline)]

Severity set to 'important' from 'normal' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Sat, 06 Jun 2020 14:35:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#41702; Package guix. (Sat, 06 Jun 2020 17:52:02 GMT) Full text and rfc822 format available.

Message #10 received at 41702 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
Cc: 41702 <at> debbugs.gnu.org
Subject: Re: bug#41702: `guix environment` performance issues
Date: Sat, 06 Jun 2020 18:08:31 +0200
[Message part 1 (text/plain, inline)]
Hi,

Lars-Dominik Braun <ldb <at> leibniz-psychology.org> skribis:

> Total time: 24.672604202 seconds (19.431122691 seconds in GC)
> ./pre-inst-env guix environment --ad-hoc r-learnr -- true  25,18s user 0,24s system 308% cpu 8,248 total
>
> More specifically in an anonymous function and reap-pipes, which is a gc hook,
> I believe:
>
> %     cumulative   self
> time   seconds    seconds   calls   procedure
>  33.41     14.49      8.24                            anon #xbb8480
>  27.95      6.90      6.90                            ice-9/popen.scm:145:0:reap-pipes
>   4.37      1.08      1.08                            anon #xbbdcd8
>   3.28      0.86      0.81                            ice-9/vlist.scm:539:0:vhash-assq
>   2.40      2.37      0.59                            guix/grafts.scm:202:22

Guile master has a fix for statprof that yields more useful info:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,use(guix scripts environment)
scheme@(guile-user)> ,pr (guix-environment "--ad-hoc" "r-learnr" "--" "true")
%     cumulative   self             
time   seconds     seconds  procedure
 29.84      9.87      6.16  append
 19.56      4.04      4.04  %after-gc-thunk
  6.85      1.87      1.42  ice-9/vlist.scm:539:0:vhash-assq
  5.44      1.17      1.12  write
  3.23      0.67      0.67  guix/derivations.scm:665:0:derivation->output-paths
  2.82      0.58      0.58  string=?
  2.42      2.37      0.50  guix/grafts.scm:202:22
  2.42      0.50      0.50  list?
  2.22      0.46      0.46  hashq
  2.02      0.42      0.42  display
  1.61     15.82      0.33  guix/grafts.scm:186:0:reference-origin
  1.61      0.87      0.33  guix/grafts.scm:204:31
  1.21      0.33      0.25  guix/derivations.scm:667:7
  1.21      0.29      0.25  srfi/srfi-1.scm:817:0:any
  1.01   1232.14      0.21  srfi/srfi-1.scm:584:5:map1
  0.81      0.83      0.17  guix/derivations.scm:697:0:derivation/masked-inputs
  0.81      0.75      0.17  srfi/srfi-1.scm:580:2:map
  0.81      0.17      0.17  guix/derivations.scm:158:0:%derivation-input-derivation-procedure
  0.60      0.17      0.12  reverse
  0.60      0.12      0.12  hashq-ref
  0.60      0.12      0.12  get-bytevector-n
  0.60      0.12      0.12  procedure?
  0.40      0.67      0.08  guix/packages.scm:1232:0:fold-bag-dependencies
  0.40      0.12      0.08  string->utf8
  0.40      0.12      0.08  ice-9/vlist.scm:534:0:vhash-assoc
  0.40      0.12      0.08  ice-9/vlist.scm:449:0:vhash-cons
  0.40      0.12      0.08  delete-duplicates
  0.40      0.08      0.08  ice-9/boot-9.scm:1389:0:->bool
  0.40      0.08      0.08  ice-9/boot-9.scm:2201:0:%load-announce
  0.40      0.08      0.08  hash
  0.40      0.08      0.08  guix/derivations.scm:665:0:derivation->output-paths
  0.20     20.73      0.04  guix/gexp.scm:1061:2
--8<---------------cut here---------------end--------------->8---

Notice that the same command with ‘--no-grafts’ takes 2s instead of 11s.

The patch below arranges so that ‘cumulative-grafts’ processes
dependencies in a batch, such that the derivation’s dependency graph is
traversed once for all, which makes a difference for derivations with
lots of inputs.

Here’s the before/after comparison:

--8<---------------cut here---------------start------------->8---
$ time guix environment --ad-hoc r-learnr --search-paths
export PATH="/gnu/store/n4wxbmqpafjfyawrla8xymzzdm5hxwph-profile/bin${PATH:+:}$PATH"

real	0m11.328s
user	0m20.155s
sys	0m0.172s
$ time ./pre-inst-env guix environment --ad-hoc r-learnr --search-paths
export PATH="/gnu/store/if6z77la3mx0qdzvcyl4qv9i5cyp48i0-profile/bin${PATH:+:}$PATH"

real	0m4.602s
user	0m6.189s
sys	0m0.136s
--8<---------------cut here---------------end--------------->8---

There’s still room for improvement, but it’s much better.

Ludo’.

[Message part 2 (text/x-patch, inline)]
diff --git a/guix/grafts.scm b/guix/grafts.scm
index 69d6fe4469..910dcadc8a 100644
--- a/guix/grafts.scm
+++ b/guix/grafts.scm
@@ -20,10 +20,12 @@
   #:use-module (guix store)
   #:use-module (guix monads)
   #:use-module (guix records)
+  #:use-module (guix combinators)
   #:use-module (guix derivations)
   #:use-module ((guix utils) #:select (%current-system))
   #:use-module (guix sets)
   #:use-module (srfi srfi-1)
+  #:use-module (srfi srfi-11)
   #:use-module (srfi srfi-9 gnu)
   #:use-module (srfi srfi-26)
   #:use-module (srfi srfi-34)
@@ -183,32 +185,47 @@ references."
            (set-current-state (vhash-cons key result cache))
            (return result)))))))
 
-(define (reference-origin drv item)
-  "Return the derivation/output pair among the inputs of DRV, recursively,
-that produces ITEM.  Return #f if ITEM is not produced by a derivation (i.e.,
-it's a content-addressed \"source\"), or if it's not produced by a dependency
-of DRV."
+(define (reference-origins drv items)
+  "Return the derivation/output pairs among the inputs of DRV, recursively,
+that produce ITEMS.  Elements of ITEMS not produced by a derivation (i.e.,
+it's a content-addressed \"source\"), or not produced by a dependency of DRV,
+have no corresponding element in the resulting list."
+  (define (lookup-derivers drv result items)
+    ;; Return RESULT augmented by all the drv/output pairs producing one of
+    ;; ITEMS, and ITEMS stripped of matching items.
+    (fold2 (match-lambda*
+             (((output . file) result items)
+              (if (member file items)
+                  (values (alist-cons drv output result)
+                          (delete file items))
+                  (values result items))))
+           result items
+           (derivation->output-paths drv)))
+
   ;; Perform a breadth-first traversal of the dependency graph of DRV in
-  ;; search of the derivation that produces ITEM.
+  ;; search of the derivations that produce ITEMS.
   (let loop ((drv (list drv))
+             (items items)
+             (result '())
              (visited (setq)))
     (match drv
       (()
-       #f)
+       result)
       ((drv . rest)
-       (if (set-contains? visited drv)
-           (loop rest visited)
-           (let ((inputs (derivation-inputs drv)))
-             (or (any (lambda (input)
-                        (let ((drv (derivation-input-derivation input)))
-                          (any (match-lambda
-                                 ((output . file)
-                                  (and (string=? file item)
-                                       (cons drv output))))
-                               (derivation->output-paths drv))))
-                      inputs)
-                 (loop (append rest (map derivation-input-derivation inputs))
-                       (set-insert drv visited)))))))))
+       (cond ((null? items)
+              result)
+             ((set-contains? visited drv)
+              (loop rest items result visited))
+             (else
+              (let*-values (((inputs)
+                             (map derivation-input-derivation
+                                  (derivation-inputs drv)))
+                            ((result items)
+                             (fold2 lookup-derivers
+                                    result items inputs)))
+                (loop (append rest inputs)
+                      items result
+                      (set-insert drv visited)))))))))
 
 (define* (cumulative-grafts store drv grafts
                             #:key
@@ -233,25 +250,27 @@ derivations to the corresponding set of grafts."
       (_
        #f)))
 
-  (define (dependency-grafts item)
-    (match (reference-origin drv item)
-      ((drv . output)
-       ;; If GRAFTS already contains a graft from DRV, do not override it.
-       (if (find (cut graft-origin? drv <>) grafts)
-           (state-return grafts)
-           (cumulative-grafts store drv grafts
-                              #:outputs (list output)
-                              #:guile guile
-                              #:system system)))
-      (#f
-       (state-return grafts))))
+  (define (dependency-grafts items)
+    (mapm %store-monad
+          (lambda (drv+output)
+            (match drv+output
+              ((drv . output)
+               ;; If GRAFTS already contains a graft from DRV, do not
+               ;; override it.
+               (if (find (cut graft-origin? drv <>) grafts)
+                   (state-return grafts)
+                   (cumulative-grafts store drv grafts
+                                      #:outputs (list output)
+                                      #:guile guile
+                                      #:system system)))))
+          (reference-origins drv items)))
 
   (with-cache (cons (derivation-file-name drv) outputs)
     (match (non-self-references store drv outputs)
       (()                                         ;no dependencies
        (return grafts))
       (deps                                       ;one or more dependencies
-       (mlet %state-monad ((grafts (mapm %state-monad dependency-grafts deps)))
+       (mlet %state-monad ((grafts (dependency-grafts deps)))
          (let ((grafts (delete-duplicates (concatenate grafts) equal?)))
            (match (filter (lambda (graft)
                             (member (graft-origin-file-name graft) deps))

Information forwarded to bug-guix <at> gnu.org:
bug#41702; Package guix. (Sat, 06 Jun 2020 21:41:02 GMT) Full text and rfc822 format available.

Message #13 received at 41702 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
Cc: 41702 <at> debbugs.gnu.org
Subject: Re: bug#41702: `guix environment` performance issues
Date: Sat, 06 Jun 2020 23:40:51 +0200
Ludovic Courtès <ludo <at> gnu.org> skribis:

> The patch below arranges so that ‘cumulative-grafts’ processes
> dependencies in a batch, such that the derivation’s dependency graph is
> traversed once for all, which makes a difference for derivations with
> lots of inputs.

Pushed:

  https://git.savannah.gnu.org/cgit/guix.git/commit/?id=58bb833365db4e8934a386497d5b00a063cfd27d

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#41702; Package guix. (Mon, 08 Jun 2020 09:06:02 GMT) Full text and rfc822 format available.

Message #16 received at 41702 <at> debbugs.gnu.org (full text, mbox):

From: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 41702 <at> debbugs.gnu.org
Subject: Re: bug#41702: `guix environment` performance issues
Date: Mon, 8 Jun 2020 11:04:53 +0200
[Message part 1 (text/plain, inline)]
Hi Ludo,

> --8<---------------cut here---------------start------------->8---
> $ time guix environment --ad-hoc r-learnr --search-paths
> export PATH="/gnu/store/n4wxbmqpafjfyawrla8xymzzdm5hxwph-profile/bin${PATH:+:}$PATH"
> 
> real	0m11.328s
> user	0m20.155s
> sys	0m0.172s
> $ time ./pre-inst-env guix environment --ad-hoc r-learnr --search-paths
> export PATH="/gnu/store/if6z77la3mx0qdzvcyl4qv9i5cyp48i0-profile/bin${PATH:+:}$PATH"
> 
> real	0m4.602s
> user	0m6.189s
> sys	0m0.136s
> --8<---------------cut here---------------end--------------->8---
that’s awesome and brings me much closer to my goal of running all applications
inside a `guix environment` container for reproducibility. Including the
protocol fixes from #41720 I’m now down to ~30s from ~50s, which may be called
somewhat usable. Obviously I’d be very interested in further speedups.

Thanks,
Lars

[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#41702; Package guix. (Mon, 08 Jun 2020 22:00:02 GMT) Full text and rfc822 format available.

Message #19 received at 41702 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
Cc: 41702 <at> debbugs.gnu.org
Subject: Re: bug#41702: `guix environment` performance issues
Date: Mon, 08 Jun 2020 23:59:37 +0200
Hi,

Lars-Dominik Braun <ldb <at> leibniz-psychology.org> skribis:

>> --8<---------------cut here---------------start------------->8---
>> $ time guix environment --ad-hoc r-learnr --search-paths
>> export PATH="/gnu/store/n4wxbmqpafjfyawrla8xymzzdm5hxwph-profile/bin${PATH:+:}$PATH"
>> 
>> real	0m11.328s
>> user	0m20.155s
>> sys	0m0.172s
>> $ time ./pre-inst-env guix environment --ad-hoc r-learnr --search-paths
>> export PATH="/gnu/store/if6z77la3mx0qdzvcyl4qv9i5cyp48i0-profile/bin${PATH:+:}$PATH"
>> 
>> real	0m4.602s
>> user	0m6.189s
>> sys	0m0.136s
>> --8<---------------cut here---------------end--------------->8---
> that’s awesome and brings me much closer to my goal of running all applications
> inside a `guix environment` container for reproducibility. Including the
> protocol fixes from #41720 I’m now down to ~30s from ~50s, which may be called
> somewhat usable. Obviously I’d be very interested in further speedups.

That’s over SSH, right?

Probably what’s killing us is the round-trip time for all these small
RPCs.  We would need pipelining but the RPC protocol is not designed to
make that easy.

Perhaps you could “strace -Tt” the thing to check whether this
hypothesis is correct by looking at the time we spend waiting for
replies?

As for the CPU cost (i.e., going below the 4.6s above), we should keep
profiling just like you did.

Thanks,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#41702; Package guix. (Tue, 09 Jun 2020 09:16:01 GMT) Full text and rfc822 format available.

Message #22 received at 41702 <at> debbugs.gnu.org (full text, mbox):

From: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 41702 <at> debbugs.gnu.org
Subject: Re: bug#41702: `guix environment` performance issues
Date: Tue, 9 Jun 2020 11:15:22 +0200
[Message part 1 (text/plain, inline)]
Hey,

> That’s over SSH, right?
correct, the worst possible case: Inside two VM’s on a Laptop, SSH transport
between them and /gnu+/var/guix on an NFS share (nfsd is in the same VM as
guix-daemon).

> Probably what’s killing us is the round-trip time for all these small
> RPCs.  We would need pipelining but the RPC protocol is not designed to
> make that easy.
That would have been my best guess too, but it does not seem to be the biggest
problem right now. Looking at the numbers again (both patches applied) with the
attached manifest[1], I see that:

---snip---
Local UNIX socket with and without --no-grafts:
    N           Min           Max        Median           Avg        Stddev
x  10          6.07          6.35         6.145          6.16    0.08232726
+  10         17.47         17.89        17.545        17.602    0.14351152
Difference at 99.0% confidence
        11.442 +/- 0.150576
        185.747% +/- 4.07133%

Local UNIX socket vs. guix://localhost transport:
    N           Min           Max        Median           Avg        Stddev
x  10         17.47         17.89        17.545        17.602    0.14351152
+  10         17.43          18.1         17.61        17.642    0.20131788
No difference proven at 99.0% confidence

Local UNIX socket vs ssh://localhost transport:
    N           Min           Max        Median           Avg        Stddev
x  10         17.47         17.89        17.545        17.602    0.14351152
+  10         33.46         35.27        34.315        34.359    0.53873205
Difference at 99.0% confidence
        16.757 +/- 0.5074
        95.1994% +/- 3.13957%
---snap---

So I would conclude:

1) Grafting still takes a lot of time and needs more work
2) Linux optimizes localhost networking pretty well
3) Our SSH transport is terribly slow

Moving to non-localhost communication between two VM’s:

---snip---
guix://localhost vs. guix://remote-host transport:
    N           Min           Max        Median           Avg        Stddev
x  10         17.43          18.1         17.61        17.642    0.20131788
+  10         20.88         22.58        21.095        21.222    0.49689704
Difference at 99.0% confidence
        3.58 +/- 0.487934
        20.2925% +/- 2.85159%

guix://remote-host vs. ssh://remote-host:
    N           Min           Max        Median           Avg        Stddev
x  10         20.88         22.58        21.095        21.222    0.49689704
+  10          30.1         32.56        31.005        31.093    0.70740606
Difference at 99.0% confidence
        9.871 +/- 0.786769
        46.5131% +/- 4.35326%
---snap---

Conclusion here is the same: Not alot of impact of networking/NFS and SSH
transport is still terribly slow. (Confusingly faster than localhost though.)

> Perhaps you could “strace -Tt” the thing to check whether this
> hypothesis is correct by looking at the time we spend waiting for
> replies?
I’m not sure this will yield meaningful data for SSH, so I analyzet it for
guix://localhost vs. guix://remote-host. Takeaway is, yes, of course there is a
statistically significant difference and it’s about 40%±50%, which means this
method is pretty useless, because we can’t bin RPC’s by type.

So, I guess it would make sense for me to look at the SSH transport itself
again and see if there are any other low-hanging fruit. Not sure how much I can
help with profiling guile/guix itself. A different/better RPC protocol is
probably GSoC/v2.0-worthy?

Sorry for all the lengthy emails,
Lars

[1] You’ll need this channel: https://github.com/leibniz-psychology/guix-zpid

[manifest.scm (text/plain, attachment)]
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#41702; Package guix. (Sat, 27 Jun 2020 21:21:02 GMT) Full text and rfc822 format available.

Message #25 received at 41702 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
Cc: 41702 <at> debbugs.gnu.org
Subject: Re: bug#41702: `guix environment` performance issues
Date: Sat, 27 Jun 2020 23:20:27 +0200
[Message part 1 (text/plain, inline)]
Hi,

Lars-Dominik Braun <ldb <at> leibniz-psychology.org> skribis:

> That would have been my best guess too, but it does not seem to be the biggest
> problem right now. Looking at the numbers again (both patches applied) with the
> attached manifest[1], I see that:
>
> ---snip---
> Local UNIX socket with and without --no-grafts:
>     N           Min           Max        Median           Avg        Stddev
> x  10          6.07          6.35         6.145          6.16    0.08232726
> +  10         17.47         17.89        17.545        17.602    0.14351152
> Difference at 99.0% confidence
>         11.442 +/- 0.150576
>         185.747% +/- 4.07133%
>
> Local UNIX socket vs. guix://localhost transport:
>     N           Min           Max        Median           Avg        Stddev
> x  10         17.47         17.89        17.545        17.602    0.14351152
> +  10         17.43          18.1         17.61        17.642    0.20131788
> No difference proven at 99.0% confidence
>
> Local UNIX socket vs ssh://localhost transport:
>     N           Min           Max        Median           Avg        Stddev
> x  10         17.47         17.89        17.545        17.602    0.14351152
> +  10         33.46         35.27        34.315        34.359    0.53873205
> Difference at 99.0% confidence
>         16.757 +/- 0.5074
>         95.1994% +/- 3.13957%
> ---snap---
>
> So I would conclude:
>
> 1) Grafting still takes a lot of time and needs more work

Yes, I noticed that there’s still redundant work being done, so we may
need global caching in addition to what
58bb833365db4e8934a386497d5b00a063cfd27d did.

> 2) Linux optimizes localhost networking pretty well
> 3) Our SSH transport is terribly slow

Yes.

The patch below is a noticeable improvement for me.  On my laptop,

  GUIX_DAEMON_SOCKET=ssh://localhost ./pre-inst-env guix build libreoffice -n 

goes from 5.8s to 3.3s.  It just does the same thing as we do for
guix://.

Could you check what results it gives you?

> So, I guess it would make sense for me to look at the SSH transport itself
> again and see if there are any other low-hanging fruit. Not sure how much I can
> help with profiling guile/guix itself. A different/better RPC protocol is
> probably GSoC/v2.0-worthy?

There’s a project to rewrite the daemon in Scheme, started by Caleb
Ristvedt (reepa) as part of GSoC a few years ago.  This could be an
opportunity to add a new version of the protocol that would support
pipelining.

Thanks,
Ludo’.

[Message part 2 (text/x-patch, inline)]
diff --git a/guix/ssh.scm b/guix/ssh.scm
index 2d7ca7d01d..a9312b7c8c 100644
--- a/guix/ssh.scm
+++ b/guix/ssh.scm
@@ -98,7 +98,7 @@ actual key does not match."
                                 key type))))))))
 
 (define* (open-ssh-session host #:key user port identity
-                           host-key
+                           host-key open-connection
                            (compression %compression)
                            (timeout 3600))
   "Open an SSH session for HOST and return it.  IDENTITY specifies the file
@@ -110,6 +110,10 @@ When HOST-KEY is true, it must be a string like \"ssh-ed25519 AAAAC3Nz…
 root <at> example.org\"; the server is authenticated and an error is raised if its
 host key is different from HOST-KEY.
 
+When OPEN-CONNECTION is true, it must be a two-argument procedure; it is
+passed HOST and PORT and must return a socket (a file port).  When
+OPEN-CONNECTION is false, Guile-SSH takes care of opening the connection.
+
 Install TIMEOUT as the maximum time in seconds after which a read or write
 operation on a channel of the returned session is considered as failing.
 
@@ -134,6 +138,13 @@ Throw an error on failure."
     ;; Honor ~/.ssh/config.
     (session-parse-config! session)
 
+    (when open-connection
+      (let* ((sock (open-connection host port)))
+        ;; Since 'session-set!' doesn't increase the revealed count of SOCK
+        ;; (as of Guile-SSH 0.12.0), do it ourselves.
+        (set-port-revealed! sock 1)
+        (session-set! session 'fd sock)))
+
     (match (connect! session)
       ('ok
        (if host-key
diff --git a/guix/store/ssh.scm b/guix/store/ssh.scm
index 09c0832505..d07f0d7467 100644
--- a/guix/store/ssh.scm
+++ b/guix/store/ssh.scm
@@ -1,5 +1,5 @@
 ;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2017 Ludovic Courtès <ludo <at> gnu.org>
+;;; Copyright © 2017, 2020 Ludovic Courtès <ludo <at> gnu.org>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -19,6 +19,7 @@
 (define-module (guix store ssh)
   #:use-module (guix ssh)
   #:use-module (web uri)
+  #:use-module (ssh session)
   #:export (connect-to-daemon))
 
 ;;; Commentary:
@@ -29,11 +30,27 @@
 ;;;
 ;;; End:
 
+(define (open-connection host port)
+  "Open a connection to HOST and PORT.  Use the standard SSH port if PORT is
+false."
+  (let* ((lst  (getaddrinfo host
+                            (if port (number->string port) "ssh")
+                            (if port AI_NUMERICSERV 0)))
+         (addr (addrinfo:addr (car lst)))
+         (sock (socket (sockaddr:fam addr) SOCK_STREAM 0)))
+    ;; Setting this option makes a dramatic difference because it avoids the
+    ;; "ACK delay" on our RPC messages.
+    (setsockopt sock IPPROTO_TCP TCP_NODELAY 1)
+
+    (connect sock addr)
+    sock))
+
 (define (connect-to-daemon uri)
   "Connect to the SSH daemon at URI, a URI object with the 'ssh' scheme."
   (remote-daemon-channel
    (open-ssh-session (uri-host uri)
                      #:port (or (uri-port uri) 22)
-                     #:user (uri-userinfo uri))))
+                     #:user (uri-userinfo uri)
+                     #:open-connection open-connection)))
 
 ;;; ssh.scm ends here

Added tag(s) patch. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Mon, 29 Jun 2020 20:57:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#41702; Package guix. (Tue, 30 Jun 2020 10:00:02 GMT) Full text and rfc822 format available.

Message #30 received at 41702 <at> debbugs.gnu.org (full text, mbox):

From: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 41702 <at> debbugs.gnu.org
Subject: Re: bug#41702: `guix environment` performance issues
Date: Tue, 30 Jun 2020 11:59:36 +0200
[Message part 1 (text/plain, inline)]
Hi Ludo,

> The patch below is a noticeable improvement for me.  On my laptop,
> 
>   GUIX_DAEMON_SOCKET=ssh://localhost ./pre-inst-env guix build libreoffice -n 
> 
> goes from 5.8s to 3.3s.  It just does the same thing as we do for
> guix://.
> 
> Could you check what results it gives you?
looking pretty good, comparing master without/with patch:

---snip---
    N           Min           Max        Median           Avg        Stddev
x  10         16.87         20.77         17.44        17.802     1.1228515
+  10          9.92         13.67         10.19        10.569     1.1254574
Difference at 99.0% confidence
        -7.233 +/- 1.44688
        -40.6303% +/- 6.68735%
---snap---

> There’s a project to rewrite the daemon in Scheme, started by Caleb
> Ristvedt (reepa) as part of GSoC a few years ago.  This could be an
> opportunity to add a new version of the protocol that would support
> pipelining.
Given how much improvement you’ve achieved so far I’d say we can (for now)
probably get away with the current protocol. Still, what’s the status of this
daemon?

Lars

[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#41702; Package guix. (Wed, 01 Jul 2020 10:51:01 GMT) Full text and rfc822 format available.

Message #33 received at 41702 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
Cc: 41702 <at> debbugs.gnu.org
Subject: Re: bug#41702: `guix environment` performance issues
Date: Wed, 01 Jul 2020 12:49:51 +0200
Hi,

Ludovic Courtès <ludo <at> gnu.org> skribis:

> +    (when open-connection
> +      (let* ((sock (open-connection host port)))
> +        ;; Since 'session-set!' doesn't increase the revealed count of SOCK
> +        ;; (as of Guile-SSH 0.12.0), do it ourselves.
> +        (set-port-revealed! sock 1)
> +        (session-set! session 'fd sock)))

The problem with this approach is that ‘open-connection’ is past the raw
host name; aliases, proxy commands, etc. from ~/.ssh/config are ignored.

So it cannot go in as is.  :-/

What we need is Guile-SSH bindings for ‘ssh_get_fd’, which would allow
us to get at the actual file descriptor after the connection has been
opened, and to set TCP_NODELAY there:

  http://api.libssh.org/stable/group__libssh__session.html#gafe509fcea47714b5cd277d1e35e83276

It’s not 100% clear though if it returns that “right” FD when there’s a
proxy command.

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#41702; Package guix. (Wed, 01 Jul 2020 11:26:01 GMT) Full text and rfc822 format available.

Message #36 received at 41702 <at> debbugs.gnu.org (full text, mbox):

From: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 41702 <at> debbugs.gnu.org
Subject: Re: bug#41702: `guix environment` performance issues
Date: Wed, 1 Jul 2020 13:24:53 +0200
[Message part 1 (text/plain, inline)]
Hey,

> What we need is Guile-SSH bindings for ‘ssh_get_fd’, which would allow
> us to get at the actual file descriptor after the connection has been
> opened, and to set TCP_NODELAY there:
> 
>   http://api.libssh.org/stable/group__libssh__session.html#gafe509fcea47714b5cd277d1e35e83276
> 
> It’s not 100% clear though if it returns that “right” FD when there’s a
> proxy command.
wouldn’t it be easier to add the SSH_OPTIONS_NODELAY option (see [1]) to
guile-ssh (should be trivial[2]) and let guile-ssh handle all the connection
stuff without touching the socket? Or do we need the raw socket elsewhere?

Cheers,
Lars

[1] http://api.libssh.org/stable/group__libssh__session.html#ga7a801b85800baa3f4e16f5b47db0a73d
[2] https://github.com/artyom-poptsov/guile-ssh/blob/master/libguile-ssh/session-func.c#L52
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#41702; Package guix. (Wed, 01 Jul 2020 21:55:01 GMT) Full text and rfc822 format available.

Message #39 received at 41702 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
Cc: 41702 <at> debbugs.gnu.org
Subject: Re: bug#41702: `guix environment` performance issues
Date: Wed, 01 Jul 2020 23:53:54 +0200
Hi,

Lars-Dominik Braun <ldb <at> leibniz-psychology.org> skribis:

>> What we need is Guile-SSH bindings for ‘ssh_get_fd’, which would allow
>> us to get at the actual file descriptor after the connection has been
>> opened, and to set TCP_NODELAY there:
>> 
>>   http://api.libssh.org/stable/group__libssh__session.html#gafe509fcea47714b5cd277d1e35e83276
>> 
>> It’s not 100% clear though if it returns that “right” FD when there’s a
>> proxy command.
> wouldn’t it be easier to add the SSH_OPTIONS_NODELAY option (see [1]) to
> guile-ssh (should be trivial[2]) and let guile-ssh handle all the connection
> stuff without touching the socket? Or do we need the raw socket elsewhere?

Good idea, I had overlooked that possibility.

Would you like to send them a patch?

Thanks,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#41702; Package guix. (Thu, 02 Jul 2020 07:01:01 GMT) Full text and rfc822 format available.

Message #42 received at 41702 <at> debbugs.gnu.org (full text, mbox):

From: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 41702 <at> debbugs.gnu.org
Subject: Re: bug#41702: `guix environment` performance issues
Date: Thu, 2 Jul 2020 09:00:00 +0200
[Message part 1 (text/plain, inline)]
Hi,

> Would you like to send them a patch?
done: https://github.com/artyom-poptsov/guile-ssh/pull/21

Cheers,
Lars

[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#41702; Package guix. (Thu, 02 Jul 2020 12:04:02 GMT) Full text and rfc822 format available.

Message #45 received at 41702 <at> debbugs.gnu.org (full text, mbox):

From: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 41702 <at> debbugs.gnu.org
Subject: Re: bug#41702: `guix environment` performance issues
Date: Thu, 2 Jul 2020 14:03:33 +0200
[Message part 1 (text/plain, inline)]
Hi (again),

and the attached patch uses the new nodelay option, if we don’t want to wait
for another guile-ssh release.

Lars

[0001-guix-Add-nodelay-to-SSH-store-connection.patch (text/x-diff, attachment)]
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#41702; Package guix. (Mon, 06 Jul 2020 08:50:02 GMT) Full text and rfc822 format available.

Message #48 received at 41702 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
Cc: 41702 <at> debbugs.gnu.org
Subject: Re: bug#41702: `guix environment` performance issues
Date: Mon, 06 Jul 2020 10:49:44 +0200
Hi,

Lars-Dominik Braun <ldb <at> leibniz-psychology.org> skribis:

> and the attached patch uses the new nodelay option, if we don’t want to wait
> for another guile-ssh release.

I’d rather wait; perhaps you can ask Artyom whether they’re planning for
a new release soonish?

If there’s no plan for a release within two weeks, we can go ahead with
the patch you propose.

> From 2892f79f819dd2dd9420f7e74bcb6e293d377452 Mon Sep 17 00:00:00 2001
> From: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
> Date: Thu, 2 Jul 2020 13:59:51 +0200
> Subject: [PATCH] guix: Add nodelay to SSH store connection
>
> * gnu/packages/ssh.scm (guile-ssh)[patches]: Add patch.
> * gnu/packages/patches/guile-ssh-0.12.0-nodelay.patch: New file.
> * gnu/local.mk (dist_patch_DATA): Add it.
> * guix/ssh.scm (open-ssh-session): Use new nodelay option.

If we take this route, the two patches (guile-ssh changes, then (guix
ssh) changes) should be separate.

Also #:nodelay will cause a hard failure for people using an older
Guile-SSH.  I wonder how to guard against that; configure check, or
run-time Guile-SSH version check?

> +++ b/gnu/packages/patches/guile-ssh-0.12.0-nodelay.patch

It would be good to include the URL of the upstream issue

Thanks!

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#41702; Package guix. (Mon, 06 Jul 2020 12:59:01 GMT) Full text and rfc822 format available.

Message #51 received at 41702 <at> debbugs.gnu.org (full text, mbox):

From: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 41702 <at> debbugs.gnu.org
Subject: Re: bug#41702: `guix environment` performance issues
Date: Mon, 6 Jul 2020 14:58:00 +0200
[Message part 1 (text/plain, inline)]
Hi Ludo,

> I’d rather wait; perhaps you can ask Artyom whether they’re planning for
> a new release soonish?
“I'll see if I can fix some random test failures with Guile 2.2 that sometimes
occur, and then I'll prepare a new release.”

> If there’s no plan for a release within two weeks, we can go ahead with
> the patch you propose.
Alright, I’ll prepare a new patch if either of the two events happen.

Cheers,
Lars

[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#41702; Package guix. (Mon, 20 Jul 2020 09:52:02 GMT) Full text and rfc822 format available.

Message #54 received at 41702 <at> debbugs.gnu.org (full text, mbox):

From: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 41702 <at> debbugs.gnu.org
Subject: Re: bug#41702: `guix environment` performance issues
Date: Mon, 20 Jul 2020 11:50:59 +0200
[Message part 1 (text/plain, inline)]
Hi Ludo,

a new version has been released, so here is the updated patchset. I’m not sure
if it’s possible to make the #:nodelay optional somehow, i.e. retaining
compatibility with guile-ssh<0.13.0. For now there’s a hard feature check.

Lars

[0001-gnu-guile-ssh-Update-to-0.13.0.patch (text/x-diff, attachment)]
[0002-guix-Speed-up-RPC-via-SSH-connections.patch (text/x-diff, attachment)]
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#41702; Package guix. (Mon, 20 Jul 2020 21:52:02 GMT) Full text and rfc822 format available.

Message #57 received at 41702 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
Cc: 41702 <at> debbugs.gnu.org
Subject: Re: bug#41702: `guix environment` performance issues
Date: Mon, 20 Jul 2020 23:51:38 +0200
Hi Lars,

Lars-Dominik Braun <ldb <at> leibniz-psychology.org> skribis:

> From 42ff75c9d4da4eb2aeece96119ae5a65a9ccac82 Mon Sep 17 00:00:00 2001
> From: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
> Date: Mon, 20 Jul 2020 11:27:35 +0200
> Subject: [PATCH 1/2] gnu: guile-ssh: Update to 0.13.0.
>
> * gnu/packages/ssh.scm (guile-ssh): Update to 0.13.0.

[...]

> From 6a18c9a4533075d1ee9795777a998fde3e64030f Mon Sep 17 00:00:00 2001
> From: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
> Date: Mon, 20 Jul 2020 11:28:51 +0200
> Subject: [PATCH 2/2] guix: Speed up RPC via SSH connections.
>
> See issue #41702
>
> * guix/ssh.scm (open-ssh-session): Enable #:nodelay.
> * m4/guix.m4 (GUIX_CHECK_GUILE_SSH): Add feature check for this new parameter.

I adjusted the commit log and pushed.

I think it’s okay to use the new Guile-SSH feature right away because
it’s not central, and ‘guix pull’ gives people the ability to get the
new Guile-SSH version.

Thanks!

Ludo’.




Reply sent to Maxim Cournoyer <maxim.cournoyer <at> gmail.com>:
You have taken responsibility. (Thu, 22 Oct 2020 21:04:02 GMT) Full text and rfc822 format available.

Notification sent to Lars-Dominik Braun <ldb <at> leibniz-psychology.org>:
bug acknowledged by developer. (Thu, 22 Oct 2020 21:04:02 GMT) Full text and rfc822 format available.

Message #62 received at 41702-done <at> debbugs.gnu.org (full text, mbox):

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 41702-done <at> debbugs.gnu.org, Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
Subject: Re: bug#41702: `guix environment` performance issues
Date: Thu, 22 Oct 2020 17:03:09 -0400
Hello,

Ludovic Courtès <ludo <at> gnu.org> writes:

> Hi Lars,
>
> Lars-Dominik Braun <ldb <at> leibniz-psychology.org> skribis:
>
>> From 42ff75c9d4da4eb2aeece96119ae5a65a9ccac82 Mon Sep 17 00:00:00 2001
>> From: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
>> Date: Mon, 20 Jul 2020 11:27:35 +0200
>> Subject: [PATCH 1/2] gnu: guile-ssh: Update to 0.13.0.
>>
>> * gnu/packages/ssh.scm (guile-ssh): Update to 0.13.0.
>
> [...]
>
>> From 6a18c9a4533075d1ee9795777a998fde3e64030f Mon Sep 17 00:00:00 2001
>> From: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
>> Date: Mon, 20 Jul 2020 11:28:51 +0200
>> Subject: [PATCH 2/2] guix: Speed up RPC via SSH connections.
>>
>> See issue #41702
>>
>> * guix/ssh.scm (open-ssh-session): Enable #:nodelay.
>> * m4/guix.m4 (GUIX_CHECK_GUILE_SSH): Add feature check for this new parameter.
>
> I adjusted the commit log and pushed.
>
> I think it’s okay to use the new Guile-SSH feature right away because
> it’s not central, and ‘guix pull’ gives people the ability to get the
> new Guile-SSH version.

Judging from the above, it seems this issue has been resolved.

Closing.

Thank you,

Maxim




Information forwarded to bug-guix <at> gnu.org:
bug#41702; Package guix. (Fri, 23 Oct 2020 07:27:01 GMT) Full text and rfc822 format available.

Message #65 received at 41702-done <at> debbugs.gnu.org (full text, mbox):

From: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Cc: Ludovic Courtès <ludo <at> gnu.org>,
 41702-done <at> debbugs.gnu.org
Subject: Re: bug#41702: `guix environment` performance issues
Date: Fri, 23 Oct 2020 09:26:21 +0200
[Message part 1 (text/plain, inline)]
Hi Maxim,

> Judging from the above, it seems this issue has been resolved.
grafting is still a performance issue imo. Compare for example:

$ time guix environment --ad-hoc  --search-paths r-learnr
guix environment --ad-hoc --search-paths r-learnr  5,90s user 0,09s system 210% cpu 2,844 total
$ time guix environment --ad-hoc  --search-paths r-learnr --no-grafts
guix environment --ad-hoc --search-paths r-learnr --no-grafts  2,03s user 0,08s system 164% cpu 1,277 total

Cheers,
Lars

[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#41702; Package guix. (Fri, 23 Oct 2020 14:52:01 GMT) Full text and rfc822 format available.

Message #68 received at 41702-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Lars-Dominik Braun <ldb <at> leibniz-psychology.org>
Cc: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>, 41702-done <at> debbugs.gnu.org
Subject: Re: bug#41702: `guix environment` performance issues
Date: Fri, 23 Oct 2020 16:51:34 +0200
Hi,

Lars-Dominik Braun <ldb <at> leibniz-psychology.org> skribis:

> grafting is still a performance issue imo. Compare for example:

Agreed.  The fix in 58bb833365db4e8934a386497d5b00a063cfd27d is
incomplete: we’re still potentially doing things several times.

> $ time guix environment --ad-hoc  --search-paths r-learnr
> guix environment --ad-hoc --search-paths r-learnr  5,90s user 0,09s system 210% cpu 2,844 total
> $ time guix environment --ad-hoc  --search-paths r-learnr --no-grafts
> guix environment --ad-hoc --search-paths r-learnr --no-grafts  2,03s user 0,08s system 164% cpu 1,277 total

Here’s what I get over a local socket:

--8<---------------cut here---------------start------------->8---
ludo <at> ribbon ~/src/guix$ time guix environment --ad-hoc  --search-paths r-learnr --no-grafts
export PATH="/gnu/store/wkhd1fnmg1q30g21bfn5jprz24m041w6-profile/bin${PATH:+:}$PATH"

real	0m1.790s
user	0m1.928s
sys	0m0.102s
ludo <at> ribbon ~/src/guix$ time guix environment --ad-hoc  --search-paths r-learnr
export PATH="/gnu/store/pwzrpzdbli2561f2yq70kljywn93h5w6-profile/bin${PATH:+:}$PATH"

real	0m3.820s
user	0m4.954s
sys	0m0.105s
--8<---------------cut here---------------end--------------->8---

Ludo’.




Did not alter fixed versions and reopened. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 23 Oct 2020 14:53:02 GMT) Full text and rfc822 format available.

Removed tag(s) patch. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Fri, 23 Oct 2020 14:53:02 GMT) Full text and rfc822 format available.

Merged 41702 48005. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Thu, 27 May 2021 16:33:01 GMT) Full text and rfc822 format available.

This bug report was last modified 2 years and 331 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.