GNU bug report logs - #47283
Performance regression in narinfo fetching


Package: guix;

Reported by: Ludovic Courtès <ludo <at> gnu.org>

Date: Sat, 20 Mar 2021 17:39:01 UTC

Severity: important

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

Report forwarded to mail <at> cbaines.net, bug-guix <at> gnu.org:
bug#47283; Package guix. (Sat, 20 Mar 2021 17:39:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Ludovic Courtès <ludo <at> gnu.org>:
New bug report received and forwarded. Copy sent to mail <at> cbaines.net, bug-guix <at> gnu.org. (Sat, 20 Mar 2021 17:39:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: <bug-guix <at> gnu.org>
Subject: Performance regression in narinfo fetching
Date: Sat, 20 Mar 2021 18:38:39 +0100
[Message part 1 (text/plain, inline)]
Hello!

As reported on guix-devel, ‘guix weather’ has become extremely slow.
Specifically, in the narinfo-fetching phase, it runs at 100% CPU, even
though that part should be network-bound (pipelined HTTP GETs).

A profile of the ‘report-server-coverage’ call would show this:

--8<---------------cut here---------------start------------->8---
%     cumulative   self             
time   seconds     seconds  procedure
 62.50      1.06      1.06  fluid-ref*
  6.25      0.11      0.11  regexp-exec
  3.13      0.05      0.05  ice-9/boot-9.scm:1738:4:throw
  2.08      0.04      0.04  string-index
  2.08      0.04      0.04  write
  1.04    568.08      0.02  ice-9/boot-9.scm:1673:4:with-exception-handler
  1.04      0.02      0.02  %read-line
  1.04      0.02      0.02  guix/ci.scm:78:0:json->build
  1.04      0.02      0.02  string-append
--8<---------------cut here---------------end--------------->8---
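
(For the record, the attached diff obtains this profile by wrapping the
‘report-server-coverage’ call in Guile’s ‘statprof’; below is a minimal
standalone sketch of that wrapping, with a stand-in workload in place of
the real call.)

--8<---------------cut here---------------start------------->8---
(use-modules (statprof))

;; Stand-in workload; the attached diff wraps the real
;; 'report-server-coverage' call instead.
(statprof
 (lambda ()
   (let loop ((i 0))
     (when (< i 200000)
       (string-append "narinfo-" (number->string i))
       (loop (+ i 1))))))
--8<---------------cut here---------------end--------------->8---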

More than half of the time is spent in ‘fluid-ref*’, which sounds fishy.

Where does that call come from?  There seems to be a single caller,
in boot-9.scm:

    (define* (raise-exception exn #:key (continuable? #f))
      (define (capture-current-exception-handlers)
        ;; FIXME: This is quadratic.
        (let lp ((depth 0))
          (let ((h (fluid-ref* %exception-handler depth)))
            (if h
                (cons h (lp (1+ depth)))
                (list fallback-exception-handler)))))
      ;; …
      )

We must be abusing exceptions somewhere…

Indeed, there’s one place on the hot path where we install exception
handlers: in ‘http-multiple-get’ (from commit
205833b72c5517915a47a50dbe28e7024dc74e57).  I don’t think it’s needed,
is it?  (But if it is, let’s find another approach, this one is
prohibitively expensive.)

A simple performance test is:

  rm -rf ~/.cache/guix/substitute/
  time ./pre-inst-env guix weather $(guix package -A|head -500| cut -f1)

After removing this ‘catch’ in ‘http-multiple-get’, the profile is
flatter:

--8<---------------cut here---------------start------------->8---
%     cumulative   self             
time   seconds     seconds  procedure
  8.33      0.07      0.07  string-index
  8.33      0.07      0.07  regexp-exec
  5.56      0.05      0.05  anon #x154af88
  5.56      0.05      0.05  write
  5.56      0.05      0.05  string-tokenize
  5.56      0.05      0.05  read-char
  5.56      0.05      0.05  set-certificate-credentials-x509-trust-data!
  5.56      0.05      0.05  %read-line
--8<---------------cut here---------------end--------------->8---

There’s also this ‘call-with-connection-error-handling’ call in (guix
substitutes), around an ‘http-multiple-get’ call, that may not be
justified.

Attached is a diff of the tweaks I made to test this.

WDYT, Chris?

Ludo’.

[Message part 2 (text/x-patch, inline)]
diff --git a/guix/http-client.scm b/guix/http-client.scm
index 4b4c14ed0b..a28523201e 100644
--- a/guix/http-client.scm
+++ b/guix/http-client.scm
@@ -219,42 +219,21 @@ returning."
              (remainder
               (connect p remainder result))))
           ((head tail ...)
-           (catch #t
-             (lambda ()
-               (let* ((resp   (read-response p))
-                      (body   (response-body-port resp))
-                      (result (proc head resp body result)))
-                 ;; The server can choose to stop responding at any time,
-                 ;; in which case we have to try again.  Check whether
-                 ;; that is the case.  Note that even upon "Connection:
-                 ;; close", we can read from BODY.
-                 (match (assq 'connection (response-headers resp))
-                   (('connection 'close)
-                    (close-port p)
-                    (connect #f                       ;try again
-                             (drop requests (+ 1 processed))
-                             result))
-                   (_
-                    (loop tail (+ 1 processed) result))))) ;keep going
-             (lambda (key . args)
-               ;; If PORT was cached and the server closed the connection
-               ;; in the meantime, we get EPIPE.  In that case, open a
-               ;; fresh connection and retry.  We might also get
-               ;; 'bad-response or a similar exception from (web response)
-               ;; later on, once we've sent the request, or a
-               ;; ERROR/INVALID-SESSION from GnuTLS.
-               (if (or (and (eq? key 'system-error)
-                            (= EPIPE (system-error-errno `(,key ,@args))))
-                       (and (eq? key 'gnutls-error)
-                            (eq? (first args) error/invalid-session))
-                       (memq key
-                             '(bad-response bad-header bad-header-component)))
-                   (begin
-                     (close-port p)
-                     (connect #f      ; try again
-                              (drop requests (+ 1 processed))
-                              result))
-                   (apply throw key args))))))))))
+           (let* ((resp   (read-response p))
+                  (body   (response-body-port resp))
+                  (result (proc head resp body result)))
+             ;; The server can choose to stop responding at any time,
+             ;; in which case we have to try again.  Check whether
+             ;; that is the case.  Note that even upon "Connection:
+             ;; close", we can read from BODY.
+             (match (assq 'connection (response-headers resp))
+               (('connection 'close)
+                (close-port p)
+                (connect #f                       ;try again
+                         (drop requests (+ 1 processed))
+                         result))
+               (_
+                (loop tail (+ 1 processed) result))))))))))
 
 
 ;;;
diff --git a/guix/scripts/weather.scm b/guix/scripts/weather.scm
index 5164fe0494..3d8d50d5e3 100644
--- a/guix/scripts/weather.scm
+++ b/guix/scripts/weather.scm
@@ -184,9 +184,10 @@ Return the coverage ratio, an exact number between 0 and 1."
   (let/time ((time narinfos requests-made
                    (lookup-narinfos
                     server items
-                    #:make-progress-reporter
-                    (lambda* (total #:key url #:allow-other-keys)
-                      (progress-reporter/bar total)))))
+                    ;; #:make-progress-reporter
+                    ;; (lambda* (total #:key url #:allow-other-keys)
+                    ;;   (progress-reporter/bar total))
+                    )))
     (format #t "~a~%" server)
     (let ((obtained  (length narinfos))
           (requested (length items))
@@ -504,6 +505,7 @@ SERVER.  Display information for packages with at least THRESHOLD dependents."
 ;;; Entry point.
 ;;;
 
+(use-modules (statprof))
 (define-command (guix-weather . args)
   (synopsis "report on the availability of pre-built package binaries")
 
@@ -551,9 +553,11 @@ SERVER.  Display information for packages with at least THRESHOLD dependents."
         (exit
          (every* (lambda (server)
                    (define coverage
-                     (report-server-coverage server items
-                                             #:display-missing?
-                                             (assoc-ref opts 'display-missing?)))
+                     (statprof
+                      (lambda ()
+                       (report-server-coverage server items
+                                               #:display-missing?
+                                               (assoc-ref opts 'display-missing?)))))
                    (match (assoc-ref opts 'coverage)
                      (#f #f)
                      (threshold
diff --git a/guix/substitutes.scm b/guix/substitutes.scm
index 08f8c24efd..04bf70caaa 100644
--- a/guix/substitutes.scm
+++ b/guix/substitutes.scm
@@ -59,8 +59,6 @@
   #:use-module (guix http-client)
   #:export (%narinfo-cache-directory
 
-            call-with-connection-error-handling
-
             lookup-narinfos
             lookup-narinfos/diverse))
 
@@ -235,14 +233,11 @@ if file doesn't exist, and the narinfo otherwise."
        (let* ((requests (map (cut narinfo-request url <>) paths))
               (result   (begin
                           (start-progress-reporter! progress-reporter)
-                          (call-with-connection-error-handling
-                           uri
-                           (lambda ()
-                             (http-multiple-get uri
-                                                handle-narinfo-response '()
-                                                requests
-                                                #:open-connection open-connection
-                                                #:verify-certificate? #f))))))
+                          (http-multiple-get uri
+                                             handle-narinfo-response '()
+                                             requests
+                                             #:open-connection open-connection
+                                             #:verify-certificate? #f))))
          (stop-progress-reporter! progress-reporter)
          result))
       ((file #f)

Information forwarded to bug-guix <at> gnu.org:
bug#47283; Package guix. (Sat, 20 Mar 2021 20:33:01 GMT) Full text and rfc822 format available.

Message #8 received at 47283 <at> debbugs.gnu.org (full text, mbox):

From: Christopher Baines <mail <at> cbaines.net>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 47283 <at> debbugs.gnu.org
Subject: Re: bug#47283: Performance regression in narinfo fetching
Date: Sat, 20 Mar 2021 20:32:35 +0000
[Message part 1 (text/plain, inline)]
Ludovic Courtès <ludo <at> gnu.org> writes:

> As reported on guix-devel, ‘guix weather’ has become extremely slow.
> Specifically, in the narinfo-fetching phase, it runs at 100% CPU, even
> though that part should be network-bound (pipelined HTTP GETs).
>
> A profile of the ‘report-server-coverage’ call would show this:
>
> --8<---------------cut here---------------start------------->8---
> %     cumulative   self             
> time   seconds     seconds  procedure
>  62.50      1.06      1.06  fluid-ref*
>   6.25      0.11      0.11  regexp-exec
>   3.13      0.05      0.05  ice-9/boot-9.scm:1738:4:throw
>   2.08      0.04      0.04  string-index
>   2.08      0.04      0.04  write
>   1.04    568.08      0.02  ice-9/boot-9.scm:1673:4:with-exception-handler
>   1.04      0.02      0.02  %read-line
>   1.04      0.02      0.02  guix/ci.scm:78:0:json->build
>   1.04      0.02      0.02  string-append
> --8<---------------cut here---------------end--------------->8---
>
> More than half of the time is spent in ‘fluid-ref*’, which sounds fishy.
>
> Where does that call come from?  There seems to be a single caller,
> in boot-9.scm:
>
>     (define* (raise-exception exn #:key (continuable? #f))
>       (define (capture-current-exception-handlers)
>         ;; FIXME: This is quadratic.
>         (let lp ((depth 0))
>           (let ((h (fluid-ref* %exception-handler depth)))
>             (if h
>                 (cons h (lp (1+ depth)))
>                 (list fallback-exception-handler)))))
>       ;; …
>       )
>
> We must be abusing exceptions somewhere…
>
> Indeed, there’s one place on the hot path where we install exception
> handlers: in ‘http-multiple-get’ (from commit
> 205833b72c5517915a47a50dbe28e7024dc74e57).  I don’t think it’s needed,
> is it?  (But if it is, let’s find another approach, this one is
> prohibitively expensive.)

I think the exception handling has moved around, but I guess the
exceptions that could be caught in http-multiple-get could happen,
right? I am really just guessing here, as Guile doesn't help tell you
about possible exceptions, and I haven't spent enough time to read all
the possible code involved to find out if these are definitely possible.

> A simple performance test is:
>
>   rm -rf ~/.cache/guix/substitute/
>   time ./pre-inst-env guix weather $(guix package -A|head -500| cut -f1)
>
> After removing this ‘catch’ in ‘http-multiple-get’, the profile is
> flatter:
>
> --8<---------------cut here---------------start------------->8---
> %     cumulative   self
> time   seconds     seconds  procedure
>   8.33      0.07      0.07  string-index
>   8.33      0.07      0.07  regexp-exec
>   5.56      0.05      0.05  anon #x154af88
>   5.56      0.05      0.05  write
>   5.56      0.05      0.05  string-tokenize
>   5.56      0.05      0.05  read-char
>   5.56      0.05      0.05  set-certificate-credentials-x509-trust-data!
>   5.56      0.05      0.05  %read-line
> --8<---------------cut here---------------end--------------->8---
>
> There’s also this ‘call-with-connection-error-handling’ call in (guix
> substitutes), around an ‘http-multiple-get’ call, that may not be
> justified.
>
> Attached is a diff of the tweaks I made to test this.
>
> WDYT, Chris?

I haven't looked into this yet, but maybe it would be possible to
adjust the code so that it doesn't perform so badly while still trying
to handle possible exceptions.

The two ideas I have are to rewrite the (let ...) bit in terms of a
fold, which might perform better, or to stop using let for iteration:
set up the exception handling once, then process each request, using
set! to update the state.  I haven't tested either of these.
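
Something like this for the second one (a rough, untested sketch with
stand-in names):

--8<---------------cut here---------------start------------->8---
;; One 'catch' installed up front, then a plain walk over the requests,
;; with the accumulated state in a mutable binding.
(define (process-requests requests proc)
  (let ((result '()))
    (catch #t
      (lambda ()
        (for-each (lambda (request)
                    (set! result (proc request result)))
                  requests))
      (lambda (key . args)
        ;; The reconnect-and-retry logic would go here; this sketch just
        ;; rethrows.
        (apply throw key args)))
    result))
--8<---------------cut here---------------end--------------->8---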

It's good to know that Guile exception handling can be excessively
expensive, though; I wouldn't have expected its overhead to outweigh
the time spent on the network.
[signature.asc (application/pgp-signature, inline)]

Severity set to 'important' from 'normal' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Sat, 20 Mar 2021 20:55:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#47283; Package guix. (Sun, 21 Mar 2021 00:49:02 GMT) Full text and rfc822 format available.

Message #13 received at 47283 <at> debbugs.gnu.org (full text, mbox):

From: Christopher Baines <mail <at> cbaines.net>
To: 47283 <at> debbugs.gnu.org
Cc: Ludovic Courtès <ludo <at> gnu.org>
Subject: Re: bug#47283: Performance regression in narinfo fetching
Date: Sun, 21 Mar 2021 00:48:03 +0000
[Message part 1 (text/plain, inline)]
Christopher Baines <mail <at> cbaines.net> writes:

> I haven't looked into this yet, but maybe it would be possible to
> adjust the code so that it doesn't perform so badly while still trying
> to handle possible exceptions.
>
> The two ideas I have are to rewrite the (let ...) bit in terms of a
> fold, which might perform better, or to stop using let for iteration:
> set up the exception handling once, then process each request, using
> set! to update the state.  I haven't tested either of these.

I tried something else, neither of those things: just not calling (loop
...) within the catch block. I don't know why this works, but it seems
to make guix weather much faster.

Here's the patch [1].  I've just realised it's broken, as it'll lose the
result value (and use an old one) when the connection is closed. I'll
send an updated patch without this issue in a moment.

1: https://issues.guix.gnu.org/47288
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#47283; Package guix. (Sun, 21 Mar 2021 21:11:02 GMT) Full text and rfc822 format available.

Message #16 received at 47283 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Christopher Baines <mail <at> cbaines.net>
Cc: 47283 <at> debbugs.gnu.org
Subject: Re: bug#47283: Performance regression in narinfo fetching
Date: Sun, 21 Mar 2021 22:10:40 +0100
Hi!

Christopher Baines <mail <at> cbaines.net> skribis:

> Christopher Baines <mail <at> cbaines.net> writes:
>
>> I haven't looked into this yet, but maybe it would be possible to
>> adjust the code so that it doesn't perform so badly while still trying
>> to handle possible exceptions.
>>
>> The two ideas I have are to rewrite the (let ...) bit in terms of a
>> fold, which might perform better, or to stop using let for iteration:
>> set up the exception handling once, then process each request, using
>> set! to update the state.  I haven't tested either of these.
>
> I tried something else, neither of those things: just not calling (loop
> ...) within the catch block. I don't know why this works, but it seems
> to make guix weather much faster.

Oh yes, that’s also because calling ‘loop’ from within ‘catch’ made it a
non-tail call, so we kept accumulating exception handlers, and the ‘lp’
loop in ‘raise-exception’ would have an ever increasing list of handlers
to traverse.
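
To illustrate the two shapes (stand-in names and a dummy ‘process’, not
the actual ‘http-multiple-get’ code):

--8<---------------cut here---------------start------------->8---
(use-modules (ice-9 match))

(define (process item)                  ;stand-in for the per-request work
  item)

;; Shape 1: recursing from inside the 'catch' body.  Each iteration runs
;; in the dynamic extent of all the previous 'catch'es, so the handler
;; list that 'raise-exception' walks keeps growing.
(define (process-all/nested items)
  (let loop ((items items))
    (match items
      (() 'done)
      ((head . tail)
       (catch #t
         (lambda ()
           (process head)
           (loop tail))                 ;non-tail call: handler stays live
         (lambda (key . args)           ;placeholder handler
           (apply throw key args)))))))

;; Shape 2: recursing after 'catch' has returned.  The call is back in
;; tail position and only one handler from this loop is live at a time.
(define (process-all/flat items)
  (let loop ((items items))
    (match items
      (() 'done)
      ((head . tail)
       (catch #t
         (lambda () (process head))
         (lambda (key . args) (apply throw key args)))
       (loop tail)))))
--8<---------------cut here---------------end--------------->8---

With the first shape, the fluid depth probed by ‘fluid-ref*’ grows with
the number of requests processed, hence the quadratic behavior seen in
the profile.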

> Here's the patch [1].  I've just realised it's broken, as it'll lose the
> result value (and use an old one) when the connection is closed. I'll
> send an updated patch without this issue in a moment.
>
> 1: https://issues.guix.gnu.org/47288

OK, thanks.  I’ll reply to your other message first.  :-)

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#47283; Package guix. (Sun, 21 Mar 2021 21:23:01 GMT) Full text and rfc822 format available.

Message #19 received at 47283 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Christopher Baines <mail <at> cbaines.net>
Cc: 47283 <at> debbugs.gnu.org
Subject: Re: bug#47283: Performance regression in narinfo fetching
Date: Sun, 21 Mar 2021 22:22:10 +0100
[Message part 1 (text/plain, inline)]
Christopher Baines <mail <at> cbaines.net> skribis:

> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>> Indeed, there’s one place on the hot path where we install exception
>> handlers: in ‘http-multiple-get’ (from commit
>> 205833b72c5517915a47a50dbe28e7024dc74e57).  I don’t think it’s needed,
>> is it?  (But if it is, let’s find another approach, this one is
>> prohibitively expensive.)
>
> I think the exception handling has moved around, but I guess the
> exceptions that could be caught in http-multiple-get could happen,
> right? I am really just guessing here, as Guile doesn't help tell you
> about possible exceptions, and I haven't spent enough time to read all
> the possible code involved to find out if these are definitely possible.

Yeah.

Commit 205833b72c5517915a47a50dbe28e7024dc74e57 added a ‘catch’ block
that catches the same things as ‘with-cached-connection’ did (it would
be better to not duplicate it IMO).  That includes EPIPE, gnutls-error,
bad-response & co.

Earlier, commit be5a75ebb5988b87b2392e2113f6590f353dd6cd (“substitute:
Reuse connections for '--query'.”) did not add such a ‘catch’ block in
‘http-multiple-get’.  Instead, it wrapped its call in ‘do-fetch’ in
‘fetch-narinfos’:

[Message part 2 (text/x-patch, inline)]
   (define (do-fetch uri)
     (case (and=> uri uri-scheme)
       ((http https)
-       (let ((requests (map (cut narinfo-request url <>) paths)))
-         (match (open-connection-for-uri/maybe uri)
-           (#f
-            '())
-           (port
-            (update-progress!)
        ;; Note: Do not check HTTPS server certificates to avoid depending
        ;; on the X.509 PKI.  We can do it because we authenticate
        ;; narinfos, which provides a much stronger guarantee.
-            (let ((result (http-multiple-get uri
+       (let* ((requests (map (cut narinfo-request url <>) paths))
+              (result   (call-with-cached-connection uri
+                          (lambda (port)
+                            (if port
+                                (begin
+                                  (update-progress!)
+                                  (http-multiple-get uri
                                                      handle-narinfo-response '()
                                                      requests
+                                                     #:open-connection
+                                                     open-connection-for-uri/cached
                                                      #:verify-certificate? #f
-                                             #:port port)))
[Message part 3 (text/plain, inline)]
This bit is still there in current ‘master’, so I think it’s not
necessary to catch these exceptions in ‘http-multiple-get’ itself, and I
would just remove the ‘catch’ wrap altogether.

WDYT?

Thanks,
Ludo’.

Information forwarded to bug-guix <at> gnu.org:
bug#47283; Package guix. (Tue, 23 Mar 2021 20:48:02 GMT) Full text and rfc822 format available.

Message #22 received at 47283 <at> debbugs.gnu.org (full text, mbox):

From: Christopher Baines <mail <at> cbaines.net>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 47283 <at> debbugs.gnu.org
Subject: Re: bug#47283: Performance regression in narinfo fetching
Date: Tue, 23 Mar 2021 20:47:12 +0000
[Message part 1 (text/plain, inline)]
Ludovic Courtès <ludo <at> gnu.org> writes:

> Christopher Baines <mail <at> cbaines.net> skribis:
>
>> Ludovic Courtès <ludo <at> gnu.org> writes:
>>
>>> Indeed, there’s one place on the hot path where we install exception
>>> handlers: in ‘http-multiple-get’ (from commit
>>> 205833b72c5517915a47a50dbe28e7024dc74e57).  I don’t think it’s needed,
>>> is it?  (But if it is, let’s find another approach, this one is
>>> prohibitively expensive.)
>>
>> I think the exception handling has moved around, but I guess the
>> exceptions that could be caught in http-multiple-get could happen,
>> right? I am really just guessing here, as Guile doesn't help tell you
>> about possible exceptions, and I haven't spent enough time to read all
>> the possible code involved to find out if these are definitely possible.
>
> Yeah.
>
> Commit 205833b72c5517915a47a50dbe28e7024dc74e57 added a ‘catch’ block
> that catches the same things as ‘with-cached-connection’ did (it would
> be better to not duplicate it IMO).  That includes EPIPE, gnutls-error,
> bad-response & co.

So, my intention here was to move the error handling, to allow
separating out the connection caching code from the code I wanted to
move out to the (guix substitutes) module. I don't think there's
currently any duplication in the error handling for the code path
involving http-multiple-get, at least for the exceptions in question
here.

> Earlier, commit be5a75ebb5988b87b2392e2113f6590f353dd6cd (“substitute:
> Reuse connections for '--query'.”) did not add such a ‘catch’ block in
> ‘http-multiple-get’.  Instead, it wrapped its call in ‘do-fetch’ in
> ‘fetch-narinfos’:
>
>    (define (do-fetch uri)
>      (case (and=> uri uri-scheme)
>        ((http https)
> -       (let ((requests (map (cut narinfo-request url <>) paths)))
> -         (match (open-connection-for-uri/maybe uri)
> -           (#f
> -            '())
> -           (port
> -            (update-progress!)
>         ;; Note: Do not check HTTPS server certificates to avoid depending
>         ;; on the X.509 PKI.  We can do it because we authenticate
>         ;; narinfos, which provides a much stronger guarantee.
> -            (let ((result (http-multiple-get uri
> +       (let* ((requests (map (cut narinfo-request url <>) paths))
> +              (result   (call-with-cached-connection uri
> +                          (lambda (port)
> +                            (if port
> +                                (begin
> +                                  (update-progress!)
> +                                  (http-multiple-get uri
>                                                       handle-narinfo-response '()
>                                                       requests
> +                                                     #:open-connection
> +                                                     open-connection-for-uri/cached
>                                                       #:verify-certificate? #f
> -                                             #:port port)))
>
> This bit is still there in current ‘master’, so I think it’s not
> necessary to catch these exceptions in ‘http-multiple-get’ itself, and I
> would just remove the ‘catch’ wrap altogether.
>
> WDYT?

I'm not sure what you're referring to as still being there on the master
branch?

Looking at the changes to this particular code path resulting from the
changes I've made recently, starting at lookup-narinfos, before:

 - lookup-narinfos calls fetch-narinfos, which calls do-fetch

 - call-with-cached-connection is used, which catches a number of
   exceptions relating to requests, and will retry PROC once upon a
   matching exception

 - open-connection-for-uri/maybe is also used, which is like
   open-connection-for-uri/cached, except it includes error handling for
   establishing connections to substitute servers

 - http-multiple-get doesn't include error handling

After:

 - lookup-narinfos calls fetch-narinfos, which calls do-fetch

 - call-with-connection-error-handling is used, which performs the same
   role as the error handling previously within
   open-connection-for-uri/maybe, catching exceptions relating to
   establishing connections to substitute servers

 - http-multiple-get now includes error handling similar to what was
   previously done by call-with-cached-connection, although it's more
   complicated since it's done with knowledge of what http-multiple-get
   is doing

I think that the error handling now in http-multiple-get isn't covered
elsewhere. Moving this error handling back into fetch-narinfos is
possible, but then we'd be back to handling connection caching in that
code, and avoiding that led to this refactoring in the first place.

Also, apart from the implementation problems, I do think that the error
handling here is better than before. Previously, if you called
lookup-narinfos and a connection problem occurred, processing all the
requests would start from scratch (as call-with-cached-connection calls
PROC a second time).  If a second connection error were to happen, well,
call-with-cached-connection only handles one error, so it wouldn't be
caught.

It's quite possible for http-multiple-get to be making thousands of
requests, for example when running guix weather with no cached results.
The error handling in http-multiple-get is smarter than the previous
approach, redoing as little work as possible after an error, and it's
not limited to catching a single exception.
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#47283; Package guix. (Wed, 24 Mar 2021 14:52:01 GMT) Full text and rfc822 format available.

Message #25 received at 47283 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Christopher Baines <mail <at> cbaines.net>
Cc: 47283 <at> debbugs.gnu.org
Subject: Re: bug#47283: Performance regression in narinfo fetching
Date: Wed, 24 Mar 2021 15:51:47 +0100
Hi Chris,

Christopher Baines <mail <at> cbaines.net> skribis:

> Ludovic Courtès <ludo <at> gnu.org> writes:

[...]

>> Earlier, commit be5a75ebb5988b87b2392e2113f6590f353dd6cd (“substitute:
>> Reuse connections for '--query'.”) did not add such a ‘catch’ block in
>> ‘http-multiple-get’.  Instead, it wrapped its call in ‘do-fetch’ in
>> ‘fetch-narinfos’:
>>
>>    (define (do-fetch uri)
>>      (case (and=> uri uri-scheme)
>>        ((http https)
>> -       (let ((requests (map (cut narinfo-request url <>) paths)))
>> -         (match (open-connection-for-uri/maybe uri)
>> -           (#f
>> -            '())
>> -           (port
>> -            (update-progress!)
>>         ;; Note: Do not check HTTPS server certificates to avoid depending
>>         ;; on the X.509 PKI.  We can do it because we authenticate
>>         ;; narinfos, which provides a much stronger guarantee.
>> -            (let ((result (http-multiple-get uri
>> +       (let* ((requests (map (cut narinfo-request url <>) paths))
>> +              (result   (call-with-cached-connection uri
>> +                          (lambda (port)
>> +                            (if port
>> +                                (begin
>> +                                  (update-progress!)
>> +                                  (http-multiple-get uri
>>                                                       handle-narinfo-response '()
>>                                                       requests
>> +                                                     #:open-connection
>> +                                                     open-connection-for-uri/cached
>>                                                       #:verify-certificate? #f
>> -                                             #:port port)))
>>
>> This bit is still there in current ‘master’, so I think it’s not
>> necessary to catch these exceptions in ‘http-multiple-get’ itself, and I
>> would just remove the ‘catch’ wrap altogether.
>>
>> WDYT?
>
> I'm not sure what you're referring to as still being there on the master
> branch?

On ‘master’, ‘do-fetch’ has its ‘http-multiple-get’ call wrapped in
‘call-with-connection-error-handling’, which is equivalent to the change
made in be5a75ebb5988b87b2392e2113f6590f353dd6cd and shown above.

> Looking at the changes to this particular code path resulting from the
> changes I've made recently, starting at lookup-narinfos, before:
>
>  - lookup-narinfos calls fetch-narinfos, which calls do-fetch
>
>  - call-with-cached-connection is used, which catches a number of
>    exceptions relating to requests, and will retry PROC once upon a
>    matching exception
>
>  - open-connection-for-uri/maybe is also used, which is like
>    open-connection-for-uri/cached, except it includes error handling for
>    establishing connections to substitute servers
>
>  - http-multiple-get doesn't include error handling
>
> After:
>
>  - lookup-narinfos calls fetch-narinfos, which calls do-fetch
>
>  - call-with-connection-error-handling is used, which performs the same
>    role as the error handling previously within
>    open-connection-for-uri/maybe, catching exceptions relating to
>    establishing connections to substitute servers
>
>  - http-multiple-get now includes error handling similar to what was
>    previously done by call-with-cached-connection, although it's more
>    complicated since it's done with knowledge of what http-multiple-get
>    is doing
>
> I think that the error handling now in http-multiple-get isn't covered
> elsewhere. Moving this error handling back into fetch-narinfos is
> possible, but then we'd be back to handling connection caching in that
> code, and avoiding that led to this refactoring in the first place.

The ‘http-multiple-get’ call in ‘fetch-narinfos’ is already wrapped in
‘call-with-connection-error-handling’, so it seems we’re good?

> Also, apart from the implementation problems, I do think that the error
> handling here is better than before. Previously, if you called
> lookup-narinfos and a connection problem occurred, processing all the
> requests would start from scratch (as call-with-cached-connection calls
> PROC a second time).  If a second connection error were to happen, well,
> call-with-cached-connection only handles one error, so it wouldn't be
> caught.

Hmm, true.  However, can that second exception happen?  Normally, if we
get a first exception, we open a new connection, and that one should not
raise another exception unless something is wrong, in which case it's
probably better to report it than to retry endlessly.

WDYT?

Even in tail call position, ‘catch’ calls introduce allocations and
extra work, so if we can avoid using one ‘catch’ per iteration, that’s
better.
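
A quick way to see that per-iteration cost (a stand-in micro-benchmark,
unrelated to the Guix code itself):

--8<---------------cut here---------------start------------->8---
;; One 'catch' per iteration, each in tail position.
(define (per-iteration-catch n)
  (let loop ((i 0))
    (when (< i n)
      (catch #t
        (lambda () #t)
        (lambda (key . args) (apply throw key args)))
      (loop (+ i 1)))))

;; A single 'catch' around the whole loop.
(define (single-catch n)
  (catch #t
    (lambda ()
      (let loop ((i 0))
        (when (< i n)
          (loop (+ i 1)))))
    (lambda (key . args) (apply throw key args))))

(define (seconds-taken thunk)
  (let ((start (get-internal-real-time)))
    (thunk)
    (exact->inexact (/ (- (get-internal-real-time) start)
                       internal-time-units-per-second))))

(format #t "per-iteration: ~as  single: ~as~%"
        (seconds-taken (lambda () (per-iteration-catch 1000000)))
        (seconds-taken (lambda () (single-catch 1000000))))
--8<---------------cut here---------------end--------------->8---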

Thank you,
Ludo’.




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Sat, 27 Mar 2021 21:59:02 GMT) Full text and rfc822 format available.

Notification sent to Ludovic Courtès <ludo <at> gnu.org>:
bug acknowledged by developer. (Sat, 27 Mar 2021 21:59:02 GMT) Full text and rfc822 format available.

Message #30 received at 47283-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Christopher Baines <mail <at> cbaines.net>
Cc: 47283-done <at> debbugs.gnu.org, 47288-done <at> debbugs.gnu.org
Subject: Re: bug#47288: [PATCH] guix: http-client: Tweak http-multiple-get
 error handling.
Date: Sat, 27 Mar 2021 22:58:39 +0100
Hi!

I went ahead and pushed 45fce38fb0b6c6796906149ade145b8d3594c1c6 along
these lines.

‘guix weather’ runs to completion and things seem to work fine.  Let me
know if you notice anything wrong.

Thank you for your work and for your patience on this issue!

Ludo’.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 25 Apr 2021 11:24:06 GMT) Full text and rfc822 format available.

