GNU bug report logs - #39728
[PATCH] Allow parallel downloads and builds


Package: guix-patches;

Reported by: Julien Lepiller <julien <at> lepiller.eu>

Date: Fri, 21 Feb 2020 22:54:02 UTC

Severity: normal

Tags: patch

To reply to this bug, email your comments to 39728 AT debbugs.gnu.org.



Report forwarded to guix-patches <at> gnu.org:
bug#39728; Package guix-patches. (Fri, 21 Feb 2020 22:54:02 GMT)

Acknowledgement sent to Julien Lepiller <julien <at> lepiller.eu>:
New bug report received and forwarded. Copy sent to guix-patches <at> gnu.org. (Fri, 21 Feb 2020 22:54:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Julien Lepiller <julien <at> lepiller.eu>
To: guix-patches <at> gnu.org
Subject: [PATCH] Allow parallel downloads and builds
Date: Fri, 21 Feb 2020 23:53:07 +0100
Hi guix!

This patch allows counting builds and downloads separately. The idea is
that downloads need bandwidth, but no CPU, while builds do not need
bandwidth, but need CPU. With this patch, guix will be able to download
substitutes while building unrelated packages. Currently, guix needs to
wait for the download to finish before proceeding to the build. This
should reduce the time of guix commands that need to build and download
things at the same time.

What do you think?
[0001-nix-Count-build-and-download-jobs-separately.patch (text/x-patch, attachment)]

Information forwarded to guix-patches <at> gnu.org:
bug#39728; Package guix-patches. (Mon, 24 Feb 2020 21:24:02 GMT)

Message #8 received at 39728 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Julien Lepiller <julien <at> lepiller.eu>
Cc: 39728 <at> debbugs.gnu.org
Subject: Re: [bug#39728] [PATCH] Allow parallel downloads and builds
Date: Mon, 24 Feb 2020 22:23:45 +0100
Hi!

Julien Lepiller <julien <at> lepiller.eu> writes:

> This patch allows counting builds and downloads separately. The idea is
> that downloads need bandwidth, but no CPU, while builds do not need
> bandwidth, but need CPU. With this patch, guix will be able to download
> substitutes while building unrelated packages. Currently, guix needs to
> wait for the download to finish before proceeding to the build. This
> should reduce the time of guix commands that need to build and download
> things at the same time.
>
> What do you think?

I think it’s a good idea!

I wonder what the UI will look like: (guix status) would no longer
display a progress bar when there’s more than one job (build or download)
taking place at the same time.

> From 9c059d81ba4f4016f8c400b403f8c5edbdb160c2 Mon Sep 17 00:00:00 2001
> From: Julien Lepiller <julien <at> lepiller.eu>
> Date: Fri, 21 Feb 2020 23:41:33 +0100
> Subject: [PATCH] nix: Count build and download jobs separately.
                   ^
I’d write “daemon:” here.  :-)

> This allows to run downloads (that take bandwith) and builds (that take
             ^
“This allows us”

> CPU time) independently from one another.
>
> * nix/nix-daemon/guix-daemon.cc: Add a max-download-jobs option.
> * nix/libstore/globals.hh: Add a maxDownloadJobs setting.
> * nix/libstore/globals.cc: Add a default value to it.
> * nix/libstore/build.cc: Manage build and download jobs separately.

For the final patch, please specify the entities changed (classes,
functions, etc.).

> +    /* Number of download slots occupied.  This includes substitution and
> +       built-ins. */
> +    unsigned int nrDownloads;

Note that not all builtins are downloads.  Fixed-output derivations are
(usually) also downloads.

(It’d be the first time the daemon gets a notion of “download”.  We
should make sure it doesn’t conflict with other assumptions.)
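
One way to picture that distinction is the following minimal sketch. `GoalInfo` and `countsAsDownload` are hypothetical names, not part of the patch or the daemon, and treating every fixed-output derivation as a download is a simplifying assumption (they are only "usually" downloads).

```cpp
// Hypothetical description of a goal; none of these names exist in the daemon.
struct GoalInfo {
    bool isSubstitution;  // fetching a substitute
    bool isFixedOutput;   // fixed-output derivation (usually a download)
    bool isBuiltin;       // built-in builder; not necessarily a download
};

// Assumption: a goal takes a download slot if it is a substitution or a
// fixed-output derivation.  Being a builtin alone does not qualify.
inline bool countsAsDownload(const GoalInfo & goal)
{
    return goal.isSubstitution || goal.isFixedOutput;
}
```

With this rule, a builtin that is not fixed-output would keep occupying a build slot.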

>      /* Registers a running child process.  `inBuildSlot' means that
>         the process counts towards the jobs limit. */
>      void childStarted(GoalPtr goal, pid_t pid,
> -        const set<int> & fds, bool inBuildSlot, bool respectTimeouts);
> +        const set<int> & fds, bool inBuildSlot, bool inDownloadSlot,
> +        bool respectTimeouts);

How about replacing these two Booleans by a single enum?
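
A minimal sketch of what the single-enum variant could look like. `SlotKind` and `SlotCounters` are hypothetical names chosen for illustration; the real `Worker` class tracks more state than this.

```cpp
// Hypothetical replacement for the (inBuildSlot, inDownloadSlot) pair.
enum class SlotKind { None, Build, Download };

// Simplified stand-in for the counters kept by the worker.
struct SlotCounters {
    unsigned int nrBuilds = 0;
    unsigned int nrDownloads = 0;

    // Account for a newly started child process.
    void childStarted(SlotKind slot)
    {
        switch (slot) {
        case SlotKind::Build:    ++nrBuilds;    break;
        case SlotKind::Download: ++nrDownloads; break;
        case SlotKind::None:     break;
        }
    }

    // Release the slot again; guard against underflow.
    void childTerminated(SlotKind slot)
    {
        if (slot == SlotKind::Build && nrBuilds > 0) --nrBuilds;
        else if (slot == SlotKind::Download && nrDownloads > 0) --nrDownloads;
    }
};
```

An enum also rules out the meaningless combination where both Booleans are true.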

> +    unsigned int curDownloads = worker.getNrDownloads();
> +    if (curDownloads >= (settings.maxDownloadJobs==0?1:settings.maxDownloadJobs) &&
> +                    fixedOutput) {

This is hard to parse and lacking spacing.  :-)  Perhaps make an
intermediate function or variable?
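
For instance, the repeated ternary could move into a small helper along these lines. The function names are hypothetical, and the setting is passed in as a plain argument so the sketch stands alone; in the patch it would read `settings.maxDownloadJobs`.

```cpp
// Hypothetical helper: a configured value of 0 still permits one download,
// mirroring the `maxDownloadJobs == 0 ? 1 : maxDownloadJobs` test in the patch.
inline unsigned int effectiveMaxDownloadJobs(unsigned int maxDownloadJobs)
{
    return maxDownloadJobs == 0 ? 1 : maxDownloadJobs;
}

// True when another download may start right away.
inline bool downloadSlotAvailable(unsigned int curDownloads,
                                  unsigned int maxDownloadJobs)
{
    return curDownloads < effectiveMaxDownloadJobs(maxDownloadJobs);
}
```

Both call sites quoted in this review could then share the same check, one of them negated.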

> +void Worker::waitForDownloadSlot(GoalPtr goal)
> +{
> +    debug("wait for download slot");
> +    if (getNrDownloads() < (settings.maxDownloadJobs==0?1:settings.maxDownloadJobs))

Same here.

> @@ -118,6 +119,7 @@ void Settings::update()
>  {
>      _get(tryFallback, "build-fallback");
>      _get(maxBuildJobs, "build-max-jobs");
> +    _get(maxDownloadJobs, "download-max-jobs");

We should also allow ‘set-build-options’ to set this option, as well as
add it to ‘%standard-build-options’.  That can prolly come in a separate
patch.

> +    { "max-downloads", 'D', n_("N"), 0,
> +      n_("allow at most N download jobs") },

We’d need to update doc/guix.texi.

It would be great if you could test this patch for your daily usage.  I
find it surprisingly easy to break things in the daemon.  :-)

Thank you!

Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#39728; Package guix-patches. (Tue, 25 Feb 2020 15:22:02 GMT)

Message #11 received at 39728 <at> debbugs.gnu.org:

From: zimoun <zimon.toutoune <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Julien Lepiller <julien <at> lepiller.eu>, 39728 <at> debbugs.gnu.org
Subject: Re: [bug#39728] [PATCH] Allow parallel downloads and builds
Date: Tue, 25 Feb 2020 16:21:24 +0100
Hi Julien,

On Mon, 24 Feb 2020 at 22:43, Ludovic Courtès <ludo <at> gnu.org> wrote:
> Julien Lepiller <julien <at> lepiller.eu> writes:

> > This patch allows counting builds and downloads separately. The idea is
> > that downloads need bandwidth, but no CPU, while builds do not need
> > bandwidth, but need CPU. With this patch, guix will be able to download
> > substitutes while building unrelated packages. Currently, guix needs to
> > wait for the download to finish before proceeding to the build. This
> > should reduce the time of guix commands that need to build and download
> > things at the same time.
> >
> > What do you think?
>
> I think it’s a good idea!
>
> I wonder what the UI will look like: (guix status) would no longer
> display a progress bar when there’s more than one job (build or download)
> taking place at the same time.

Speaking of progress bars, it would be nice (as an improvement) to
have a concurrent progress bar. As an example, see:

http://hackage.haskell.org/package/concurrent-output


> It would be great if you could test this patch for your daily usage.  I
> find it surprisingly easy to break things in the daemon.  :-)

How can I do that?
After the 'make', how can I change the daemon? And then revert it again
to the default one?


Cheers,
simon




Information forwarded to guix-patches <at> gnu.org:
bug#39728; Package guix-patches. (Tue, 25 Feb 2020 15:40:01 GMT)

Message #14 received at 39728 <at> debbugs.gnu.org:

From: Julien Lepiller <julien <at> lepiller.eu>
To: zimoun <zimon.toutoune <at> gmail.com>,
 Ludovic Courtès <ludo <at> gnu.org>
Cc: 39728 <at> debbugs.gnu.org
Subject: Re: [bug#39728] [PATCH] Allow parallel downloads and builds
Date: Tue, 25 Feb 2020 10:39:17 -0500
On 25 February 2020 10:21:24 GMT-05:00, zimoun <zimon.toutoune <at> gmail.com> wrote:
>Hi Julien,
>
>On Mon, 24 Feb 2020 at 22:43, Ludovic Courtès <ludo <at> gnu.org> wrote:
>> Julien Lepiller <julien <at> lepiller.eu> writes:
>
>> > This patch allows counting builds and downloads separately. The idea is
>> > that downloads need bandwidth, but no CPU, while builds do not need
>> > bandwidth, but need CPU. With this patch, guix will be able to download
>> > substitutes while building unrelated packages. Currently, guix needs to
>> > wait for the download to finish before proceeding to the build. This
>> > should reduce the time of guix commands that need to build and download
>> > things at the same time.
>> >
>> > What do you think?
>>
>> I think it’s a good idea!
>>
>> I wonder what the UI will look like: (guix status) would no longer
>> display a progress bar when there’s more than one job (build or download)
>> taking place at the same time.
>
>Speaking of progress bars, it would be nice (as an improvement) to
>have a concurrent progress bar. As an example, see:
>
>http://hackage.haskell.org/package/concurrent-output
>
>
>> It would be great if you could test this patch for your daily usage.  I
>> find it surprisingly easy to break things in the daemon.  :-)
>
>How can I do that?
>After the 'make', how can I change the daemon? And then revert it again
>to the default one?
>
>
>Cheers,
>simon

On Guix System, try the following (from within 'guix environment guix'):

  sudo herd stop guix-daemon
  sudo ./pre-inst-env guix-daemon --build-users-group=guixbuild

To revert, stop this daemon (^C) and run 'sudo herd start guix-daemon'.




Information forwarded to guix-patches <at> gnu.org:
bug#39728; Package guix-patches. (Wed, 26 Feb 2020 10:37:02 GMT)

Message #17 received at 39728 <at> debbugs.gnu.org:

From: Pierre Neidhardt <mail <at> ambrevar.xyz>
To: Julien Lepiller <julien <at> lepiller.eu>
Cc: Ludovic Courtès <ludo <at> gnu.org>, 39728 <at> debbugs.gnu.org,
 zimoun <zimon.toutoune <at> gmail.com>
Subject: Re: [bug#39728] [PATCH] Allow parallel downloads and builds
Date: Wed, 26 Feb 2020 11:36:32 +0100
Thank you so much for this, Julien!

There was a thread on this topic on the mailing list:
https://lists.gnu.org/archive/html/guix-devel/2019-11/msg00002.html
(could not find the first email :p).

There were a couple other issues that were mentioned there.
This patch would be a first step towards more parallelization!

-- 
Pierre Neidhardt
https://ambrevar.xyz/

Information forwarded to guix-patches <at> gnu.org:
bug#39728; Package guix-patches. (Wed, 14 Jul 2021 02:50:01 GMT)

Message #20 received at 39728 <at> debbugs.gnu.org:

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: Julien Lepiller <julien <at> lepiller.eu>
Cc: 39728 <at> debbugs.gnu.org
Subject: Re: bug#39728: [PATCH] Allow parallel downloads and builds
Date: Tue, 13 Jul 2021 22:49:02 -0400
Hello!

Julien Lepiller <julien <at> lepiller.eu> writes:

> Hi guix!
>
> This patch allows counting builds and downloads separately. The idea is
> that downloads need bandwidth, but no CPU, while builds do not need
> bandwidth, but need CPU. With this patch, guix will be able to download
> substitutes while building unrelated packages. Currently, guix needs to
> wait for the download to finish before proceeding to the build. This
> should reduce the time of guix commands that need to build and download
> things at the same time.
>
> What do you think?

Looks like a neat improvement!  Could you provide a follow-up to
Ludovic's review?  It seems not much is missing (minor cosmetic changes
to commit messages + code and more importantly, the accompanying
documentation update).

Thank you!

Maxim




Information forwarded to guix-patches <at> gnu.org:
bug#39728; Package guix-patches. (Sun, 21 Nov 2021 22:50:02 GMT)

Message #23 received at 39728 <at> debbugs.gnu.org:

From: Julien Lepiller <julien <at> lepiller.eu>
To: Pierre Neidhardt <mail <at> ambrevar.xyz>
Cc: Ludovic Courtès <ludo <at> gnu.org>, 39728 <at> debbugs.gnu.org,
 zimoun <zimon.toutoune <at> gmail.com>
Subject: Re: [bug#39728] [PATCH v2] Allow parallel downloads and builds
Date: Sun, 21 Nov 2021 23:49:41 +0100
On Wed, 26 Feb 2020 11:36:32 +0100,
Pierre Neidhardt <mail <at> ambrevar.xyz> wrote:

> Thank you so much for this, Julien!
> 
> There was a thread on this topic on the mailing list:
> https://lists.gnu.org/archive/html/guix-devel/2019-11/msg00002.html
> (could not find the first email :p).
> 
> There were a couple other issues that were mentioned there.
> This patch would be a first step towards more parallelization!
> 

Hi!

After so long, I managed to find the time to go over the comments and
improve my patches. I tested the new daemon for a bit, and it's working
as expected so far :D
[0001-daemon-Count-build-and-download-jobs-separately.patch (text/x-patch, attachment)]
[0002-guix-Support-specifying-max-download-jobs.patch (text/x-patch, attachment)]

Information forwarded to guix-patches <at> gnu.org:
bug#39728; Package guix-patches. (Thu, 25 Nov 2021 12:54:01 GMT)

Message #26 received at 39728 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Julien Lepiller <julien <at> lepiller.eu>
Cc: 39728 <at> debbugs.gnu.org, Pierre Neidhardt <mail <at> ambrevar.xyz>,
 zimoun <zimon.toutoune <at> gmail.com>
Subject: Re: [bug#39728] [PATCH v2] Allow parallel downloads and builds
Date: Thu, 25 Nov 2021 13:53:25 +0100
Hello,

Julien Lepiller <julien <at> lepiller.eu> writes:

> After so long, I managed to find the time to go over the comments and
> improve my patches. I tested the new daemon for a bit, and it's working
> as expected so far :D

On a recent daemon, have you seen cases where having multiple downloads
in parallel speeds things up?

The analysis in
<https://guix.gnu.org/en/blog/2021/getting-bytes-to-disk-more-quickly/>
suggests that at the time you first submitted this patch, substitution
speed (which is different from raw download speed) was often CPU-bound.
This is no longer the case, meaning that downloads should now be
network-bound or almost.

Thanks,
Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#39728; Package guix-patches. (Thu, 25 Nov 2021 13:03:02 GMT)

Message #29 received at 39728 <at> debbugs.gnu.org:

From: Julien Lepiller <julien <at> lepiller.eu>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 39728 <at> debbugs.gnu.org, Pierre Neidhardt <mail <at> ambrevar.xyz>,
 zimoun <zimon.toutoune <at> gmail.com>
Subject: Re: [bug#39728] [PATCH v2] Allow parallel downloads and builds
Date: Thu, 25 Nov 2021 14:01:45 +0100
On Thu, 25 Nov 2021 13:53:25 +0100,
Ludovic Courtès <ludo <at> gnu.org> wrote:

> Hello,
> 
> Julien Lepiller <julien <at> lepiller.eu> writes:
> 
> > After so long, I managed to find the time to go over the comments
> > and improve my patches. I tested the new daemon for a bit, and it's
> > working as expected so far :D  
> 
> On a recent daemon, have you seen cases where having multiple
> downloads in parallel speeds things up?
> 
> The analysis in
> <https://guix.gnu.org/en/blog/2021/getting-bytes-to-disk-more-quickly/>
> suggests that at the time you first submitted this patch, substitution
> speed (which is different from raw download speed) was often
> CPU-bound. This is no longer the case, meaning that downloads should
> now be network-bound or almost.
> 
> Thanks,
> Ludo’.

I would still say yes, because the output from berlin is often much
less than my throughput. With multiple downloads in parallel it at
least feels quicker, probably because I can download at full speed.

In any case, I often see a build start while downloads are in progress,
so I think it's still a win if you can get a few derivations built
while waiting for a big download to finish :)

At some point we might want to prioritize builds/downloads that help
unlock as many builds as possible early, so we don't have builds
waiting for downloads.
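
As a rough illustration of that prioritization idea, here is a sketch, not something the patches implement. `PendingJob`, `waitingBuilds` and `orderByUnlockedBuilds` are hypothetical names, and the count of blocked builds is assumed to be known in advance.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record for a queued build or download.
struct PendingJob {
    std::string name;
    unsigned int waitingBuilds;  // builds that cannot start until this job is done
};

// Put the jobs that unblock the most builds first; the stable sort keeps
// the original order among jobs with the same count.
inline void orderByUnlockedBuilds(std::vector<PendingJob> & jobs)
{
    std::stable_sort(jobs.begin(), jobs.end(),
        [](const PendingJob & a, const PendingJob & b) {
            return a.waitingBuilds > b.waitingBuilds;
        });
}
```

A real scheduler would also need to update these counts as goals finish, which this sketch leaves out.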




Information forwarded to guix-patches <at> gnu.org:
bug#39728; Package guix-patches. (Fri, 26 Nov 2021 10:17:02 GMT)

Message #32 received at 39728 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Julien Lepiller <julien <at> lepiller.eu>
Cc: 39728 <at> debbugs.gnu.org, Pierre Neidhardt <mail <at> ambrevar.xyz>,
 zimoun <zimon.toutoune <at> gmail.com>
Subject: Re: [bug#39728] [PATCH v2] Allow parallel downloads and builds
Date: Fri, 26 Nov 2021 11:16:16 +0100
Hi!

Julien Lepiller <julien <at> lepiller.eu> writes:

> I would still say yes, because the output from berlin is often much
> less than my throughput. With multiple downloads in parallel it at
> least feels quicker, probably because I can download at full speed.

It would be nice to measure that because like I wrote, I think we’re
pretty much network-bound these days, at least with zstd and
uncompressed downloads.

> In any case, I often see a build start while downloads are in progress,
> so I think it's still a win if you can get a few derivations built
> while waiting for a big download to finish :)

True!  Overlapping downloads and builds sounds like a good idea.

> At some point we might want to prioritize builds/downloads that help
> unlock as many builds as possible early, so we don't have builds
> waiting for downloads.

Right now the daemon starts with substitutes and builds afterwards.

BTW, we’re assuming downloads = substitutes in this whole discussion,
but we could/should take fixed-output derivations into account too.

I’ll take a closer look later on…

Thanks,
Ludo’.




This bug report was last modified 2 years and 159 days ago.
