GNU bug report logs - #32879
[PATCH] database: Add builds only if one of their outputs is new.

Previous Next

Package: guix-patches;

Reported by: Clément Lassieur <clement <at> lassieur.org>

Date: Sat, 29 Sep 2018 20:37:02 UTC

Severity: normal

Tags: patch

Done: Clément Lassieur <clement <at> lassieur.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 32879 in the body.
You can then email your comments to 32879 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to guix-patches <at> gnu.org:
bug#32879; Package guix-patches. (Sat, 29 Sep 2018 20:37:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Clément Lassieur <clement <at> lassieur.org>:
New bug report received and forwarded. Copy sent to guix-patches <at> gnu.org. (Sat, 29 Sep 2018 20:37:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Clément Lassieur <clement <at> lassieur.org>
To: guix-patches <at> gnu.org
Subject: [PATCH] database: Add builds only if one of their outputs is new.
Date: Sat, 29 Sep 2018 22:35:32 +0200
* Makefile.am (dist_sql_DATA): Add 'src/sql/upgrade-4.sql'.
* src/cuirass/database.scm (db-add-output): New procedure.
(db-add-build): Call DB-ADD-OUTPUT, rollback the transaction and return #f if
DB-ADD-OUTPUT returned an empty list.
* src/schema.sql (Outputs): Set 'path' as primary key, instead of 'derivation,
name'.
* src/sql/upgrade-4.sql: New file with SQL queries to upgrade the database.
* tests/database.scm (make-dummy-build): Use the #:OUTPUTS key.  Get default
OUTPUTS to depend on DRV.
("db-add-build-with-fixed-output"): New test.
---
 Makefile.am              |  3 ++-
 src/cuirass/database.scm | 46 +++++++++++++++++++++++++++++-----------
 src/schema.sql           |  3 +--
 src/sql/upgrade-4.sql    | 18 ++++++++++++++++
 tests/database.scm       | 16 ++++++++++++--
 5 files changed, 69 insertions(+), 17 deletions(-)
 create mode 100644 src/sql/upgrade-4.sql

diff --git a/Makefile.am b/Makefile.am
index 2f83659..7cea2ff 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -67,7 +67,8 @@ dist_pkgdata_DATA = src/schema.sql
 dist_sql_DATA = 				\
   src/sql/upgrade-1.sql				\
   src/sql/upgrade-2.sql				\
-  src/sql/upgrade-3.sql
+  src/sql/upgrade-3.sql				\
+  src/sql/upgrade-4.sql
 
 dist_css_DATA =					\
   src/static/css/bootstrap.css			\
diff --git a/src/cuirass/database.scm b/src/cuirass/database.scm
index 6777d28..9664f1b 100644
--- a/src/cuirass/database.scm
+++ b/src/cuirass/database.scm
@@ -425,12 +425,33 @@ string."
   (failed-other      3)
   (canceled          4))
 
+(define (db-add-output derivation output)
+  "Insert OUTPUT associated with DERIVATION.  If an output with the same path
+already exists, return #f."
+  (with-db-critical-section db
+    (catch 'sqlite-error
+      (lambda ()
+        (match output
+          ((name . path)
+           (sqlite-exec db "\
+INSERT INTO Outputs (derivation, name, path) VALUES ("
+                        derivation ", " name ", " path ");")))
+        (last-insert-rowid db))
+      (lambda (key who code message . rest)
+        ;; If we get a unique-constraint-failed error, that means we have
+        ;; already inserted the same output.  That happens with fixed-output
+        ;; derivations.
+        (if (= code SQLITE_CONSTRAINT_PRIMARYKEY)
+            #f
+            (apply throw key who code rest))))))
+
 (define (db-add-build build)
-  "Store BUILD in database the database.  BUILD eventual outputs are stored in
-the OUTPUTS table."
+  "Store BUILD in database the database only if one of its outputs is new.
+Return #f otherwise.  BUILD outputs are stored in the OUTPUTS table."
   (with-db-critical-section db
     (catch 'sqlite-error
       (lambda ()
+        (sqlite-exec db "BEGIN TRANSACTION;")
         (sqlite-exec db "
 INSERT INTO Builds (derivation, evaluation, job_name, system, nix_name, log,
 status, timestamp, starttime, stoptime)
@@ -446,21 +467,22 @@ VALUES ("
                      (or (assq-ref build #:timestamp) 0) ", "
                      (or (assq-ref build #:starttime) 0) ", "
                      (or (assq-ref build #:stoptime) 0) ");")
-        (let ((derivation (assq-ref build #:derivation)))
-          (for-each (lambda (output)
-                      (match output
-                        ((name . path)
-                         (sqlite-exec db "\
-INSERT INTO Outputs (derivation, name, path) VALUES ("
-                                      derivation ", " name ", " path ");"))))
-                    (assq-ref build #:outputs))
-          derivation))
+        (let* ((derivation (assq-ref build #:derivation))
+               (outputs (assq-ref build #:outputs))
+               (new-outputs (filter-map (cut db-add-output derivation <>)
+                                        outputs)))
+          (if (null? new-outputs)
+              (begin (sqlite-exec db "ROLLBACK;")
+                     #f)
+              (begin (sqlite-exec db "COMMIT;")
+                     derivation))))
       (lambda (key who code message . rest)
         ;; If we get a unique-constraint-failed error, that means we have
         ;; already inserted the same build.  That happens when several jobs
         ;; produce the same derivation, and we can ignore it.
         (if (= code SQLITE_CONSTRAINT_PRIMARYKEY)
-            #f
+            (begin (sqlite-exec db "ROLLBACK;")
+                   #f)
             (apply throw key who code rest))))))
 
 (define* (db-update-build-status! drv status #:key log-file)
diff --git a/src/schema.sql b/src/schema.sql
index bfc9ca7..a9e4a6a 100644
--- a/src/schema.sql
+++ b/src/schema.sql
@@ -46,8 +46,7 @@ CREATE TABLE Evaluations (
 CREATE TABLE Outputs (
   derivation TEXT NOT NULL,
   name TEXT NOT NULL,
-  path TEXT NOT NULL,
-  PRIMARY KEY (derivation, name),
+  path TEXT NOT NULL PRIMARY KEY,
   FOREIGN KEY (derivation) REFERENCES Builds (derivation)
 );
 
diff --git a/src/sql/upgrade-4.sql b/src/sql/upgrade-4.sql
new file mode 100644
index 0000000..e567f03
--- /dev/null
+++ b/src/sql/upgrade-4.sql
@@ -0,0 +1,18 @@
+BEGIN TRANSACTION;
+
+ALTER TABLE Outputs RENAME TO tmp_Outputs;
+
+CREATE TABLE Outputs (
+  derivation TEXT NOT NULL,
+  name TEXT NOT NULL,
+  path TEXT NOT NULL PRIMARY KEY,
+  FOREIGN KEY (derivation) REFERENCES Builds (derivation)
+);
+
+INSERT OR IGNORE INTO Outputs (derivation, name, path)
+SELECT derivation, name, path
+FROM tmp_Outputs;
+
+DROP TABLE tmp_Outputs;
+
+COMMIT;
diff --git a/tests/database.scm b/tests/database.scm
index 21a12f4..d9dfe13 100644
--- a/tests/database.scm
+++ b/tests/database.scm
@@ -57,14 +57,15 @@
 
 (define* (make-dummy-build drv
                            #:optional (eval-id 42)
-                           #:key (outputs '(("foo" . "/foo"))))
+                           #:key (outputs
+                                  `(("foo" . ,(format #f "~a.output" drv)))))
   `((#:derivation . ,drv)
     (#:eval-id . ,eval-id)
     (#:job-name . "job")
     (#:system . "x86_64-linux")
     (#:nix-name . "foo")
     (#:log . "log")
-    (#:outputs . (("foo" . "/foo")))))
+    (#:outputs . ,outputs)))
 
 (define-syntax-rule (with-temporary-database body ...)
   (call-with-temporary-output-file
@@ -114,6 +115,17 @@ INSERT INTO Evaluations (specification, in_progress) VALUES (3, false);")
       ;; there, see <https://bugs.gnu.org/28094>.
       (db-add-build build)))
 
+  (test-equal "db-add-build-with-fixed-output"
+    #f
+    (let ((build1 (make-dummy-build "/fixed1.drv"
+                                    #:outputs '(("out" . "/fixed-output"))))
+          (build2 (make-dummy-build "/fixed2.drv"
+                                    #:outputs '(("out" . "/fixed-output")))))
+      (db-add-build build1)
+
+      ;; Should return #f because the outputs are the same.
+      (db-add-build build2)))
+
   (test-equal "db-update-build-status!"
     (list (build-status scheduled)
           (build-status started)
-- 
2.19.0





Information forwarded to guix-patches <at> gnu.org:
bug#32879; Package guix-patches. (Sun, 30 Sep 2018 19:35:02 GMT) Full text and rfc822 format available.

Message #8 received at 32879 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Clément Lassieur <clement <at> lassieur.org>
Cc: 32879 <at> debbugs.gnu.org
Subject: Re: [bug#32879] [PATCH] database: Add builds only if one of their
 outputs is new.
Date: Sun, 30 Sep 2018 21:34:12 +0200
Hello Clément,

Clément Lassieur <clement <at> lassieur.org> skribis:

> * Makefile.am (dist_sql_DATA): Add 'src/sql/upgrade-4.sql'.
> * src/cuirass/database.scm (db-add-output): New procedure.
> (db-add-build): Call DB-ADD-OUTPUT, rollback the transaction and return #f if
> DB-ADD-OUTPUT returned an empty list.
> * src/schema.sql (Outputs): Set 'path' as primary key, instead of 'derivation,
> name'.
> * src/sql/upgrade-4.sql: New file with SQL queries to upgrade the database.
> * tests/database.scm (make-dummy-build): Use the #:OUTPUTS key.  Get default
> OUTPUTS to depend on DRV.
> ("db-add-build-with-fixed-output"): New test.

What’s the rationale?  I suppose having a simpler primary key for
‘Outputs’ might help performance?

Thank you,
Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#32879; Package guix-patches. (Sun, 30 Sep 2018 20:42:02 GMT) Full text and rfc822 format available.

Message #11 received at 32879 <at> debbugs.gnu.org (full text, mbox):

From: Clément Lassieur <clement <at> lassieur.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 32879 <at> debbugs.gnu.org
Subject: Re: [bug#32879] [PATCH] database: Add builds only if one of their
 outputs is new.
Date: Sun, 30 Sep 2018 22:41:38 +0200
Ludovic Courtès <ludo <at> gnu.org> writes:

> Hello Clément,
>
> Clément Lassieur <clement <at> lassieur.org> skribis:
>
>> * Makefile.am (dist_sql_DATA): Add 'src/sql/upgrade-4.sql'.
>> * src/cuirass/database.scm (db-add-output): New procedure.
>> (db-add-build): Call DB-ADD-OUTPUT, rollback the transaction and return #f if
>> DB-ADD-OUTPUT returned an empty list.
>> * src/schema.sql (Outputs): Set 'path' as primary key, instead of 'derivation,
>> name'.
>> * src/sql/upgrade-4.sql: New file with SQL queries to upgrade the database.
>> * tests/database.scm (make-dummy-build): Use the #:OUTPUTS key.  Get default
>> OUTPUTS to depend on DRV.
>> ("db-add-build-with-fixed-output"): New test.
>
> What’s the rationale?  I suppose having a simpler primary key for
> ‘Outputs’ might help performance?

There is a slight performance and db size gain but the primary reason is
to have a better idea of Cuirass' load when looking at the pending
builds.  There will be less (no?) 'fake' builds.  The idea is that all
builds should be real builds.

Does that make sense?

> Thank you,
> Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#32879; Package guix-patches. (Mon, 01 Oct 2018 09:11:02 GMT) Full text and rfc822 format available.

Message #14 received at 32879 <at> debbugs.gnu.org (full text, mbox):

From: Clément Lassieur <clement <at> lassieur.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 32879 <at> debbugs.gnu.org
Subject: Re: [bug#32879] [PATCH] database: Add builds only if one of their
 outputs is new.
Date: Mon, 01 Oct 2018 11:09:59 +0200
Clément Lassieur <clement <at> lassieur.org> writes:

> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>> Hello Clément,
>>
>> Clément Lassieur <clement <at> lassieur.org> skribis:
>>
>>> * Makefile.am (dist_sql_DATA): Add 'src/sql/upgrade-4.sql'.
>>> * src/cuirass/database.scm (db-add-output): New procedure.
>>> (db-add-build): Call DB-ADD-OUTPUT, rollback the transaction and return #f if
>>> DB-ADD-OUTPUT returned an empty list.
>>> * src/schema.sql (Outputs): Set 'path' as primary key, instead of 'derivation,
>>> name'.
>>> * src/sql/upgrade-4.sql: New file with SQL queries to upgrade the database.
>>> * tests/database.scm (make-dummy-build): Use the #:OUTPUTS key.  Get default
>>> OUTPUTS to depend on DRV.
>>> ("db-add-build-with-fixed-output"): New test.
>>
>> What’s the rationale?  I suppose having a simpler primary key for
>> ‘Outputs’ might help performance?
>
> There is a slight performance and db size gain but the primary reason is
> to have a better idea of Cuirass' load when looking at the pending
> builds.  There will be less (no?) 'fake' builds.  The idea is that all
> builds should be real builds.

Also, it would allow to be notified when someone pushes on master a
commit that triggers too many builds.




Information forwarded to guix-patches <at> gnu.org:
bug#32879; Package guix-patches. (Tue, 02 Oct 2018 09:09:02 GMT) Full text and rfc822 format available.

Message #17 received at 32879 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Clément Lassieur <clement <at> lassieur.org>
Cc: 32879 <at> debbugs.gnu.org
Subject: Re: [bug#32879] [PATCH] database: Add builds only if one of their
 outputs is new.
Date: Tue, 02 Oct 2018 11:08:14 +0200
Hello!

Clément Lassieur <clement <at> lassieur.org> skribis:

> Clément Lassieur <clement <at> lassieur.org> writes:
>
>> Ludovic Courtès <ludo <at> gnu.org> writes:
>>
>>> Hello Clément,
>>>
>>> Clément Lassieur <clement <at> lassieur.org> skribis:
>>>
>>>> * Makefile.am (dist_sql_DATA): Add 'src/sql/upgrade-4.sql'.
>>>> * src/cuirass/database.scm (db-add-output): New procedure.
>>>> (db-add-build): Call DB-ADD-OUTPUT, rollback the transaction and return #f if
>>>> DB-ADD-OUTPUT returned an empty list.
>>>> * src/schema.sql (Outputs): Set 'path' as primary key, instead of 'derivation,
>>>> name'.
>>>> * src/sql/upgrade-4.sql: New file with SQL queries to upgrade the database.
>>>> * tests/database.scm (make-dummy-build): Use the #:OUTPUTS key.  Get default
>>>> OUTPUTS to depend on DRV.
>>>> ("db-add-build-with-fixed-output"): New test.
>>>
>>> What’s the rationale?  I suppose having a simpler primary key for
>>> ‘Outputs’ might help performance?
>>
>> There is a slight performance and db size gain but the primary reason is
>> to have a better idea of Cuirass' load when looking at the pending
>> builds.  There will be less (no?) 'fake' builds.  The idea is that all
>> builds should be real builds.
>
> Also, it would allow to be notified when someone pushes on master a
> commit that triggers too many builds.

That makes a lot of sense, thank you!

IMO you can go ahead and push it.

Ludo’.




Reply sent to Clément Lassieur <clement <at> lassieur.org>:
You have taken responsibility. (Tue, 02 Oct 2018 11:27:02 GMT) Full text and rfc822 format available.

Notification sent to Clément Lassieur <clement <at> lassieur.org>:
bug acknowledged by developer. (Tue, 02 Oct 2018 11:27:02 GMT) Full text and rfc822 format available.

Message #22 received at 32879-done <at> debbugs.gnu.org (full text, mbox):

From: Clément Lassieur <clement <at> lassieur.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 32879-done <at> debbugs.gnu.org
Subject: Re: [bug#32879] [PATCH] database: Add builds only if one of their
 outputs is new.
Date: Tue, 02 Oct 2018 13:26:17 +0200
Hi Ludo,

Ludovic Courtès <ludo <at> gnu.org> writes:

> That makes a lot of sense, thank you!
>
> IMO you can go ahead and push it.

Great!  Thank you for the review.

Clément




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 31 Oct 2018 11:24:06 GMT) Full text and rfc822 format available.

This bug report was last modified 5 years and 187 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.