GNU bug report logs - #22533
Non-determinism in python-3 ".pyc" bytecode

Package: guix;

Reported by: Leo Famulari <leo <at> famulari.name>

Date: Tue, 2 Feb 2016 05:17:02 UTC

Severity: important

Done: Ricardo Wurmus <rekado <at> elephly.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 22533 in the body.
You can then email your comments to 22533 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Tue, 02 Feb 2016 05:17:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Leo Famulari <leo <at> famulari.name>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Tue, 02 Feb 2016 05:17:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Leo Famulari <leo <at> famulari.name>
To: bug-guix <at> gnu.org
Subject: Non-determinism in python-3 ".pyc" bytecode
Date: Tue, 2 Feb 2016 00:15:44 -0500
While preparing a package for borg [0], I found that the built output
was not reproducible. The problem is that the bytecode compiler [1] for
Python 3.4.3 (our current version) encodes the mtime of the
corresponding Python source file in the output. This is described in
PEP-3147 [2], and the responsible Python code is referenced below [3].

I tested a few of our existing python-3 packages: python-ccm,
python-pysam, and python-scripttest all exhibit the same problem.

We fixed this in python-2 with the patch
python-2.7-source-date-epoch.patch, but I don't know how to write this
patch for python-3.

Can somebody write this patch?

I asked about this on #debian-reproducible and they said that it wasn't
an issue for Debian since they don't ship bytecode, but instead generate
it at install time. Of course, that doesn't really apply to Guix.

I used diffoscope-34 to inspect the build outputs to find this, and you
can see the report here:
https://famulari.name/misc/7c55c9e97f668234ddea50299d986f14/borg-diffoscope-report.html

It's first demonstrated in the file
...-borg-0.30.0/lib/python3.4/site-packages/__pycache__/site.cpython-34.pyc.

The first 4 bytes are the "magic number" described in PEP-3147, which
specifies the version of the bytecode format. The next 4 bytes are the
problematic timestamp, as described in PEP-3147.
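
For anyone who wants to check the header by hand, here is a minimal
sketch in Python. It assumes the layout written by _code_to_bytecode in
Python 3.4: a 4-byte magic number, then the source mtime and the source
size, each stored as a 4-byte little-endian integer. The function name
parse_legacy_pyc_header is made up for this illustration.

```python
import struct


def parse_legacy_pyc_header(data):
    """Split the 12-byte header of a Python 3.4-style .pyc file.

    Returns (magic, mtime, source_size). The layout is an assumption
    taken from _code_to_bytecode in Python 3.4; later releases use a
    different header.
    """
    if len(data) < 12:
        raise ValueError("need at least 12 bytes of .pyc data")
    magic = data[:4]
    # mtime and size are unsigned 32-bit little-endian integers.
    mtime, source_size = struct.unpack("<II", data[4:12])
    return magic, mtime, source_size
```

Running this on the same file from two builds should show identical
magic and size fields and differing mtime fields if the timestamp is
indeed the only source of non-determinism in the header.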

[0]
http://borgbackup.github.io/

[1]
https://docs.python.org/3/library/py_compile.html

[2]
https://www.python.org/dev/peps/pep-3147/

[3] Check out the Guix git commit 4efc8eb27502c, and from there:
$ tar xf $(./pre-inst-env guix build --source python-3)
$ sed -n 139,140p Python-3.4.3/Lib/py_compile.py
    bytecode = importlib._bootstrap._code_to_bytecode(
            code, source_stats['mtime'], source_stats['size'])




Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Tue, 02 Feb 2016 08:55:02 GMT) Full text and rfc822 format available.

Message #8 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: Leo Famulari <leo <at> famulari.name>
To: 22533 <at> debbugs.gnu.org
Subject: Re: bug#22533: Non-determinism in python-3 ".pyc" bytecode
Date: Tue, 2 Feb 2016 03:54:39 -0500
On Tue, Feb 02, 2016 at 12:15:44AM -0500, Leo Famulari wrote:
> While preparing a package for borg [0], I found that the built output
> was not reproducible. The problem is that the bytecode compiler [1] for
> Python 3.4.3 (our current version) encodes the mtime of the
> corresponding Python source file in the output. This is described in
> PEP-3147 [2], and the responsible Python code is referenced below [3].
> 
> I tested a few of our existing python-3 packages: python-ccm,
> python-pysam, and python-scripttest all exhibit the same problem.
> 
> We fixed this in python-2 with the patch
> python-2.7-source-date-epoch.patch, but I don't know how to write this
> patch for python-3.

mark_weaver suggested setting the timestamps of the source files before
building. I think this is a better option if it doesn't break anything.
It would allow the bytecode "staleness" check to work as expected while
keeping the output consistent.
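
A rough sketch of that idea in Python, assuming that a fixed timestamp
of 1 is acceptable and that only "*.py" files matter. The helper name
reset_source_mtimes is hypothetical and not part of Guix or Python.

```python
import os


def reset_source_mtimes(root, timestamp=1):
    """Set atime and mtime of every .py file under root to a fixed value.

    Returns the list of files that were touched.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".py"):
                path = os.path.join(dirpath, name)
                os.utime(path, (timestamp, timestamp))
                changed.append(path)
    return changed
```

Because the bytecode compiler copies the source mtime into the header,
running this before byte-compilation should make the header field the
same in every build.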

Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Tue, 02 Feb 2016 20:42:01 GMT) Full text and rfc822 format available.

Message #11 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Leo Famulari <leo <at> famulari.name>
Cc: 22533 <at> debbugs.gnu.org
Subject: Re: bug#22533: Non-determinism in python-3 ".pyc" bytecode
Date: Tue, 02 Feb 2016 21:41:19 +0100
[Message part 1 (text/plain, inline)]
Leo Famulari <leo <at> famulari.name> skribis:

> We fixed this in python-2 with the patch
> python-2.7-source-date-epoch.patch, but I don't know how to write this
> patch for python-3.

I would imagine something like this (untested):

[Message part 2 (text/x-patch, inline)]
--- Python-3.4.3/Lib/importlib/_bootstrap.py	2016-02-02 21:38:48.655809055 +0100
+++ Python-3.4.3/Lib/importlib/_bootstrap.py.new	2016-02-02 21:38:43.659769251 +0100
@@ -667,7 +667,10 @@ def _code_to_bytecode(code, mtime=0, sou
     """Compile a code object into bytecode for writing out to a byte-compiled
     file."""
     data = bytearray(MAGIC_NUMBER)
-    data.extend(_w_long(mtime))
+    if 'SOURCE_DATE_EPOCH' in _os.environ:
+        data.extend(_w_long(int(_os.environ['SOURCE_DATE_EPOCH'])))
+    else:
+        data.extend(_w_long(mtime))
     data.extend(_w_long(source_size))
     data.extend(marshal.dumps(code))
     return data
[Message part 3 (text/plain, inline)]
Could you give it a try and refine as needed?  :-)
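
The fallback logic of the patch can be tried outside importlib with a
small standalone sketch. The names w_long and pick_header_mtime are
illustrative only. w_long is meant to mimic what _w_long does in
_bootstrap.py, and the environment is passed in as a plain mapping so
the behaviour is easy to test.

```python
import os


def w_long(value):
    """Encode an integer as 4 little-endian bytes, truncated to 32 bits."""
    return (int(value) & 0xFFFFFFFF).to_bytes(4, "little")


def pick_header_mtime(mtime, environ=None):
    """Return SOURCE_DATE_EPOCH as an int when it is set, else the real mtime."""
    if environ is None:
        environ = os.environ
    epoch = environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        return int(epoch)
    return int(mtime)
```

With SOURCE_DATE_EPOCH set to "1", w_long(pick_header_mtime(...))
gives the same 4 bytes no matter what the source file's mtime is.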

> I asked about this on #debian-reproducible and they said that it wasn't
> an issue for Debian since they don't ship bytecode, but instead generate
> it at install time. Of course, that doesn't really apply to Guix.

I’d recommend trying #reproducible-builds on OFTC, which is more
generic.  Also, in some cases, it’s useful to look at
<git://git.debian.org/git/reproducible/notes.git>, which contains notes
about non-reproducible packages (currently partly Debian-specific, but
we need to lobby to make it more generic.  ;-))

Thanks,
Ludo’.

Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Thu, 04 Feb 2016 23:18:02 GMT) Full text and rfc822 format available.

Message #14 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: Leo Famulari <leo <at> famulari.name>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 22533 <at> debbugs.gnu.org
Subject: Re: bug#22533: Non-determinism in python-3 ".pyc" bytecode
Date: Thu, 4 Feb 2016 18:17:08 -0500
[Message part 1 (text/plain, inline)]
On Tue, Feb 02, 2016 at 09:41:19PM +0100, Ludovic Courtès wrote:
> Could you give it a try and refine as needed?  :-)

I altered your example as shown in the attached patch. It causes some
tests related to timestamps to fail, so I disabled them in a very crude
way. The final patch should address those tests more carefully.

But, the patch doesn't seem to have the desired effect so I'm asking for
help!

Here is how I tested the patch:

I built python-3 with it, then ran `export SOURCE_DATE_EPOCH=1` and
entered the resulting Python shell. I manually defined the '_w_long'
function used by the patched function. Then:

print (_w_long(locale.atoi(os.getenv('SOURCE_DATE_EPOCH'))))
b'\x01\x00\x00\x00'

But, when I leave the Python shell and issue `python3 -m compileall
helloworld.py`, the timestamps are present in the compiled bytecode. I
can watch the clock "tick" by doing this repeatedly:

$ touch helloworld.py && rm -r __pycache__ && \
python3 -m compileall helloworld.py &&  \
hexdump __pycache__/helloworld.cpython-34.pyc | head -n1

I'm not much of a Python programmer, so I'm stumped.
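
The shell loop above can also be reproduced from Python. This is only a
sketch and it assumes a current interpreter (3.7 or later), where the
header is magic, flags, mtime, size and the mtime therefore sits at
bytes 8 to 12 rather than 4 to 8 as in 3.4. The function name
header_mtime_after_compile is made up for this example.

```python
import os
import py_compile


def header_mtime_after_compile(source_path, cfile):
    """Compile source_path to cfile and return the mtime stored in its header.

    Assumes Python 3.7+, where the header is magic, flags, mtime, size.
    """
    py_compile.compile(
        source_path,
        cfile=cfile,
        doraise=True,
        # Ask explicitly for a timestamp-based .pyc.
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    with open(cfile, "rb") as f:
        header = f.read(16)
    return int.from_bytes(header[8:12], "little")
```

Changing the source file's mtime with os.utime between two calls makes
the returned value follow it, which is the "ticking clock" seen in the
hexdump output.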
[0001-SOURCE_DATE_EPOCH.patch (text/x-diff, attachment)]

Severity set to 'important' from 'normal' Request was from ludo <at> gnu.org (Ludovic Courtès) to control <at> debbugs.gnu.org. (Fri, 25 Mar 2016 08:47:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Tue, 29 Mar 2016 23:13:02 GMT) Full text and rfc822 format available.

Message #19 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Cyril Roelandt <tipecaml <at> gmail.com>
To: bug-guix <at> gnu.org
Subject: Re: bug#22533: Non-determinism in python-3 ".pyc" bytecode
Date: Wed, 30 Mar 2016 01:11:47 +0200
[Message part 1 (text/plain, inline)]
Here is a version of the patch that works with the upstream Python, but
that I cannot get to work with our Guix recipe.

Could you test it and tell me what you think? I intend to push this to
CPython.

Cyril.
[upstream.patch (text/x-diff, attachment)]

Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Wed, 06 Apr 2016 08:31:02 GMT) Full text and rfc822 format available.

Message #25 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Cyril Roelandt <tipecaml <at> gmail.com>
Cc: 22533 <at> debbugs.gnu.org, Leo Famulari <leo <at> famulari.name>
Subject: Re: bug#22533: Non-determinism in python-3 ".pyc" bytecode
Date: Wed, 06 Apr 2016 10:29:57 +0200
[Message part 1 (text/plain, inline)]
Cyril Roelandt <tipecaml <at> gmail.com> skribis:

> Here is a version of the patch that works with the upstream Python, but
> that I cannot get to work with our Guix recipe.

At first sight the patch LGTM.  How does it not work for you? :-)

I applied this:

[Message part 2 (text/x-patch, inline)]
diff --git a/gnu/packages/patches/python-3-deterministic-build-info.patch b/gnu/packages/patches/python-3-deterministic-build-info.patch
index 22c372a..bdf9f20 100644
--- a/gnu/packages/patches/python-3-deterministic-build-info.patch
+++ b/gnu/packages/patches/python-3-deterministic-build-info.patch
@@ -15,3 +15,28 @@ We cannot pass it in CPPFLAGS due to whitespace in the DATE string.
  #ifndef DATE
  #ifdef __DATE__
  #define DATE __DATE__
+
+--- Lib/importlib/_bootstrap.py
++++ Lib/importlib/_bootstrap.py
+@@ -1443,7 +1443,8 @@ class SourceLoader(_LoaderBasics):
+         Implementing this method allows the loader to read bytecode files.
+         Raises IOError when the path cannot be handled.
+         """
+-        return {'mtime': self.path_mtime(path)}
++        return {'mtime': float(_os.environ.get(b'SOURCE_DATE_EPOCH',
++                                               st.st_mtime))}
+ 
+     def _cache_bytecode(self, source_path, cache_path, data):
+         """Optional method which writes data (bytes) to a file path (a str).
+@@ -1580,7 +1581,10 @@ class SourceFileLoader(FileLoader, SourceLoader):
+     def path_stats(self, path):
+         """Return the metadata for the path."""
+         st = _path_stat(path)
+-        return {'mtime': st.st_mtime, 'size': st.st_size}
++        return {
++            'mtime':  float(_os.environ.get(b'SOURCE_DATE_EPOCH', st.st_mtime)),
++            'size': st.st_size
++        }
+ 
+     def _cache_bytecode(self, source_path, bytecode_path, data):
+         # Adapt between the two APIs
[Message part 3 (text/plain, inline)]
… and that leads to these test failures:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix build python@3 --rounds=2 -K

[...]

======================================================================
FAIL: test_bad_marshal (test.test_importlib.source.test_file_loader.Source_SourceLoaderBadBytecodeTestPEP302)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/util.py", line 22, in wrapper
    to_return = fxn(*args, **kwargs)
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 452, in test_bad_marshal
    self._test_bad_marshal()
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 342, in _test_bad_marshal
    self.import_(file_path, '_temp')
AssertionError: EOFError not raised

======================================================================
FAIL: test_no_marshal (test.test_importlib.source.test_file_loader.Source_SourceLoaderBadBytecodeTestPEP302)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/util.py", line 22, in wrapper
    to_return = fxn(*args, **kwargs)
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 441, in test_no_marshal
    self._test_no_marshal()
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 322, in _test_no_marshal
    self.import_(file_path, '_temp')
AssertionError: EOFError not raised

======================================================================
FAIL: test_non_code_marshal (test.test_importlib.source.test_file_loader.Source_SourceLoaderBadBytecodeTestPEP302)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/util.py", line 22, in wrapper
    to_return = fxn(*args, **kwargs)
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 445, in test_non_code_marshal
    self._test_non_code_marshal()
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 331, in _test_non_code_marshal
    self.import_(file_path, '_temp')
AssertionError: ImportError not raised

======================================================================
FAIL: test_old_timestamp (test.test_importlib.source.test_file_loader.Source_SourceLoaderBadBytecodeTestPEP302)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/util.py", line 22, in wrapper
    to_return = fxn(*args, **kwargs)
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 471, in test_old_timestamp
    self.assertEqual(bytecode_file.read(4), source_timestamp)
AssertionError: b'\x01\x00\x00\x00' != b'\x7f\xc7\x04W'

======================================================================
FAIL: test_bad_marshal (test.test_importlib.source.test_file_loader.Source_SourceLoaderBadBytecodeTestPEP451)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/util.py", line 22, in wrapper
    to_return = fxn(*args, **kwargs)
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 452, in test_bad_marshal
    self._test_bad_marshal()
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 342, in _test_bad_marshal
    self.import_(file_path, '_temp')
AssertionError: EOFError not raised

======================================================================
FAIL: test_no_marshal (test.test_importlib.source.test_file_loader.Source_SourceLoaderBadBytecodeTestPEP451)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/util.py", line 22, in wrapper
    to_return = fxn(*args, **kwargs)
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 441, in test_no_marshal
    self._test_no_marshal()
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 322, in _test_no_marshal
    self.import_(file_path, '_temp')
AssertionError: EOFError not raised

======================================================================
FAIL: test_non_code_marshal (test.test_importlib.source.test_file_loader.Source_SourceLoaderBadBytecodeTestPEP451)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/util.py", line 22, in wrapper
    to_return = fxn(*args, **kwargs)
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 445, in test_non_code_marshal
    self._test_non_code_marshal()
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 331, in _test_non_code_marshal
    self.import_(file_path, '_temp')
AssertionError: ImportError not raised

======================================================================
FAIL: test_old_timestamp (test.test_importlib.source.test_file_loader.Source_SourceLoaderBadBytecodeTestPEP451)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/util.py", line 22, in wrapper
    to_return = fxn(*args, **kwargs)
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 471, in test_old_timestamp
    self.assertEqual(bytecode_file.read(4), source_timestamp)
AssertionError: b'\x01\x00\x00\x00' != b'\x7f\xc7\x04W'

----------------------------------------------------------------------
Ran 951 tests in 1.102s

FAILED (failures=8, skipped=19, expected failures=1)
Makefile:958: recipe for target 'test' failed
--8<---------------cut here---------------end--------------->8---

‘test_old_timestamp’ clearly needs to be adjusted to account for the
change.  The others have to do with the bytecode loader, so it’s
probably a similar story.  Could you look into it?

Perhaps you tested with SOURCE_DATE_EPOCH unset?

Thanks for working on this, it’s an important bug to fix!

Ludo’.

Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Fri, 26 May 2017 13:42:02 GMT) Full text and rfc822 format available.

Message #28 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: Marius Bakke <mbakke <at> fastmail.com>
To: 22533 <at> debbugs.gnu.org
Subject: Python bytecode reproducibility
Date: Fri, 26 May 2017 15:41:39 +0200
[Message part 1 (text/plain, inline)]
Hello!

I stumbled across this bug after re-discovering that Python bytecode is
not reproducible (through "glib"). Just sharing some notes..

Nix recently made an effort to fix this. AFAICT the ".pyc" files are
still a problem, but at least they got the interpreters building
reproducibly:

https://github.com/NixOS/nixpkgs/issues/22570
https://github.com/NixOS/nixpkgs/pull/22585

It would be great to revive this longstanding bug!

*walks away slowly before anyone notices*
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Sat, 03 Mar 2018 22:38:02 GMT) Full text and rfc822 format available.

Message #31 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: Marius Bakke <mbakke <at> fastmail.com>
Cc: 22533 <at> debbugs.gnu.org
Subject: Re: bug#22533: Python bytecode reproducibility
Date: Sat, 03 Mar 2018 23:37:29 +0100
Hi Guix,

Marius Bakke <mbakke <at> fastmail.com> writes:

> It would be great to revive this longstanding bug!

Indeed.

Here’s another attempt.  As far as I understand, the timestamp in the
pyc files only affects the header.

Up until Python 3.6 (incl) the header looks like this:

  magic | timestamp | size

Since Python 3.7 the header may either contain a timestamp or a hash:

  magic | 00000000000000000000000000000000 | timestamp | size
  magic | 00000000000000000000000000000001 | hash (64 bits)

This means we likely won’t have this problem any more with Python 3.7.
For Python 3.6 I guess we could add a final build phase that overwrites
the timestamp in the *binary*.  This needs to happen before any of the
compiled files are wrapped up in a wheel.
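
As an illustration of the hash-based header, here is a small sketch
that assumes Python 3.7 or later. It asks py_compile for a checked
hash-based .pyc and returns the 16-byte header; as far as I can tell
the 8 bytes after the flags word are the value of
importlib.util.source_hash for the source bytes. The function names
are made up for this example.

```python
import importlib.util
import py_compile


def compile_hash_based(source_path, cfile):
    """Write a checked hash-based .pyc and return its 16-byte header."""
    py_compile.compile(
        source_path,
        cfile=cfile,
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )
    with open(cfile, "rb") as f:
        return f.read(16)


def expected_source_hash(source_path):
    """Return the 8-byte hash that the header should contain."""
    with open(source_path, "rb") as f:
        return importlib.util.source_hash(f.read())
```

Since the header depends only on the source contents, touching the
source file between two compilations should not change it.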

Should we just wait for Python 3.7 which is expected to be released in
June 2018?  We’d still have to deal with this problem in Python 2,
though.

Is it a bad idea to override the timestamps in the generated binaries?
I think that we could avoid the recency check then, which was an
obstacle to resetting the timestamps of the source files.
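
To make the idea of patching the binary concrete, here is a minimal
sketch. It assumes the Python 3.6-and-earlier header described above,
with the timestamp in bytes 4 to 8, and a replacement value of 1. The
function name reset_legacy_pyc_timestamp is hypothetical. As far as I
know the loader compares this field against the mtime of the source
file, so the two would have to agree for the .pyc to be used.

```python
def reset_legacy_pyc_timestamp(path, timestamp=1):
    """Overwrite the mtime field (bytes 4..8) of a Python <= 3.6 .pyc file."""
    with open(path, "r+b") as f:
        header = f.read(12)
        if len(header) < 12:
            raise ValueError("file too short to be a .pyc: %s" % path)
        # Leave the magic number (bytes 0..4) and the size (bytes 8..12) alone.
        f.seek(4)
        f.write((timestamp & 0xFFFFFFFF).to_bytes(4, "little"))
```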

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net






Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Sun, 04 Mar 2018 09:22:01 GMT) Full text and rfc822 format available.

Message #34 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: Gábor Boskovits <boskovits <at> gmail.com>
To: Ricardo Wurmus <rekado <at> elephly.net>
Cc: Marius Bakke <mbakke <at> fastmail.com>, 22533 <at> debbugs.gnu.org
Subject: Re: bug#22533: Python bytecode reproducibility
Date: Sun, 4 Mar 2018 10:21:17 +0100
[Message part 1 (text/plain, inline)]
2018-03-03 23:37 GMT+01:00 Ricardo Wurmus <rekado <at> elephly.net>:

> Hi Guix,
>
> Marius Bakke <mbakke <at> fastmail.com> writes:
>
> > It would be great to revive this longstanding bug!
>
> Indeed.
>
> Here’s another attempt.  As far as I understand, the timestamp in the
> pyc files only affects the header.
>
> Up until Python 3.6 (incl) the header looks like this:
>
>   magic | timestamp | size
>
> Since Python 3.7 the header may either contain a timestamp or a hash:
>
>   magic | 00000000000000000000000000000000 | timestamp | size
>   magic | 00000000000000000000000000000001 | hash (64 bits)
>
> This means we likely won’t have this problem any more with Python 3.7.
> For Python 3.6 I guess we could add a final build phase that overwrites
> the timestamp in the *binary*.  This needs to happen before any of the
> compiled files are wrapped up in a wheel.
>
> Should we just wait for Python 3.7 which is expected to be released in
> June 2018?  We’d still have to deal with this problem in Python 2,
> though.
>
> Is it a bad idea to override the timestamps in the generated binaries?
> I think that we could avoid the recency check then, which was an
> obstacle to resetting the timestamps of the source files.

> --
> Ricardo
>
> GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
> https://elephly.net
>
>
Nix had this issue, it seems they have a python 3.5 solution, which
should be easy to adopt: https://github.com/NixOS/nixpkgs/issues/22570.
WDYT?
[Message part 2 (text/html, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Sun, 04 Mar 2018 12:47:02 GMT) Full text and rfc822 format available.

Message #37 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: Gábor Boskovits <boskovits <at> gmail.com>
Cc: Marius Bakke <mbakke <at> fastmail.com>, 22533 <at> debbugs.gnu.org
Subject: Re: bug#22533: Python bytecode reproducibility
Date: Sun, 04 Mar 2018 13:46:07 +0100
Hi Gábor,

> Nix had this issue, it seems they have a python 3.5 solution, which
> should be easy to adopt: https://github.com/NixOS/nixpkgs/issues/22570.
> WDYT?

Here’s the patch for Nix:

  https://patch-diff.githubusercontent.com/raw/NixOS/nixpkgs/pull/22585.diff

Here are the relevant changes to the Python packages:

* Python 3.4

  substituteInPlace "Lib/py_compile.py" --replace "source_stats['mtime']" "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])"
  substituteInPlace "Lib/importlib/_bootstrap.py" --replace "source_mtime = int(source_stats['mtime'])" "source_mtime = 1"

* Python 3.5

  substituteInPlace "Lib/py_compile.py" --replace "source_stats['mtime']" "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])"
  substituteInPlace "Lib/importlib/_bootstrap_external.py" --replace "source_mtime = int(st['mtime'])" "source_mtime = 1"

* Python 3.6
  substituteInPlace "Lib/py_compile.py" --replace "source_stats['mtime']" "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])"
  substituteInPlace "Lib/importlib/_bootstrap_external.py" --replace "source_mtime = int(st['mtime'])" "source_mtime = 1"


For all packages they set these environment variables:

  - set PYTHONHASHSEED=0 (for hashes of str, bytes and datetime objects)

  - set DETERMINISTIC_BUILD; for conditional patching of the timestamp
    for package builds.  The timestamp is not patched in ad-hoc
    environments, because that would mess with Python’s ability to
    determine whether to compile source files.
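
The effect of PYTHONHASHSEED can be checked with a short sketch like the
following. It starts a fresh interpreter with the variable set and
prints the hash of a string; the helper name
str_hash_in_fresh_interpreter is made up for this example.

```python
import os
import subprocess
import sys


def str_hash_in_fresh_interpreter(text, seed):
    """Return hash(text) as computed by a new Python process with the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(out.decode().strip())
```

Two calls with the same seed should return the same number, whereas
with hash randomization left on the value normally differs from one
process to the next.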

They also rebuild all bytecode (with the exception of lib2to3 because it
is Python 2 code) three times, once for each optimization level.

--8<---------------cut here---------------start------------->8---
+    # Determinism: rebuild all bytecode
+    # We exclude lib2to3 because that's Python 2 code which fails
+    # We rebuild three times, once for each optimization level
+    find $out -name "*.py" | $out/bin/python -m compileall -q -f -x "lib2to3" -i -
+    find $out -name "*.py" | $out/bin/python -O -m compileall -q -f -x "lib2to3" -i -
+    find $out -name "*.py" | $out/bin/python -OO -m compileall -q -f -x "lib2to3" -i -
--8<---------------cut here---------------end--------------->8---
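
For comparison, roughly the same rebuild can be driven from Python with
the compileall module instead of three shell pipelines. This is a
sketch under the assumption that skipping every path that contains
"lib2to3" is what is wanted; rebuild_bytecode is a made-up name.

```python
import compileall
import re


def rebuild_bytecode(root, exclude="lib2to3"):
    """Force-recompile every .py file under root at optimization levels 0, 1, 2.

    Files whose path contains `exclude` are skipped. Returns True when
    all three passes report success.
    """
    skip = re.compile(re.escape(exclude))
    ok = True
    for level in (0, 1, 2):
        result = compileall.compile_dir(
            root, force=True, quiet=1, rx=skip, optimize=level
        )
        ok = ok and bool(result)
    return ok
```

Each pass writes its own file in __pycache__ (no suffix, ".opt-1" and
".opt-2"), which matches the three invocations with no flag, -O and
-OO.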

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net






Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Sun, 04 Mar 2018 15:32:02 GMT) Full text and rfc822 format available.

Message #40 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: Gábor Boskovits <boskovits <at> gmail.com>
To: Ricardo Wurmus <rekado <at> elephly.net>
Cc: Marius Bakke <mbakke <at> fastmail.com>, 22533 <at> debbugs.gnu.org
Subject: Re: bug#22533: Python bytecode reproducibility
Date: Sun, 4 Mar 2018 16:30:59 +0100
[Message part 1 (text/plain, inline)]
2018-03-04 13:46 GMT+01:00 Ricardo Wurmus <rekado <at> elephly.net>:

>
> Hi Gábor,
>
> > Nix had this issue, it seems they have a python 3.5 solution, which
> > should be easy to adopt: https://github.com/NixOS/nixpkgs/issues/22570.
> > WDYT?
>
> Here’s the patch for Nix:
>
>   https://patch-diff.githubusercontent.com/raw/NixOS/nixpkgs/pull/22585.diff
>
> Here are the relevant changes to the Python packages:
>
> * Python 3.4
>
>   substituteInPlace "Lib/py_compile.py" --replace "source_stats['mtime']" "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])"
>   substituteInPlace "Lib/importlib/_bootstrap.py" --replace "source_mtime = int(source_stats['mtime'])" "source_mtime = 1"
>
> * Python 3.5
>
>   substituteInPlace "Lib/py_compile.py" --replace "source_stats['mtime']" "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])"
>   substituteInPlace "Lib/importlib/_bootstrap_external.py" --replace "source_mtime = int(st['mtime'])" "source_mtime = 1"
>
> * Python 3.6
>   substituteInPlace "Lib/py_compile.py" --replace "source_stats['mtime']" "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])"
>   substituteInPlace "Lib/importlib/_bootstrap_external.py" --replace "source_mtime = int(st['mtime'])" "source_mtime = 1"
>
>
>
Nice, thanks for the summary.
Can we adopt this as is?
Do we need the 3.4 and 3.5 fixes, or is the 3.6 one enough?


> For all packages they set these environment variables:
>
>   - set PYTHONHASHSEED=0 (for hashes of str, bytes and datetime objects)
>
>   - set DETERMINISTIC_BUILD; for conditional patching of the timestamp
>     for package builds.  The timestamp is not patched in ad-hoc
>     environments, because that would mess with Python’s ability to
>     determine whether to compile source files.
>
>
Should we set these in python-build-system? What about python bootstrap?
I guess we use gnu-build-system there, so bootstrap packages might need to
set these explicitly?


> They also rebuild all bytecode (with the exception of lib2to3 because it
> is Python 2 code) three times, once for each optimization level.
>
> --8<---------------cut here---------------start------------->8---
> +    # Determinism: rebuild all bytecode
> +    # We exclude lib2to3 because that's Python 2 code which fails
> +    # We rebuild three times, once for each optimization level
> +    find $out -name "*.py" | $out/bin/python -m compileall -q -f -x "lib2to3" -i -
> +    find $out -name "*.py" | $out/bin/python -O -m compileall -q -f -x "lib2to3" -i -
> +    find $out -name "*.py" | $out/bin/python -OO -m compileall -q -f -x "lib2to3" -i -
> --8<---------------cut here---------------end--------------->8---
>
>
Do we also have to do this, or should we settle on one optimization
level? Which one?


> --
> Ricardo
>
> GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
> https://elephly.net
>
>
>
[Message part 2 (text/html, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Sun, 04 Mar 2018 19:19:01 GMT) Full text and rfc822 format available.

Message #43 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: Gábor Boskovits <boskovits <at> gmail.com>
Cc: Marius Bakke <mbakke <at> fastmail.com>, 22533 <at> debbugs.gnu.org
Subject: Re: bug#22533: Python bytecode reproducibility
Date: Sun, 04 Mar 2018 20:18:23 +0100
[Message part 1 (text/plain, inline)]
I have applied this patch locally:

[1.diff (text/x-patch, inline)]
diff --git a/gnu/packages/python.scm b/gnu/packages/python.scm
index 5f701701a..0d1ecc3c6 100644
--- a/gnu/packages/python.scm
+++ b/gnu/packages/python.scm
@@ -359,8 +359,42 @@ data types.")
                               "Lib/ctypes/test/test_win32.py" ; fails on aarch64
                               "Lib/test/test_fcntl.py")) ; fails on aarch64
                   #t))))
-    (arguments (substitute-keyword-arguments (package-arguments python-2)
-                 ((#:tests? _) #t)))
+    (arguments
+     (substitute-keyword-arguments (package-arguments python-2)
+       ((#:tests? _) #t)
+       ((#:phases phases)
+        `(modify-phases ,phases
+           (add-after 'unpack 'patch-timestamp-for-pyc-files
+             (lambda _
+               ;; We set DETERMINISTIC_BUILD to only override the mtime when
+               ;; building with Guix, lest we break auto-compilation in
+               ;; environments.
+               (setenv "DETERMINISTIC_BUILD" "1")
+               (substitute* "Lib/py_compile.py"
+                 (("source_stats\\['mtime'\\]")
+                  "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])"))
+
+               ;; Use deterministic hashes for strings, bytes, and datetime
+               ;; objects.
+               (setenv "PYTHONHASHSEED" "0")
+
+               ;; Reset mtime when validating bytecode header.
+               (substitute* "Lib/importlib/_bootstrap_external.py"
+                 (("source_mtime = int\\(source_stats\\['mtime'\\]\\)")
+                  "source_mtime = 1"))
+               #t))
+           (add-after 'unpack 'disable-timestamp-tests
+             (lambda _
+               (substitute* "Lib/test/test_importlib/source/test_file_loader.py"
+                 (("test_bad_marshal")
+                  "disable_test_bad_marshal")
+                 (("test_no_marshal")
+                  "disable_test_no_marshal")
+                 (("test_non_code_marshal")
+                  "disable_test_non_code_marshal"))
+               #t))
+           (add-before 'check 'allow-non-deterministic-compilation
+             (lambda _ (unsetenv "DETERMINISTIC_BUILD") #t))))))
     (native-search-paths
      (list (search-path-specification
             (variable "PYTHONPATH")
[Message part 3 (text/plain, inline)]
It allows me to build python-six and python-sip reproducibly.  It does
not fix problems with Python 2, and I haven’t yet tested if it causes
any new problems.
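
As an illustration of the field that the patch pins, here is a small sketch showing that py_compile copies the source file's mtime into the bytecode header. The helper names are hypothetical. It assumes that on Python 3.7 and later the header carries an extra flags word, so the mtime sits at byte offset 8, whereas on 3.6 it sits at offset 4.

```python
import os
import py_compile
import struct
import sys
import tempfile

# Offset of the 32-bit mtime field inside the .pyc header.
MTIME_OFFSET = 8 if sys.version_info >= (3, 7) else 4


def pyc_mtime(source_path, cfile):
    """Compile `source_path` to `cfile` and return the embedded mtime."""
    kwargs = {}
    if sys.version_info >= (3, 7):
        # Ask explicitly for timestamp-based bytecode; otherwise a set
        # SOURCE_DATE_EPOCH would switch py_compile to hash-based files.
        kwargs["invalidation_mode"] = py_compile.PycInvalidationMode.TIMESTAMP
    py_compile.compile(source_path, cfile=cfile, doraise=True, **kwargs)
    with open(cfile, "rb") as f:
        header = f.read(MTIME_OFFSET + 4)
    return struct.unpack("<I", header[MTIME_OFFSET:MTIME_OFFSET + 4])[0]


def demo():
    """Compile the same source twice with different mtimes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        with open(src, "w") as f:
            f.write("x = 1\n")
        os.utime(src, (1000, 1000))
        first = pyc_mtime(src, os.path.join(tmp, "a.pyc"))
        os.utime(src, (2000, 2000))
        second = pyc_mtime(src, os.path.join(tmp, "b.pyc"))
        return first, second
```

The source text is identical in both runs; only the recorded timestamp changes, which is exactly the kind of difference that shows up between two builds.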

It’s a little worrying that I had to disable three more tests that I
think shouldn’t have failed.

What do you think?

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net

Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Mon, 05 Mar 2018 00:03:01 GMT) Full text and rfc822 format available.

Message #46 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: Gábor Boskovits <boskovits <at> gmail.com>
Cc: Marius Bakke <mbakke <at> fastmail.com>, 22533 <at> debbugs.gnu.org
Subject: Re: bug#22533: Python bytecode reproducibility
Date: Mon, 05 Mar 2018 01:02:15 +0100
Ricardo Wurmus <rekado <at> elephly.net> writes:

> I have applied this patch locally:
>
> diff --git a/gnu/packages/python.scm b/gnu/packages/python.scm
> index 5f701701a..0d1ecc3c6 100644
> --- a/gnu/packages/python.scm
> +++ b/gnu/packages/python.scm
> @@ -359,8 +359,42 @@ data types.")
>                                "Lib/ctypes/test/test_win32.py" ; fails on aarch64
>                                "Lib/test/test_fcntl.py")) ; fails on aarch64
>                    #t))))
> -    (arguments (substitute-keyword-arguments (package-arguments python-2)
> -                 ((#:tests? _) #t)))
> +    (arguments
> +     (substitute-keyword-arguments (package-arguments python-2)
> +       ((#:tests? _) #t)
> +       ((#:phases phases)
> +        `(modify-phases ,phases
> +           (add-after 'unpack 'patch-timestamp-for-pyc-files
> +             (lambda _
> +               ;; We set DETERMINISTIC_BUILD to only override the mtime when
> +               ;; building with Guix, lest we break auto-compilation in
> +               ;; environments.
> +               (setenv "DETERMINISTIC_BUILD" "1")
> +               (substitute* "Lib/py_compile.py"
> +                 (("source_stats\\['mtime'\\]")
> +                  "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])"))
> +
> +               ;; Use deterministic hashes for strings, bytes, and datetime
> +               ;; objects.
> +               (setenv "PYTHONHASHSEED" "0")
> +
> +               ;; Reset mtime when validating bytecode header.
> +               (substitute* "Lib/importlib/_bootstrap_external.py"
> +                 (("source_mtime = int\\(source_stats\\['mtime'\\]\\)")
> +                  "source_mtime = 1"))
> +               #t))
> +           (add-after 'unpack 'disable-timestamp-tests
> +             (lambda _
> +               (substitute* "Lib/test/test_importlib/source/test_file_loader.py"
> +                 (("test_bad_marshal")
> +                  "disable_test_bad_marshal")
> +                 (("test_no_marshal")
> +                  "disable_test_no_marshal")
> +                 (("test_non_code_marshal")
> +                  "disable_test_non_code_marshal"))
> +               #t))
> +           (add-before 'check 'allow-non-deterministic-compilation
> +             (lambda _ (unsetenv "DETERMINISTIC_BUILD") #t))))))
>      (native-search-paths
>       (list (search-path-specification
>              (variable "PYTHONPATH")
>
> It allows me to build python-six and python-sip reproducibly.  It does
> not fix problems with Python 2, and I haven’t yet tested if it causes
> any new problems.

I tested importing modules in an ad-hoc environment — no problems.

Unfortunately, this doesn’t fix all reproducibility problems with numpy:

--8<---------------cut here---------------start------------->8---
Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/__config__.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/__config__.cpython-36.pyc differ
Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/exec_command.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/exec_command.cpython-36.pyc differ
Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/system_info.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/system_info.cpython-36.pyc differ
Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/__config__.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/__config__.cpython-36.pyc differ
Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc differ
Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/testing/nose_tools/__pycache__/utils.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/testing/nose_tools/__pycache__/utils.cpython-36.pyc differ
--8<---------------cut here---------------end--------------->8---

But the successes with simpler Python packages are promising.

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net






Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Mon, 05 Mar 2018 00:06:01 GMT) Full text and rfc822 format available.

Message #49 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: Gábor Boskovits <boskovits <at> gmail.com>
Cc: Marius Bakke <mbakke <at> fastmail.com>, 22533 <at> debbugs.gnu.org
Subject: Re: bug#22533: Python bytecode reproducibility
Date: Mon, 05 Mar 2018 01:05:04 +0100
Ricardo Wurmus <rekado <at> elephly.net> writes:

> Unfortunately, this doesn’t fix all reproducibility problems with numpy:
>
> --8<---------------cut here---------------start------------->8---
> Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/__config__.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/__config__.cpython-36.pyc differ
> Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/exec_command.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/exec_command.cpython-36.pyc differ
> Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/system_info.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/system_info.cpython-36.pyc differ
> Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/__config__.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/__config__.cpython-36.pyc differ
> Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc differ
> Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/testing/nose_tools/__pycache__/utils.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/testing/nose_tools/__pycache__/utils.cpython-36.pyc differ
> --8<---------------cut here---------------end--------------->8---

Here’s what diffoscope says:

--8<---------------cut here---------------start------------->8---
diffoscope /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0{-check,}/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc
--- /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc
+++ /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc
@@ -1,8 +1,8 @@
-00000000: 330d 0d0a fa87 9c5a 2601 0000 e300 0000  3......Z&.......
+00000000: 330d 0d0a c485 9c5a 2601 0000 e300 0000  3......Z&.......
 00000010: 0000 0000 0000 0000 0001 0000 0040 0000  .............@..
 00000020: 0073 2000 0000 6400 5a00 6400 5a01 6400  .s ...d.Z.d.Z.d.
 00000030: 5a02 6401 5a03 6402 5a04 6504 731c 6502  Z.d.Z.d.Z.e.s.e.
 00000040: 5a01 6403 5300 2904 7a06 312e 3134 2e30  Z.d.S.).z.1.14.0
 00000050: da28 3639 3134 6262 3431 6630 6662 3363  .(6914bb41f0fb3c
 00000060: 3162 6135 3030 6261 6534 6537 6436 3731  1ba500bae4e7d671
 00000070: 6461 3935 3336 3738 3666 544e 2905 da0d  da9536786fTN)...
--8<---------------cut here---------------end--------------->8---

In other words: this is the timestamp field of the pyc file.

Maybe this can be avoided by setting DETERMINISTIC_BUILD in the
python-build-system?
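
To make that reading of the diff concrete, here is a sketch that decodes the first twelve bytes of the two files, copied from the hexdump above. It assumes the Python 3.6 header layout: a 16-bit magic number followed by "\r\n", then a 32-bit little-endian source mtime, then the 32-bit source size. The constant and function names are made up.

```python
import struct

# First 12 bytes of version.cpython-36.pyc in the two store items.
HEADER_CHECK = bytes.fromhex("330d0d0afa879c5a26010000")  # "-check" build
HEADER_BUILD = bytes.fromhex("330d0d0ac4859c5a26010000")  # regular build


def parse_pyc36_header(header):
    """Return (magic, mtime, source_size) from a Python 3.6 .pyc header."""
    # "<H2xII": magic, two skipped bytes (\r\n), mtime, size; little-endian.
    return struct.unpack("<H2xII", header[:12])


if __name__ == "__main__":
    for label, header in (("check", HEADER_CHECK), ("build", HEADER_BUILD)):
        magic, mtime, size = parse_pyc36_header(header)
        print(label, magic, mtime, size)
```

Magic number and source size agree in both files; only the mtime differs (1520207866 versus 1520207300, that is 566 seconds apart), which matches two builds run a few minutes after each other.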

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net






Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Mon, 05 Mar 2018 09:26:02 GMT) Full text and rfc822 format available.

Message #52 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Ricardo Wurmus <rekado <at> elephly.net>
Cc: Marius Bakke <mbakke <at> fastmail.com>, 22533 <at> debbugs.gnu.org
Subject: Re: bug#22533: Python bytecode reproducibility
Date: Mon, 05 Mar 2018 10:25:40 +0100
Hello!

Ricardo Wurmus <rekado <at> elephly.net> writes:

> Is it a bad idea to override the timestamps in the generated binaries?
> I think that we could avoid the recency check then, which was an
> obstacle to resetting the timestamps of the source files.

I think it’s good if we can fix Python itself to honor SOURCE_DATE_EPOCH
for its timestamps, but it’s also OK to patch timestamps in generated
binaries.

We do that already in gzip headers, with ‘reset-gzip-timestamp’.
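
The idea of patching a timestamp in an already generated file can be sketched for gzip as follows. This is not the code behind ‘reset-gzip-timestamp’; it is a hypothetical illustration that relies only on the gzip header layout, where the four bytes at offset 4 hold the modification time.

```python
import gzip
import io


def gzip_with_mtime(data, mtime):
    """Compress `data` and record `mtime` in the gzip header."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()


def reset_gzip_mtime(blob):
    """Return a copy of `blob` with the header's MTIME field zeroed."""
    if blob[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    return blob[:4] + b"\x00\x00\x00\x00" + blob[8:]


def demo():
    """Two archives that differ only in their timestamp become identical."""
    a = gzip_with_mtime(b"hello", 111)
    b = gzip_with_mtime(b"hello", 222)
    same_after_reset = reset_gzip_mtime(a) == reset_gzip_mtime(b)
    still_valid = gzip.decompress(reset_gzip_mtime(a)) == b"hello"
    return a != b and same_after_reset and still_valid
```

For gzip the timestamp is purely informational, so zeroing it is harmless. For .pyc files the field is also compared against the source file at import time, so any rewritten value has to agree with what the importer expects.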

Thanks for tackling this!

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Mon, 05 Mar 2018 15:37:02 GMT) Full text and rfc822 format available.

Message #55 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: Gábor Boskovits <boskovits <at> gmail.com>
To: Ricardo Wurmus <rekado <at> elephly.net>
Cc: Marius Bakke <mbakke <at> fastmail.com>, 22533 <at> debbugs.gnu.org
Subject: Re: bug#22533: Python bytecode reproducibility
Date: Mon, 5 Mar 2018 16:36:31 +0100
[Message part 1 (text/plain, inline)]
2018-03-05 1:05 GMT+01:00 Ricardo Wurmus <rekado <at> elephly.net>:

>
> Ricardo Wurmus <rekado <at> elephly.net> writes:
>
> > Unfortunately, this doesn’t fix all reproducibility problems with numpy:
> >
> > --8<---------------cut here---------------start------------->8---
> > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/__config__.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/__config__.cpython-36.pyc differ
> > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/exec_command.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/exec_command.cpython-36.pyc differ
> > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/system_info.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/system_info.cpython-36.pyc differ
> > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/__config__.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/__config__.cpython-36.pyc differ
> > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc differ
> > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/testing/nose_tools/__pycache__/utils.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/testing/nose_tools/__pycache__/utils.cpython-36.pyc differ
> > --8<---------------cut here---------------end--------------->8---
>
> Here’s what diffoscope says:
>
> --8<---------------cut here---------------start------------->8---
> diffoscope /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0{-check,}/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc
> --- /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc
> +++ /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc
> @@ -1,8 +1,8 @@
> -00000000: 330d 0d0a fa87 9c5a 2601 0000 e300 0000  3......Z&.......
> +00000000: 330d 0d0a c485 9c5a 2601 0000 e300 0000  3......Z&.......
>  00000010: 0000 0000 0000 0000 0001 0000 0040 0000  .............@..
>  00000020: 0073 2000 0000 6400 5a00 6400 5a01 6400  .s ...d.Z.d.Z.d.
>  00000030: 5a02 6401 5a03 6402 5a04 6504 731c 6502  Z.d.Z.d.Z.e.s.e.
>  00000040: 5a01 6403 5300 2904 7a06 312e 3134 2e30  Z.d.S.).z.1.14.0
>  00000050: da28 3639 3134 6262 3431 6630 6662 3363  .(6914bb41f0fb3c
>  00000060: 3162 6135 3030 6261 6534 6537 6436 3731  1ba500bae4e7d671
>  00000070: 6461 3935 3336 3738 3666 544e 2905 da0d  da9536786fTN)...
> --8<---------------cut here---------------end--------------->8---
>
> In other words: this is the timestamp field of the pyc file.
>
> Maybe this can be avoided by setting DETERMINISTIC_BUILD in the
> python-build-system?
>
>
It seems that the deterministic build patch already landed upstream
https://github.com/python/cpython/pull/5200, so we might consider
applying the upstream patches. WDYT?


> --
> Ricardo
>
> GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
> https://elephly.net
>
>
>

Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Mon, 05 Mar 2018 20:34:02 GMT) Full text and rfc822 format available.

Message #58 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: Gábor Boskovits <boskovits <at> gmail.com>
To: Ricardo Wurmus <rekado <at> elephly.net>
Cc: Marius Bakke <mbakke <at> fastmail.com>, 22533 <at> debbugs.gnu.org
Subject: Re: bug#22533: Python bytecode reproducibility
Date: Mon, 5 Mar 2018 21:33:02 +0100
[Message part 1 (text/plain, inline)]
2018-03-05 16:36 GMT+01:00 Gábor Boskovits <boskovits <at> gmail.com>:

> 2018-03-05 1:05 GMT+01:00 Ricardo Wurmus <rekado <at> elephly.net>:
>
>>
>> Ricardo Wurmus <rekado <at> elephly.net> writes:
>>
>> > Unfortunately, this doesn’t fix all reproducibility problems with numpy:
>> >
>> > --8<---------------cut here---------------start------------->8---
>> > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/__config__.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/__config__.cpython-36.pyc differ
>> > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/exec_command.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/exec_command.cpython-36.pyc differ
>> > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/system_info.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/system_info.cpython-36.pyc differ
>> > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/__config__.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/__config__.cpython-36.pyc differ
>> > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc differ
>> > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/testing/nose_tools/__pycache__/utils.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/testing/nose_tools/__pycache__/utils.cpython-36.pyc differ
>> > --8<---------------cut here---------------end--------------->8---
>>
>> Here’s what diffoscope says:
>>
>> --8<---------------cut here---------------start------------->8---
>> diffoscope /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0{-check,}/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc
>> --- /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc
>> +++ /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc
>> @@ -1,8 +1,8 @@
>> -00000000: 330d 0d0a fa87 9c5a 2601 0000 e300 0000  3......Z&.......
>> +00000000: 330d 0d0a c485 9c5a 2601 0000 e300 0000  3......Z&.......
>>  00000010: 0000 0000 0000 0000 0001 0000 0040 0000  .............@..
>>  00000020: 0073 2000 0000 6400 5a00 6400 5a01 6400  .s ...d.Z.d.Z.d.
>>  00000030: 5a02 6401 5a03 6402 5a04 6504 731c 6502  Z.d.Z.d.Z.e.s.e.
>>  00000040: 5a01 6403 5300 2904 7a06 312e 3134 2e30  Z.d.S.).z.1.14.0
>>  00000050: da28 3639 3134 6262 3431 6630 6662 3363  .(6914bb41f0fb3c
>>  00000060: 3162 6135 3030 6261 6534 6537 6436 3731  1ba500bae4e7d671
>>  00000070: 6461 3935 3336 3738 3666 544e 2905 da0d  da9536786fTN)...
>> --8<---------------cut here---------------end--------------->8---
>>
>> In other words: this is the timestamp field of the pyc file.
>>
>> Maybe this can be avoided by setting DETERMINISTIC_BUILD in the
>> python-build-system?
>>
>>
> It seems that the deterministic build patch already landed upstream
> https://github.com/python/cpython/pull/5200, so we might consider
> applying the upstream patches. WDYT?
>

And also this: https://github.com/python/cpython/pull/4575.
I'm now having a look at this approach. However this second one
seems quite invasive...


>
>
>> --
>> Ricardo
>>
>> GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
>> https://elephly.net
>>
>>
>>
>

Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Mon, 05 Mar 2018 21:48:02 GMT) Full text and rfc822 format available.

Message #61 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: Gábor Boskovits <boskovits <at> gmail.com>
Cc: Marius Bakke <mbakke <at> fastmail.com>, 22533 <at> debbugs.gnu.org
Subject: Re: bug#22533: Python bytecode reproducibility
Date: Mon, 05 Mar 2018 22:46:38 +0100
Gábor Boskovits <boskovits <at> gmail.com> writes:

> 2018-03-05 16:36 GMT+01:00 Gábor Boskovits <boskovits <at> gmail.com>:
>
>> 2018-03-05 1:05 GMT+01:00 Ricardo Wurmus <rekado <at> elephly.net>:
>>
>>>
>>> Ricardo Wurmus <rekado <at> elephly.net> writes:
>>>
>>> > Unfortunately, this doesn’t fix all reproducibility problems with numpy:
>>> >
>>> > --8<---------------cut here---------------start------------->8---
>>> > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/__config__.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/__config__.cpython-36.pyc differ
>>> > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/exec_command.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/exec_command.cpython-36.pyc differ
>>> > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/system_info.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/system_info.cpython-36.pyc differ
>>> > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/__config__.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/__config__.cpython-36.pyc differ
>>> > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc differ
>>> > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/testing/nose_tools/__pycache__/utils.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/testing/nose_tools/__pycache__/utils.cpython-36.pyc differ
>>> > --8<---------------cut here---------------end--------------->8---
>>>
>>> Here’s what diffoscope says:
>>>
>>> --8<---------------cut here---------------start------------->8---
>>> diffoscope /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0{-check,}/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc
>>> --- /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc
>>> +++ /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc
>>> @@ -1,8 +1,8 @@
>>> -00000000: 330d 0d0a fa87 9c5a 2601 0000 e300 0000  3......Z&.......
>>> +00000000: 330d 0d0a c485 9c5a 2601 0000 e300 0000  3......Z&.......
>>>  00000010: 0000 0000 0000 0000 0001 0000 0040 0000  .............@..
>>>  00000020: 0073 2000 0000 6400 5a00 6400 5a01 6400  .s ...d.Z.d.Z.d.
>>>  00000030: 5a02 6401 5a03 6402 5a04 6504 731c 6502  Z.d.Z.d.Z.e.s.e.
>>>  00000040: 5a01 6403 5300 2904 7a06 312e 3134 2e30  Z.d.S.).z.1.14.0
>>>  00000050: da28 3639 3134 6262 3431 6630 6662 3363  .(6914bb41f0fb3c
>>>  00000060: 3162 6135 3030 6261 6534 6537 6436 3731  1ba500bae4e7d671
>>>  00000070: 6461 3935 3336 3738 3666 544e 2905 da0d  da9536786fTN)...
>>> --8<---------------cut here---------------end--------------->8---
>>>
>>> In other words: this is the timestamp field of the pyc file.
>>>
>>> Maybe this can be avoided by setting DETERMINISTIC_BUILD in the
>>> python-build-system?
>>>
>>>
>> It seems that the deterministic build patch already landed upstream
>> https://github.com/python/cpython/pull/5200, so we might consider
>> applying the upstream patches. WDYT?
>>
>
> And also this: https://github.com/python/cpython/pull/4575.
> I'm now having a look at this approach. However this second one
> seems quite invasive...

These patches are for what will become Python 3.7.  Python 3.6 does not
have support for “invalidation_mode”, so at least the first patch would
not work for us.
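
For comparison, this is roughly what “invalidation_mode” looks like on an interpreter that has it (Python 3.7 or later). It is a sketch with made-up helper names and is of no direct use for our 3.6 package: with a hash-based mode, the header records a hash of the source instead of its mtime, so the output no longer depends on file timestamps.

```python
import os
import py_compile
import tempfile


def compile_with_mtime(src, cfile, mtime):
    """Set the mtime of `src`, compile it hash-based, return the .pyc bytes."""
    os.utime(src, (mtime, mtime))
    py_compile.compile(
        src,
        cfile=cfile,
        doraise=True,
        # Requires Python >= 3.7.
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )
    with open(cfile, "rb") as f:
        return f.read()


def hash_pyc_is_mtime_independent():
    """Compile one source file under two different mtimes and compare."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        with open(src, "w") as f:
            f.write("x = 1\n")
        first = compile_with_mtime(src, os.path.join(tmp, "a.pyc"), 1000)
        second = compile_with_mtime(src, os.path.join(tmp, "b.pyc"), 2000)
        return first == second
```

Until we are on 3.7, something along the lines of the DETERMINISTIC_BUILD substitution is still needed to get the same effect.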

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net






Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Mon, 05 Mar 2018 22:03:01 GMT) Full text and rfc822 format available.

Message #64 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: Gábor Boskovits <boskovits <at> gmail.com>
Cc: 22533 <at> debbugs.gnu.org
Subject: Re: bug#22533: Python bytecode reproducibility
Date: Mon, 05 Mar 2018 23:02:29 +0100
Ricardo Wurmus <rekado <at> elephly.net> writes:

> Ricardo Wurmus <rekado <at> elephly.net> writes:
>
>> Unfortunately, this doesn’t fix all reproducibility problems with numpy:
>>
>> --8<---------------cut here---------------start------------->8---
>> Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/__config__.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/__config__.cpython-36.pyc differ
>> Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/exec_command.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/exec_command.cpython-36.pyc differ
>> Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/system_info.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/system_info.cpython-36.pyc differ
>> Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/__config__.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/__config__.cpython-36.pyc differ
>> Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc differ
>> Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/testing/nose_tools/__pycache__/utils.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/testing/nose_tools/__pycache__/utils.cpython-36.pyc differ
>> --8<---------------cut here---------------end--------------->8---
>
> Here’s what diffoscope says:
>
> --8<---------------cut here---------------start------------->8---
> diffoscope /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0{-check,}/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc
> --- /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc
> +++ /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc
> @@ -1,8 +1,8 @@
> -00000000: 330d 0d0a fa87 9c5a 2601 0000 e300 0000  3......Z&.......
> +00000000: 330d 0d0a c485 9c5a 2601 0000 e300 0000  3......Z&.......
>  00000010: 0000 0000 0000 0000 0001 0000 0040 0000  .............@..
>  00000020: 0073 2000 0000 6400 5a00 6400 5a01 6400  .s ...d.Z.d.Z.d.
>  00000030: 5a02 6401 5a03 6402 5a04 6504 731c 6502  Z.d.Z.d.Z.e.s.e.
>  00000040: 5a01 6403 5300 2904 7a06 312e 3134 2e30  Z.d.S.).z.1.14.0
>  00000050: da28 3639 3134 6262 3431 6630 6662 3363  .(6914bb41f0fb3c
>  00000060: 3162 6135 3030 6261 6534 6537 6436 3731  1ba500bae4e7d671
>  00000070: 6461 3935 3336 3738 3666 544e 2905 da0d  da9536786fTN)...
> --8<---------------cut here---------------end--------------->8---
>
> In other words: this is the timestamp field of the pyc file.
>
> Maybe this can be avoided by setting DETERMINISTIC_BUILD in the
> python-build-system?

It cannot.

So, something’s still missing from my patch.  Does anyone see what might
be missing?
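
For reference, the two hexdump lines quoted above differ only in bytes 4 to 7.  Assuming the Python 3.6 pyc header layout (4-byte magic number, 4-byte little-endian source mtime, 4-byte source size), those bytes can be decoded with a small sketch like the one below.  The function name is made up for this example; the header bytes are copied from the diffoscope output.

```python
import struct
from datetime import datetime, timezone

# First 12 bytes of each file, copied from the diffoscope hexdump.
HEADER_CHECK = bytes.fromhex("330d0d0a" "fa879c5a" "26010000")  # "-check" output
HEADER_BUILD = bytes.fromhex("330d0d0a" "c4859c5a" "26010000")  # regular output


def parse_pyc36_header(header):
    """Split a Python 3.6 pyc header into (magic, mtime, source_size).

    Assumes the pre-3.7 layout: magic, then mtime and size as
    little-endian unsigned 32-bit integers.
    """
    magic, mtime, size = struct.unpack("<4sII", header[:12])
    return magic, mtime, size


for label, header in (("check", HEADER_CHECK), ("build", HEADER_BUILD)):
    magic, mtime, size = parse_pyc36_header(header)
    stamp = datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat()
    print(label, magic, mtime, stamp, size)
```

Both headers share the same magic number and source size; only the mtime field differs, which matches the reading that the difference is the timestamp.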

-- 
Ricardo






Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Mon, 05 Mar 2018 22:08:01 GMT) Full text and rfc822 format available.

Message #67 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: Gábor Boskovits <boskovits <at> gmail.com>
Cc: 22533 <at> debbugs.gnu.org
Subject: Re: bug#22533: Python bytecode reproducibility
Date: Mon, 05 Mar 2018 23:06:51 +0100
Ricardo Wurmus <rekado <at> elephly.net> writes:

> Ricardo Wurmus <rekado <at> elephly.net> writes:
>
>> I have applied this patch locally:
>>
>> diff --git a/gnu/packages/python.scm b/gnu/packages/python.scm
>> index 5f701701a..0d1ecc3c6 100644
>> --- a/gnu/packages/python.scm
>> +++ b/gnu/packages/python.scm
>> @@ -359,8 +359,42 @@ data types.")
>>                                "Lib/ctypes/test/test_win32.py" ; fails on aarch64
>>                                "Lib/test/test_fcntl.py")) ; fails on aarch64
>>                    #t))))
>> -    (arguments (substitute-keyword-arguments (package-arguments python-2)
>> -                 ((#:tests? _) #t)))
>> +    (arguments
>> +     (substitute-keyword-arguments (package-arguments python-2)
>> +       ((#:tests? _) #t)
>> +       ((#:phases phases)
>> +        `(modify-phases ,phases
>> +           (add-after 'unpack 'patch-timestamp-for-pyc-files
>> +             (lambda _
>> +               ;; We set DETERMINISTIC_BUILD to only override the mtime when
>> +               ;; building with Guix, lest we break auto-compilation in
>> +               ;; environments.
>> +               (setenv "DETERMINISTIC_BUILD" "1")
>> +               (substitute* "Lib/py_compile.py"
>> +                 (("source_stats\\['mtime'\\]")
>> +                  "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])"))
>> +
>> +               ;; Use deterministic hashes for strings, bytes, and datetime
>> +               ;; objects.
>> +               (setenv "PYTHONHASHSEED" "0")
>> +
>> +               ;; Reset mtime when validating bytecode header.
>> +               (substitute* "Lib/importlib/_bootstrap_external.py"
>> +                 (("source_mtime = int\\(source_stats\\['mtime'\\]\\)")
>> +                  "source_mtime = 1"))
>> +               #t))
>> +           (add-after 'unpack 'disable-timestamp-tests
>> +             (lambda _
>> +               (substitute* "Lib/test/test_importlib/source/test_file_loader.py"
>> +                 (("test_bad_marshal")
>> +                  "disable_test_bad_marshal")
>> +                 (("test_no_marshal")
>> +                  "disable_test_no_marshal")
>> +                 (("test_non_code_marshal")
>> +                  "disable_test_non_code_marshal"))
>> +               #t))
>> +           (add-before 'check 'allow-non-deterministic-compilation
>> +             (lambda _ (unsetenv "DETERMINISTIC_BUILD") #t))))))
>>      (native-search-paths
>>       (list (search-path-specification
>>              (variable "PYTHONPATH")
>>
>> It allows me to build python-six and python-sip reproducibly.  It does
>> not fix problems with Python 2, and I haven’t yet tested if it causes
>> any new problems.

I should also note that Python 3 itself still contains pyc files with
timestamps.  This could be the reason why in Nix all pyc files are
rebuilt (more than once).

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net






Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Mon, 05 Mar 2018 23:22:01 GMT) Full text and rfc822 format available.

Message #70 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: Marius Bakke <mbakke <at> fastmail.com>
To: Ricardo Wurmus <rekado <at> elephly.net>, Gábor Boskovits
 <boskovits <at> gmail.com>
Cc: 22533 <at> debbugs.gnu.org
Subject: Re: bug#22533: Python bytecode reproducibility
Date: Tue, 06 Mar 2018 00:21:21 +0100
Ricardo Wurmus <rekado <at> elephly.net> writes:

> I have applied this patch locally:
>
> diff --git a/gnu/packages/python.scm b/gnu/packages/python.scm
> index 5f701701a..0d1ecc3c6 100644
> --- a/gnu/packages/python.scm
> +++ b/gnu/packages/python.scm
> @@ -359,8 +359,42 @@ data types.")
>                                "Lib/ctypes/test/test_win32.py" ; fails on aarch64
>                                "Lib/test/test_fcntl.py")) ; fails on aarch64
>                    #t))))
> -    (arguments (substitute-keyword-arguments (package-arguments python-2)
> -                 ((#:tests? _) #t)))
> +    (arguments
> +     (substitute-keyword-arguments (package-arguments python-2)
> +       ((#:tests? _) #t)
> +       ((#:phases phases)
> +        `(modify-phases ,phases
> +           (add-after 'unpack 'patch-timestamp-for-pyc-files
> +             (lambda _
> +               ;; We set DETERMINISTIC_BUILD to only override the mtime when
> +               ;; building with Guix, lest we break auto-compilation in
> +               ;; environments.
> +               (setenv "DETERMINISTIC_BUILD" "1")
> +               (substitute* "Lib/py_compile.py"
> +                 (("source_stats\\['mtime'\\]")
> +                  "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])"))
> +
> +               ;; Use deterministic hashes for strings, bytes, and datetime
> +               ;; objects.
> +               (setenv "PYTHONHASHSEED" "0")
> +
> +               ;; Reset mtime when validating bytecode header.
> +               (substitute* "Lib/importlib/_bootstrap_external.py"
> +                 (("source_mtime = int\\(source_stats\\['mtime'\\]\\)")
> +                  "source_mtime = 1"))
> +               #t))
> +           (add-after 'unpack 'disable-timestamp-tests
> +             (lambda _
> +               (substitute* "Lib/test/test_importlib/source/test_file_loader.py"
> +                 (("test_bad_marshal")
> +                  "disable_test_bad_marshal")
> +                 (("test_no_marshal")
> +                  "disable_test_no_marshal")
> +                 (("test_non_code_marshal")
> +                  "disable_test_non_code_marshal"))
> +               #t))
> +           (add-before 'check 'allow-non-deterministic-compilation
> +             (lambda _ (unsetenv "DETERMINISTIC_BUILD") #t))))))
>      (native-search-paths
>       (list (search-path-specification
>              (variable "PYTHONPATH")
>
> It allows me to build python-six and python-sip reproducibly.  It does
> not fix problems with Python 2, and I haven’t yet tested if it causes
> any new problems.
>
> It’s a little worrying that I had to disable three more tests that I
> think shouldn’t have failed.

Wow, nice work!  I can't tell what's going on with the tests; they do
some bytecode manipulation.  Maybe they do not expect the low
timestamp somehow?

https://github.com/python/cpython/blob/374c6e178a7599aae46c857b17c6c8bc19dfe4c2/Lib/test/test_importlib/source/test_file_loader.py#L457-L484

I guess we'll do at least one 'core-updates' before 3.7 is released, so
it makes sense to include this.  It should also give us some experience
that might be relevant for 2.7, since it probably won't get the upstream
reproducibility patch that relies on 3.7 features.

The only remark I have is: is introducing a new variable necessary?
SOURCE_DATE_EPOCH implies that the user wants a deterministic build;
the upstream patch doesn't actually honor it outside of making the
hashing method deterministic.  So, I think it might be enough to just
test for SOURCE_DATE_EPOCH instead of DETERMINISTIC_BUILD.  The former
is also already set in the build environment.

However, I just noticed that you unset DETERMINISTIC_BUILD before the
'check' phase.  Did it break more things?

I suppose we'll have to set PYTHONHASHSEED somewhere in
python-build-system as well.  Did you check if that makes a difference
for numpy?  Perhaps it's enough to set it if we add an auto-compilation
step?
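
A minimal sketch of the suggestion to key the override on SOURCE_DATE_EPOCH rather than on a new variable.  This is plain Python for illustration, not the text that would be substituted into Lib/py_compile.py; the function name is hypothetical, and the constant 1 is simply the value used in the quoted patch.

```python
import os


def pyc_header_mtime(source_mtime, environ=os.environ):
    """Return the mtime to record in a timestamp-based pyc header.

    Assumption: the mere presence of SOURCE_DATE_EPOCH signals a
    deterministic build, in which case a fixed value (1, as in the
    quoted patch) is recorded instead of the real source mtime.
    """
    if "SOURCE_DATE_EPOCH" in environ:
        return 1
    return int(source_mtime)


print(pyc_header_mtime(1520207866.7, {"SOURCE_DATE_EPOCH": "1"}))
print(pyc_header_mtime(1520207866.7, {}))
```

Passing the environment as a parameter keeps the sketch easy to test without touching the real process environment.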

Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Tue, 06 Mar 2018 13:30:02 GMT) Full text and rfc822 format available.

Message #73 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: Marius Bakke <mbakke <at> fastmail.com>
Cc: Gábor Boskovits <boskovits <at> gmail.com>,
 22533 <at> debbugs.gnu.org
Subject: Re: bug#22533: Python bytecode reproducibility
Date: Tue, 06 Mar 2018 14:28:49 +0100
Marius Bakke <mbakke <at> fastmail.com> writes:

> The only remark I have is: is introducing a new variable necessary?
> SOURCE_DATE_EPOCH implies that the user wants a deterministic build;
> the upstream patch doesn't actually honor it outside of making the
> hashing method deterministic.  So, I think it might be enough to just
> test for SOURCE_DATE_EPOCH instead of DETERMINISTIC_BUILD.  The former
> is also already set in the build environment.

> However, I just noticed that you unset DETERMINISTIC_BUILD before the
> 'check' phase.  Did it break more things?

Yes, it broke a bunch of tests that are all about recompiling files when
they are considered stale.

> I suppose we'll have to set PYTHONHASHSEED somewhere in
> python-build-system as well.  Did you check if that makes a difference
> for numpy?  Perhaps it's enough to set it if we add an auto-compilation
> step?

Right, I’m going to test this with numpy now.  Thanks for the hint!

-- 
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net






Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Tue, 06 Mar 2018 14:44:02 GMT) Full text and rfc822 format available.

Message #76 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: Marius Bakke <mbakke <at> fastmail.com>
Cc: Gábor Boskovits <boskovits <at> gmail.com>,
 22533 <at> debbugs.gnu.org
Subject: Re: bug#22533: Python bytecode reproducibility
Date: Tue, 06 Mar 2018 15:43:11 +0100
Ricardo Wurmus <rekado <at> elephly.net> writes:

> Marius Bakke <mbakke <at> fastmail.com> writes:
>
>> I suppose we'll have to set PYTHONHASHSEED somewhere in
>> python-build-system as well.  Did you check if that makes a difference
>> for numpy?  Perhaps it's enough to set it if we add an auto-compilation
>> step?
>
> Right, I’m going to test this with numpy now.  Thanks for the hint!

It did help with one file, which is now built reproducibly, namely

  lib/python3.6/site-packages/numpy/testing/nose_tools/__pycache__/utils.cpython-36.pyc

This leaves five files in numpy that still differ between builds,
although they should not.
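
To illustrate what PYTHONHASHSEED controls: with a fixed seed, string hashes are identical across interpreter runs, so anything derived from hash order (for example the iteration order of a set of strings) no longer varies between builds.  Whether that is exactly what changed in utils.cpython-36.pyc is an assumption; the helper below is hypothetical and only demonstrates the seeding behaviour.

```python
import os
import subprocess
import sys


def str_hash_in_fresh_interpreter(text, seed):
    """Return hash(text) as computed by a new interpreter run with
    PYTHONHASHSEED set to the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return int(result.stdout)


first = str_hash_in_fresh_interpreter("numpy", 0)
second = str_hash_in_fresh_interpreter("numpy", 0)
print(first == second)
```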

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net






Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Tue, 06 Mar 2018 14:58:01 GMT) Full text and rfc822 format available.

Message #79 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: Gábor Boskovits <boskovits <at> gmail.com>
To: Ricardo Wurmus <rekado <at> elephly.net>
Cc: Marius Bakke <mbakke <at> fastmail.com>, 22533 <at> debbugs.gnu.org
Subject: Re: bug#22533: Python bytecode reproducibility
Date: Tue, 6 Mar 2018 15:57:02 +0100
2018-03-06 15:43 GMT+01:00 Ricardo Wurmus <rekado <at> elephly.net>:

>
> Ricardo Wurmus <rekado <at> elephly.net> writes:
>
> > Marius Bakke <mbakke <at> fastmail.com> writes:
> >
> >> I suppose we'll have to set PYTHONHASHSEED somewhere in
> >> python-build-system as well.  Did you check if that makes a difference
> >> for numpy?  Perhaps it's enough to set it if we add an auto-compilation
> >> step?
> >
> > Right, I’m going to test this with numpy now.  Thanks for the hint!
>
> It did help with one file, which is now built reproducibly, namely
>
>   lib/python3.6/site-packages/numpy/testing/nose_tools/__pycache__/utils.cpython-36.pyc
>
> This leaves five files in numpy that shouldn’t be but unfortunately are
> different.
>
>
Unfortunately, backporting the upstream version is not straightforward at
all; there are too many changes.  I will have a look at those test
failures instead.


> --
> Ricardo
>
> GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
> https://elephly.net
>
>
>

Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Thu, 08 Mar 2018 10:40:02 GMT) Full text and rfc822 format available.

Message #82 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: Gábor Boskovits <boskovits <at> gmail.com>
To: Ricardo Wurmus <rekado <at> elephly.net>
Cc: Marius Bakke <mbakke <at> fastmail.com>, 22533 <at> debbugs.gnu.org
Subject: Re: bug#22533: Python bytecode reproducibility
Date: Thu, 8 Mar 2018 11:39:52 +0100
2018-03-04 20:18 GMT+01:00 Ricardo Wurmus <rekado <at> elephly.net>:

> I have applied this patch locally:
>
>
> diff --git a/gnu/packages/python.scm b/gnu/packages/python.scm
> index 5f701701a..0d1ecc3c6 100644
> --- a/gnu/packages/python.scm
> +++ b/gnu/packages/python.scm
> @@ -359,8 +359,42 @@ data types.")
>                                "Lib/ctypes/test/test_win32.py" ; fails on aarch64
>                                "Lib/test/test_fcntl.py")) ; fails on aarch64
>                    #t))))
> -    (arguments (substitute-keyword-arguments (package-arguments python-2)
> -                 ((#:tests? _) #t)))
> +    (arguments
> +     (substitute-keyword-arguments (package-arguments python-2)
> +       ((#:tests? _) #t)
> +       ((#:phases phases)
> +        `(modify-phases ,phases
> +           (add-after 'unpack 'patch-timestamp-for-pyc-files
> +             (lambda _
> +               ;; We set DETERMINISTIC_BUILD to only override the mtime when
> +               ;; building with Guix, lest we break auto-compilation in
> +               ;; environments.
> +               (setenv "DETERMINISTIC_BUILD" "1")
> +               (substitute* "Lib/py_compile.py"
> +                 (("source_stats\\['mtime'\\]")
> +                  "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])"))
> +
> +               ;; Use deterministic hashes for strings, bytes, and datetime
> +               ;; objects.
> +               (setenv "PYTHONHASHSEED" "0")
> +
> +               ;; Reset mtime when validating bytecode header.
> +               (substitute* "Lib/importlib/_bootstrap_external.py"
> +                 (("source_mtime = int\\(source_stats\\['mtime'\\]\\)")
> +                  "source_mtime = 1"))
> +               #t))
> +           (add-after 'unpack 'disable-timestamp-tests
> +             (lambda _
> +               (substitute* "Lib/test/test_importlib/source/test_file_loader.py"
> +                 (("test_bad_marshal")
> +                  "disable_test_bad_marshal")
> +                 (("test_no_marshal")
> +                  "disable_test_no_marshal")
> +                 (("test_non_code_marshal")
> +                  "disable_test_non_code_marshal"))
> +               #t))
> +           (add-before 'check 'allow-non-deterministic-compilation
> +             (lambda _ (unsetenv "DETERMINISTIC_BUILD") #t))))))
>      (native-search-paths
>       (list (search-path-specification
>              (variable "PYTHONPATH")
>
>
> It allows me to build python-six and python-sip reproducibly.  It does
> not fix problems with Python 2, and I haven’t yet tested if it causes
> any new problems.
>
> It’s a little worrying that I had to disable three more tests that I
> think shouldn’t have failed.
>
>
Ok, I've checked the test issue again. If we change the
_bootstrap_external.py substitution to:
"source_mtime = 1 if 'DETERMINISTIC_BUILD' in _os.environ else int(source_stats['mtime'])"
the tests do not fail any more. WDYT?
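
A simplified sketch of the effect of gating the import-side check on the variable.  This is not the code from Lib/importlib/_bootstrap_external.py; both function names are hypothetical and the staleness check is reduced to a single comparison.  Because the 'check' phase in the quoted patch unsets DETERMINISTIC_BUILD, the gated version falls back to the real mtime while the test suite runs.

```python
import os


def source_mtime_for_validation(source_stats, environ=os.environ):
    """Pick the mtime that a pyc header is compared against.

    With DETERMINISTIC_BUILD set, use the fixed value 1 from the
    quoted patch; otherwise use the real source mtime.
    """
    if "DETERMINISTIC_BUILD" in environ:
        return 1
    return int(source_stats["mtime"])


def bytecode_is_stale(header_mtime, source_stats, environ=os.environ):
    """Simplified staleness check: header mtime must match the source."""
    return header_mtime != source_mtime_for_validation(source_stats, environ)


stats = {"mtime": 1520207866.0}
print(bytecode_is_stale(1, stats, {"DETERMINISTIC_BUILD": "1"}))
print(bytecode_is_stale(1, stats, {}))
```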



> What do you think?
>
> --
> Ricardo
>
> GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
> https://elephly.net
>
>

Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Mon, 14 Jan 2019 13:41:01 GMT) Full text and rfc822 format available.

Message #85 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: Gábor Boskovits <boskovits <at> gmail.com>
Cc: Marius Bakke <mbakke <at> fastmail.com>, 22533 <at> debbugs.gnu.org
Subject: Re: bug#22533: Python bytecode reproducibility
Date: Mon, 14 Jan 2019 14:40:18 +0100
Now that we’re using Python 3.7 and this version supports hash-based pyc
files, is this still an issue?  Do we need to do anything to enable
hash-based pyc compilation?

See:
  https://docs.python.org/3/whatsnew/3.7.html#pep-552-hash-based-pyc-files
  https://www.python.org/dev/peps/pep-0552/
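
As a rough sketch of what hash-based compilation looks like from Python 3.7 on: py_compile.compile accepts an invalidation_mode argument, and with CHECKED_HASH the header carries a hash of the source instead of its mtime.  According to the py_compile documentation, the default mode is CHECKED_HASH when SOURCE_DATE_EPOCH is set in the environment.  Whether Guix needs anything beyond that is not settled by this example; the file names and helper below are made up.

```python
import os
import py_compile
import tempfile


def compile_hash_based(source_path, pyc_path):
    """Compile source_path to pyc_path as a checked hash-based pyc
    and return the resulting bytes."""
    py_compile.compile(
        source_path,
        cfile=pyc_path,
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )
    with open(pyc_path, "rb") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "version.py")
    with open(source, "w") as handle:
        handle.write("version = '1.14.0'\n")

    first = compile_hash_based(source, os.path.join(tmp, "a.pyc"))
    os.utime(source, (1, 1))  # change the mtime, leave the contents alone
    second = compile_hash_based(source, os.path.join(tmp, "b.pyc"))

# Bytes 4 to 7 hold the PEP 552 flags field; bit 0 marks a hash-based pyc.
flags = int.from_bytes(first[4:8], "little")
print(first == second, flags & 1)
```

The two outputs are byte-identical even though the source mtime changed between the two compilations, which is the property the timestamp-based format lacks.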

-- 
Ricardo





Reply sent to Ricardo Wurmus <rekado <at> elephly.net>:
You have taken responsibility. (Sun, 03 Feb 2019 21:23:01 GMT) Full text and rfc822 format available.

Notification sent to Leo Famulari <leo <at> famulari.name>:
bug acknowledged by developer. (Sun, 03 Feb 2019 21:23:02 GMT) Full text and rfc822 format available.

Message #90 received at 22533-done <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: Gábor Boskovits <boskovits <at> gmail.com>
Cc: Marius Bakke <mbakke <at> fastmail.com>, 22533-done <at> debbugs.gnu.org
Subject: Re: bug#22533: Python bytecode reproducibility
Date: Sun, 03 Feb 2019 22:22:23 +0100
Ricardo Wurmus <rekado <at> elephly.net> writes:

> Now that we’re using Python 3.7 and this version supports hash-based pyc
> files, is this still an issue?  Do we need to do anything to enable
> hash-based pyc compilation?
>
> See:
>   https://docs.python.org/3/whatsnew/3.7.html#pep-552-hash-based-pyc-files
>   https://www.python.org/dev/peps/pep-0552/

It looks like this is no longer a problem.  I built borg just now and
the pyc files are reproducible.

(The man pages include a date stamp, though, which I’m trying to patch
now.)

--
Ricardo





Information forwarded to bug-guix <at> gnu.org:
bug#22533; Package guix. (Mon, 04 Feb 2019 22:40:02 GMT) Full text and rfc822 format available.

Message #93 received at 22533 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 22533 <at> debbugs.gnu.org
Cc: rekado <at> elephly.net, leo <at> famulari.name
Subject: Re: bug#22533: Python bytecode reproducibility
Date: Mon, 04 Feb 2019 23:39:21 +0100
Ricardo Wurmus <rekado <at> elephly.net> skribis:

> Ricardo Wurmus <rekado <at> elephly.net> writes:
>
>> Now that we’re using Python 3.7 and this version supports hash-based pyc
>> files, is this still an issue?  Do we need to do anything to enable
>> hash-based pyc compilation?
>>
>> See:
>>   https://docs.python.org/3/whatsnew/3.7.html#pep-552-hash-based-pyc-files
>>   https://www.python.org/dev/peps/pep-0552/
>
> It looks like this is no longer a problem.  I built borg just now and
> the pyc files are reproducible.

Yay! \o/

Ludo'.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 05 Mar 2019 12:24:06 GMT) Full text and rfc822 format available.

This bug report was last modified 5 years and 46 days ago.


