GNU bug report logs - #35996
User account password got locked when booting old generation


Package: guix;

Reported by: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>

Date: Wed, 29 May 2019 20:46:02 UTC

Severity: important

Merged with 35902

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 35996 in the body.
You can then email your comments to 35996 AT debbugs.gnu.org in the normal way.




Report forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Wed, 29 May 2019 20:46:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Wed, 29 May 2019 20:46:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: bug-guix <at> gnu.org
Subject: User account password got locked when booting old generation
Date: Wed, 29 May 2019 22:45:17 +0200
After I reconfigured to a broken, unbootable generation and then
rebooted to an old working generation, I found my user account
password was locked.  This was my /etc/shadow:

root::18045::::::
florian:!:18045::::::
nobody:!:18045::::::
guixbuilder01:!:18045::::::
guixbuilder02:!:18045::::::
guixbuilder03:!:18045::::::
guixbuilder04:!:18045::::::
guixbuilder05:!:18045::::::
guixbuilder06:!:18045::::::
guixbuilder07:!:18045::::::
guixbuilder08:!:18045::::::
guixbuilder09:!:18045::::::
guixbuilder10:!:18045::::::
ntpd:!:18045::::::
messagebus:!:18045::::::
polkitd:!:18045::::::
geoclue:!:18045::::::
colord:!:18045::::::
avahi:!:18045::::::
gdm:!:18045::::::
httpd:!:18045::::::

Logging in as root (root had an empty password before as well) and
running `passwd florian` fixed it, and I *cannot* reproduce the bug
anymore by booting the broken generation again, i.e. my password
remains set now.

I presume it is not possible to lock a password by typing the wrong
password too often on the virtual console?

If you think this is not enough material to work on, feel free to
close this bug, but there seems to be some misbehavior somewhere in
Guix’ password management.

(The reason for the new generation’s brokenness seems unrelated; it
could not boot after I tried adding syslogd to the requirements of
udev-shepherd-service.)

Regards,
Florian
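
Note on the listing above: the second colon-separated field of each
/etc/shadow entry is the password field.  For root it is empty (no
password is needed), and for florian it is "!", which is how a locked
password appears.  A minimal sketch of that reading, assuming the usual
shadow(5) layout; the function name password_state is hypothetical and
not part of Guix or Shadow:

```python
def password_state(shadow_line):
    """Classify the password field of one /etc/shadow entry."""
    fields = shadow_line.rstrip("\n").split(":")
    name, password = fields[0], fields[1]
    if password == "":
        state = "empty"      # no password is needed to log in
    elif password.startswith("!"):
        state = "locked"     # password locked
    elif password.startswith("*"):
        state = "disabled"   # not a valid hash, password login impossible
    else:
        state = "set"        # looks like a password hash
    return name, state


sample = """root::18045::::::
florian:!:18045::::::
nobody:!:18045::::::"""

for line in sample.splitlines():
    print(password_state(line))
```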




Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Fri, 31 May 2019 22:06:02 GMT) Full text and rfc822 format available.

Message #8 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: "pelzflorian \(Florian Pelz\)" <pelzflorian <at> pelzflorian.de>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Sat, 01 Jun 2019 00:05:44 +0200
Hello,

"pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> skribis:

> After I reconfigured to a broken, unbootable generation and then
> rebooted to an old working generation, I found my user account
> password was locked.  This was my /etc/shadow:
>
> root::18045::::::
> florian:!:18045::::::
> nobody:!:18045::::::
> guixbuilder01:!:18045::::::
> guixbuilder02:!:18045::::::
> guixbuilder03:!:18045::::::
> guixbuilder04:!:18045::::::
> guixbuilder05:!:18045::::::
> guixbuilder06:!:18045::::::
> guixbuilder07:!:18045::::::
> guixbuilder08:!:18045::::::
> guixbuilder09:!:18045::::::
> guixbuilder10:!:18045::::::
> ntpd:!:18045::::::
> messagebus:!:18045::::::
> polkitd:!:18045::::::
> geoclue:!:18045::::::
> colord:!:18045::::::
> avahi:!:18045::::::
> gdm:!:18045::::::
> httpd:!:18045::::::
>
> Logging in as root (root had an empty password before as well) and
> running `passwd florian` fixed it, and I *cannot* reproduce the bug
> anymore by booting the broken generation again, i.e. my password
> remains set now.

Did the old generation you booted have user ‘florian’?

Also, how old was it?  User account management changed in March (commit
0ae735bcc8ff7fdc89d67b492bdee9091ee19e86).

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Sat, 01 Jun 2019 05:53:01 GMT) Full text and rfc822 format available.

Message #11 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Sat, 1 Jun 2019 07:52:38 +0200
On Sat, Jun 01, 2019 at 12:05:44AM +0200, Ludovic Courtès wrote:
> Did the old generation you booted have user ‘florian’?
> 

Yes, this is my main machine.  The users are the same.

> Also, how old was it?  User account management changed in March (commit
> 0ae735bcc8ff7fdc89d67b492bdee9091ee19e86).
>

The git commit for both old and new generation is

commit ad7466aafd7f166d0b6be5eb32dda1d3ee8a6445 (origin/master, origin/HEAD)
Author: Ludovic Courtès <ludo <at> gnu.org>
Date:   Sun May 26 23:18:21 2019 +0200

with irrelevant patches (for USB_ModeSwitch) applied on both, as well
as a non-working patch adding syslogd as a requirement to
udev-shepherd-service (making the generation unbootable and making
something display Stack overflow) and a working patch to add a --debug
argument to udevd for the new, unbootable generation.  The old
generation has a patch that adds wrong arguments to udevd that udevd
ignores.  All seems harmless.

I still cannot reproduce despite rebooting the broken generation
multiple times and then booting back into the old generation.

I wonder what would change /etc/shadow.

Regards,
Florian




Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Sat, 01 Jun 2019 14:59:02 GMT) Full text and rfc822 format available.

Message #14 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Sat, 1 Jun 2019 16:58:34 +0200
On Sat, Jun 01, 2019 at 07:52:38AM +0200, pelzflorian (Florian Pelz) wrote:
> I wonder what would change /etc/shadow.
> 

If the error occurred on common non-Guix distros, it hopefully would
have been fixed before, maybe.  Of course Guix recreates /etc/shadow
much more frequently.

Guix appears to add shadow files atomically in gnu/build/accounts.scm.
I do not know if there could have been an error reading the old shadow
file, e.g. because it is locked or something?

The elogind source code in src/basic/user-util.c contains code for
locking /etc/shadow, with a comment that explains why its lckpwdf is
implemented differently from shadow-utils.

AccountsService appears to only be usable for reading /etc/shadow, not
for writing it, contrary to what the Guix manual claims (??).  For
writing passwords, gnome-control-center does not use AccountsService,
it calls /usr/bin/passwd directly in its source code in
panels/user-accounts/run-passwd.c.

Regards,
Florian
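
Note on "atomically": the usual way to replace a file so that readers
see either the old or the new content, never a mix, is to write a
temporary file in the same directory and rename it over the target.
The sketch below illustrates that general technique only.  It is not
the code from gnu/build/accounts.scm (which is Scheme), and
write_atomically is a hypothetical name:

```python
import os
import tempfile


def write_atomically(path, text, mode=0o600):
    """Replace PATH with TEXT via a temporary file and rename(2)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as out:
            out.write(text)
            out.flush()
            os.fsync(out.fileno())   # push the data to disk before renaming
        os.chmod(tmp, mode)
        os.replace(tmp, path)        # atomic on POSIX within one file system
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

An atomic rename protects readers from half-written files, but it does
not stop two writers from overwriting each other's changes; that is
what a lock is for.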




Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Sat, 01 Jun 2019 21:39:01 GMT) Full text and rfc822 format available.

Message #17 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: "pelzflorian \(Florian Pelz\)" <pelzflorian <at> pelzflorian.de>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Sat, 01 Jun 2019 23:37:51 +0200
Hi Florian,

"pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> skribis:

> On Sat, Jun 01, 2019 at 07:52:38AM +0200, pelzflorian (Florian Pelz) wrote:
>> I wonder what would change /etc/shadow.
>> 
>
> If the error occurred on common non-Guix distros, it hopefully would
> have been fixed before, maybe.  Of course Guix recreates /etc/shadow
> much more frequently.

Definitely.

> Guix appears to add shadow files atomically in gnu/build/accounts.scm.
> I do not know if there could have been an error reading the old shadow
> file, e.g. because it is locked or something?

(gnu build accounts) doesn’t care at all about /etc/.pwd.lock, the lock
file used by libc’s ‘lckpwdf’ function.

This is definitely not a problem when booting.  It could be a problem if
you’re concurrently running ‘guix system reconfigure’ (which runs
activation snippets, including the account updating code) and some other
program, such as ‘passwd’, that assumes it holds an exclusive lock on
the file.  Though in that case, the worst that could happen is that the
changes made by Guix would be undone by that other program.

> The elogind source code in src/basic/user-util.c contains code for
> locking /etc/shadow, with a comment that explains why its lckpwdf is
> implemented differently from shadow-utils.
>
> AccountsService appears to only be usable for reading /etc/shadow, not
> for writing it, contrary to what the Guix manual claims (??). 

That might be a bug.

> For writing passwords, gnome-control-center does not use
> AccountsService, it calls /usr/bin/passwd directly in its source code
> in panels/user-accounts/run-passwd.c.

That’s definitely a bug to fix: it should invoke
/run/setuid-programs/passwd instead.

Thanks for investigating,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Sun, 02 Jun 2019 07:06:02 GMT) Full text and rfc822 format available.

Message #20 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Sun, 2 Jun 2019 09:05:45 +0200
On Sat, Jun 01, 2019 at 11:37:51PM +0200, Ludovic Courtès wrote:
> This is definitely not a problem when booting.  It could be a problem if
> you’re concurrently running ‘guix system reconfigure’ (which runs
> activation snippets, including the account updating code) and some other
> program, such as ‘passwd’, that assumes it holds an exclusive lock on
> the file.  Though in that case, the worst that could happen is that the
> changes made by Guix would be undoed by that other program.

Thank you for explaining!

OK.  Also, why would a read error cause the password to get locked
anyway?  I suppose this is an issue with mingetty or something locking
the password, perhaps after too many failed login attempts.  (Note
that I’m prone to making typos because with this Macbook keyboard key
presses are sometimes recognized twice on and only on a virtual
console.)  I will investigate mingetty next.


I also tried running this script:

#!/run/current-system/profile/bin/bash
MD5=$(sudo md5sum /etc/shadow)
echo "Current /etc/shadow has md5sum: $MD5"
until [ "$(sudo md5sum /etc/shadow)" != "$MD5" ]; do
    sudo guix system roll-back
    sudo guix system reconfigure /etc/config.scm
done
notify-send "/etc/shadow changed!" "Maybe I reproduced the issue."

After repeatedly reconfiguring for some 40 minutes I still got the
same /etc/shadow.

(But I think it broke my motherboard, because recently the output of
reconfigure changed from

[…]
shepherd: Evaluating user expression (let* ((services (map primitive-load (?))) # ?) ?).
shepherd: Service user-homes has been started.
shepherd: Service term-auto could not be started.
bootloader successfully installed on '/boot/efi'

to

[…]
shepherd: Evaluating user expression (let* ((services (map primitive-load (?))) # ?) ?).
shepherd: Service user-homes has been started.
shepherd: Service term-auto could not be started.
error: '/gnu/store/h5bi85lgnpqcjx2avy126lwiss01idsj-grub-efi-2.02/sbin/grub-install --boot-directory //boot --bootloader-id=Guix --efi-directory //boot/efi' exited with status 1; output follows:

  Installing for x86_64-efi platform.
  Could not prepare Boot variable: No such file or directory
  /gnu/store/h5bi85lgnpqcjx2avy126lwiss01idsj-grub-efi-2.02/sbin/grub-install: error: efibootmgr failed to register the boot entry: Input/output error.

guix system: error: failed to install bootloader /gnu/store/y6p93xjdbpbp0z2kc0gw5yqjppmdsq7g-bootloader-installer


I now get this on every reconfigure.  I tried rebooting; this was a
bad idea; I cannot boot anymore.  But that is unrelated.  Maybe I will
install GRUB as if it were an external hard drive from now on.)


> 
> > The elogind source code in src/basic/user-util.c contains code for
> > locking /etc/shadow, with a comment that explains why its lckpwdf is
> > implemented differently from shadow-utils.
> >
> > AccountsService appears to only be usable for reading /etc/shadow, not
> > for writing it, contrary to what the Guix manual claims (??). 
> 
> That might be a bug.
> 
> > For writing passwords, gnome-control-center does not use
> > AccountsService, it calls /usr/bin/passwd directly in its source code
> > in panels/user-accounts/run-passwd.c.
> 
> That’s definitely a bug to fix: it should invoke
> /run/setuid-programs/passwd instead.
> 
> Thanks for investigating,
> Ludo’.

I will try and make patches once I can reboot into Guix again.

Regards,
Florian




Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Sun, 02 Jun 2019 09:39:02 GMT) Full text and rfc822 format available.

Message #23 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: "pelzflorian \(Florian Pelz\)" <pelzflorian <at> pelzflorian.de>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Sun, 02 Jun 2019 11:38:36 +0200
Hi Florian,

"pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> skribis:

> On Sat, Jun 01, 2019 at 11:37:51PM +0200, Ludovic Courtès wrote:
>> This is definitely not a problem when booting.  It could be a problem if
>> you’re concurrently running ‘guix system reconfigure’ (which runs
>> activation snippets, including the account updating code) and some other
>> program, such as ‘passwd’, that assumes it holds an exclusive lock on
>> the file.  Though in that case, the worst that could happen is that the
>> changes made by Guix would be undoed by that other program.

Actually, another thing that could happen is that Guix reads an
incomplete /etc/shadow because some other program is writing to it.

In that case, suppose Guix reads a partial /etc/shadow where user
“florian” is missing.  It would then create a new /etc/shadow where the
password for “florian” is uninitialized (or set to the initial value
that appears in config.scm.)

Could it be what happened to you?  You’d have to be running ‘passwd’ or
‘usermod’ or whatever at exactly the same time as ‘guix system
reconfigure’ (and you’d have to be “lucky”).

> I also tried running this script:
>
> #!/run/current-system/profile/bin/bash
> MD5=$(sudo md5sum /etc/shadow)
> echo "Current /etc/shadow has md5sum: $MD5"
> until [ "$(sudo md5sum /etc/shadow)" != "$MD5" ]; do
>     sudo guix system roll-back
>     sudo guix system reconfigure /etc/config.scm
> done
> notify-send "/etc/shadow changed!" "Maybe I reproduced the issue."

The code in (gnu build accounts) is purely functional and deterministic,
so you have no chance of getting a different /etc/shadow with this
script, unless perhaps you concurrently run ‘passwd’ or similar.

> error: '/gnu/store/h5bi85lgnpqcjx2avy126lwiss01idsj-grub-efi-2.02/sbin/grub-install --boot-directory //boot --bootloader-id=Guix --efi-directory //boot/efi' exited with status 1; output follows:
>
>   Installing for x86_64-efi platform.
>   Could not prepare Boot variable: No such file or directory
>   /gnu/store/h5bi85lgnpqcjx2avy126lwiss01idsj-grub-efi-2.02/sbin/grub-install: error: efibootmgr failed to register the boot entry: Input/output error.

Maybe you’ve exhausted the room for those EFI “variables” or something?

Thanks for your debugging work!

Ludo’.
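
Note on the partial-read scenario: it can be modelled as a merge
between the accounts declared in the configuration and the entries
found in the existing /etc/shadow.  The sketch below is a toy model
under stated assumptions, not the code of (gnu build accounts);
merge_shadow and the data shapes are hypothetical, and using "!" for
an uninitialized password is an assumption based on the listing in the
original report:

```python
def merge_shadow(declared, existing):
    """Return {user: password field} for the new shadow database.

    DECLARED maps each user in the configuration to its initial password
    (None when the configuration gives none).  EXISTING maps the users
    found in the current /etc/shadow to their password field.
    """
    merged = {}
    for user, initial in declared.items():
        if user in existing:
            merged[user] = existing[user]   # keep what the user set
        elif initial is not None:
            merged[user] = initial          # fall back to the configuration
        else:
            merged[user] = "!"              # assumption: uninitialized
    return merged


declared = {"root": "", "florian": None}
complete = {"root": "", "florian": "$6$salt$hash"}   # made-up hash
partial = {"root": ""}                               # "florian" was not read

print(merge_shadow(declared, complete)["florian"])
print(merge_shadow(declared, partial)["florian"])
```

With the complete database the existing hash survives; with the
partial one the entry is recreated without it.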




Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Sun, 02 Jun 2019 10:22:02 GMT) Full text and rfc822 format available.

Message #26 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Sun, 2 Jun 2019 12:21:22 +0200
On Sun, Jun 02, 2019 at 11:38:36AM +0200, Ludovic Courtès wrote:
> Hi Florian,
> 
> "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> skribis:
> 
> > On Sat, Jun 01, 2019 at 11:37:51PM +0200, Ludovic Courtès wrote:
> >> This is definitely not a problem when booting.  It could be a problem if
> >> you’re concurrently running ‘guix system reconfigure’ (which runs
> >> activation snippets, including the account updating code) and some other
> >> program, such as ‘passwd’, that assumes it holds an exclusive lock on
> >> the file.  Though in that case, the worst that could happen is that the
> >> changes made by Guix would be undoed by that other program.
> 
> Actually, another thing that could happen is that Guix reads an
> incomplete /etc/shadow because some other program is writing to it.
> 
> In that case, suppose Guix reads a partial /etc/shadow where user
> “florian” is missing.  It would then create a new /etc/shadow where the
> password for “florian” is uninitialized (or set to the initial value
> that appears in config.scm.)
> 
> Could it be what happened to you?  You’d have to be running ‘passwd’ or
> ‘usermod’ or whatever at exactly the same time as ‘guix system
> reconfigure’ (and you’d have to be “lucky”).
>

No, I did not change my password in a very long time.

Is there no proper cross-application locking mechanism for
/etc/passwd?  elogind uses

struct flock flock = {
  .l_type = F_WRLCK,
  .l_whence = SEEK_SET,
  .l_start = 0,
  .l_len = 0,
};
[…]
fd = open(path, O_WRONLY|O_CREAT|O_CLOEXEC|O_NOCTTY|O_NOFOLLOW, 0600);
[…]
r = fcntl(fd, F_SETLKW, &flock);

Should Guix adopt something similar for shadow/passwd/… database
reads?




> > error: '/gnu/store/h5bi85lgnpqcjx2avy126lwiss01idsj-grub-efi-2.02/sbin/grub-install --boot-directory //boot --bootloader-id=Guix --efi-directory //boot/efi' exited with status 1; output follows:
> >
> >   Installing for x86_64-efi platform.
> >   Could not prepare Boot variable: No such file or directory
> >   /gnu/store/h5bi85lgnpqcjx2avy126lwiss01idsj-grub-efi-2.02/sbin/grub-install: error: efibootmgr failed to register the boot entry: Input/output error.
> 
> Maybe you’ve exhausted the room for those EFI “variables” or something?
> 
> Thanks for your debugging work!
> 
> Ludo’.

Maybe exhausted, maybe it is an error with the NVRAM.  I will try
making grub-install execute like when installing on external USB
drives so it writes nothing to the motherboard.

Regards,
Florian
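
Note on the elogind fragment quoted above: the same whole-file write
lock can be sketched with Python's fcntl module.  This is an
illustration of the technique, not Guix code; password_lock is a
hypothetical name, and the default path is the /etc/.pwd.lock file
mentioned earlier in this thread:

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def password_lock(path="/etc/.pwd.lock"):
    """Hold an exclusive fcntl(2) write lock on PATH inside the block."""
    flags = (os.O_WRONLY | os.O_CREAT | os.O_CLOEXEC
             | os.O_NOCTTY | os.O_NOFOLLOW)
    fd = os.open(path, flags, 0o600)
    try:
        # Lock the whole file (start 0, length 0); this call blocks until
        # the lock is granted, like F_SETLKW in the C fragment.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        yield fd
    finally:
        os.close(fd)   # closing the descriptor releases the lock


if __name__ == "__main__":
    import tempfile
    # Demonstrate on a scratch file; locking the real path needs root.
    with tempfile.TemporaryDirectory() as directory:
        with password_lock(os.path.join(directory, ".pwd.lock")):
            print("lock held; safe to read and rewrite the databases")
```

The lock only helps if every program touching the databases takes it,
since fcntl locks are advisory.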




Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Sun, 02 Jun 2019 16:01:02 GMT) Full text and rfc822 format available.

Message #29 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: "pelzflorian \(Florian Pelz\)" <pelzflorian <at> pelzflorian.de>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Sun, 02 Jun 2019 18:00:14 +0200
"pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> skribis:

> On Sun, Jun 02, 2019 at 11:38:36AM +0200, Ludovic Courtès wrote:

[...]

>> Actually, another thing that could happen is that Guix reads an
>> incomplete /etc/shadow because some other program is writing to it.
>> 
>> In that case, suppose Guix reads a partial /etc/shadow where user
>> “florian” is missing.  It would then create a new /etc/shadow where the
>> password for “florian” is uninitialized (or set to the initial value
>> that appears in config.scm.)
>> 
>> Could it be what happened to you?  You’d have to be running ‘passwd’ or
>> ‘usermod’ or whatever at exactly the same time as ‘guix system
>> reconfigure’ (and you’d have to be “lucky”).
>>
>
> No, I did not change my password in a very long time.
>
> Is there no proper cross-application locking mechanism for
> /etc/passwd?  elogind uses
>
> struct flock flock = {
>   .l_type = F_WRLCK,
>   .l_whence = SEEK_SET,
>   .l_start = 0,
>   .l_len = 0,
> };
> […]
> fd = open(path, O_WRONLY|O_CREAT|O_CLOEXEC|O_NOCTTY|O_NOFOLLOW, 0600);
> […]
> r = fcntl(fd, F_SETLKW, &flock;
>
> Should Guix adopt something similar for shadow/passwd/… database
> reads?

We could do that, yes, though I’d lean towards using the same thing as libc
and Shadow.  The whole scenario just sounds very unlikely though.

Thanks,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Mon, 03 Jun 2019 06:04:02 GMT) Full text and rfc822 format available.

Message #32 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Mon, 3 Jun 2019 08:03:01 +0200
After I booted to a Guix install USB, chrooted as described on the
Arch wiki and started a Guix daemon, I could reconfigure as before.
There was no need to fiddle with grub-install.

After multiple reconfigures, it happened again, my /etc/shadow has !
again in the password field.  My recently changed root password became
empty as well, like 35902.  I did not even run sudo concurrently.  The
password just got locked.

The /etc from the “populating from /gnu/store/*-etc” messages has no
significant differences either.



On Sat, Jun 01, 2019 at 11:37:51PM +0200, Ludovic Courtès wrote:
> "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> skribis:
> > AccountsService appears to only be usable for reading /etc/shadow, not
> > for writing it, contrary to what the Guix manual claims (??). 
> 
> That might be a bug.
> 

AccountsService obviously can change passwords.  No bug here.  Sorry.
I was confused.


> > For writing passwords, gnome-control-center does not use
> > AccountsService, it calls /usr/bin/passwd directly in its source code
> > in panels/user-accounts/run-passwd.c.
> 
> That’s definitely a bug to fix: it should invoke
> /run/setuid-programs/passwd instead.
>

Find attached two patches that fix GNOME password changing.  Both are
required.

Regards,
Florian
[0001-Add-cracklib-s-password-dictionary-to-cracklib-s-def.patch (text/plain, attachment)]
[0002-Make-gnome-control-center-find-passwd-binary.patch (text/plain, attachment)]

Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Mon, 03 Jun 2019 06:15:02 GMT) Full text and rfc822 format available.

Message #35 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: Gábor Boskovits <boskovits <at> gmail.com>
To: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
Cc: Ludovic Courtès <ludo <at> gnu.org>, 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Mon, 3 Jun 2019 08:14:09 +0200
Hello,




pelzflorian (Florian Pelz) <pelzflorian <at> pelzflorian.de> wrote (on
Mon, 3 Jun 2019, 8:04):

> After I booted to a Guix install USB, chrooted as described on the
> Arch wiki and started a Guix daemon, I could reconfigure as before.
> There was no need to fiddle with grub-install.
>
> After multiple reconfigures, it happened again, my /etc/shadow has !
> again in the password field.  My recently changed root password became
> empty as well, like 35902.  I did not even run sudo concurrently.  The
> password just got locked.
>
This is the same thing that happened to me, and there is another report,
regarding passwords being reset. I believe we should merge these two bugs.
I am on a mobile with no convenient way to look up the issue number.

>
> The /etc from the “populating from /gnu/store/*-etc” messages has no
> significant differences either.
>
>
>
> On Sat, Jun 01, 2019 at 11:37:51PM +0200, Ludovic Courtès wrote:
> > "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> skribis:
> > > AccountsService appears to only be usable for reading /etc/shadow, not
> > > for writing it, contrary to what the Guix manual claims (??).
> >
> > That might be a bug.
> >
>
> AccountsService obviously can change passwords.  No bug here.  Sorry.
> I was confused.
>
>
> > > For writing passwords, gnome-control-center does not use
> > > AccountsService, it calls /usr/bin/passwd directly in its source code
> > > in panels/user-accounts/run-passwd.c.
> >
> > That’s definitely a bug to fix: it should invoke
> > /run/setuid-programs/passwd instead.
> >
>
> Find attached two patches that fix GNOME password changing.  Both are
> required.
>
> Regards,
> Florian
>

On my machine it turned out that the hdd is faulty, so this might be a
hardware error, I will get a replacement drive tomorrow, and check if the
problem still persists.

Best regards,
g_bor

>

Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Mon, 03 Jun 2019 07:19:02 GMT) Full text and rfc822 format available.

Message #38 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Mon, 3 Jun 2019 09:18:30 +0200
Please add more logging and/or locking.  Note that elogind has the
following comment in its locking implementation in
/gnu/store/dm2ri0qvjirl0iq2ndfk5z9lq9dyk4jf-elogind-241.3-checkout/src/basic/user-util.c:

        /* This is roughly the same as lckpwdf(), but not as awful. We
         * don't want to use alarm() and signals, hence we implement
         * our own trivial version of this.
         *
         * Note that shadow-utils also takes per-database locks in
         * addition to lckpwdf(). However, we don't given that they
         * are redundant as they invoke lckpwdf() first and keep
         * it during everything they do. The per-database locks are
         * awfully racy, and thus we just won't do them. */

Regards,
Florian




Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Mon, 03 Jun 2019 13:24:02 GMT) Full text and rfc822 format available.

Message #41 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: "pelzflorian \(Florian Pelz\)" <pelzflorian <at> pelzflorian.de>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Mon, 03 Jun 2019 15:22:51 +0200
Hi Florian,

"pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> skribis:

> After I booted to a Guix install USB, chrooted as described on the
> Arch wiki and started a Guix daemon, I could reconfigure as before.
> There was no need to fiddle with grub-install.
>
> After multiple reconfigures, it happened again, my /etc/shadow has !
> again in the password field.  My recently changed root password became
> empty as well, like 35902.  I did not even run sudo concurrently.  The
> password just got locked.

What were the differences between your config files when you
reconfigured?

Did the config files describe the exact same set of user accounts?

Did the user accounts in the config files differ in any way?

Were the user accounts altered in any way in between reconfigures
(‘passwd’, ‘usermod’, GNOME, etc.)?


Looking at ‘user+group-databases’ in (gnu build accounts), which takes a
list of <user-account> and a list of <user-group> to produce
/etc/{passwd,shadow,group}, the only way this could happen is if
/etc/shadow does not exist at the time of reconfigure.

In that case, ‘user+group-databases’ assumes we’re starting anew, so it
creates /etc/shadow with the initial passwords specified in the OS
config.  At that point, the other passwords are lost forever.

Does Shadow or gnome-control-center or something remove /etc/shadow
altogether while it’s accessing it?

At the very least, adding locking like you suggested should avoid this
class of problems; I’ll look into that.

> From 1eb7699d5036062993a080393bfb4a46d2dc1bea Mon Sep 17 00:00:00 2001
> From: Florian Pelz <pelzflorian <at> pelzflorian.de>
> Date: Mon, 3 Jun 2019 07:19:20 +0200
> Subject: [PATCH 1/2] =?UTF-8?q?Add=20cracklib=E2=80=99s=20password=20dicti?=
>  =?UTF-8?q?onary=20to=20cracklib=E2=80=99s=20default=20output.?=
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> * gnu/packages/password-utils.scm (cracklib): Use `make dict`.

[...]

> From c7c016adc34c591febd0d3630f32dbecdd20ad7c Mon Sep 17 00:00:00 2001
> From: Florian Pelz <pelzflorian <at> pelzflorian.de>
> Date: Sun, 2 Jun 2019 20:01:23 +0200
> Subject: [PATCH 2/2] Make gnome-control-center find passwd binary.
>
> * gnu/packages/gnome.scm (gnome-control-center): Substitute correct path to
>   passwd.

Great, applied both.

Thank you!

Ludo’.




Severity set to 'important' from 'normal' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Mon, 03 Jun 2019 13:26:01 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Mon, 03 Jun 2019 14:53:02 GMT) Full text and rfc822 format available.

Message #46 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Mon, 3 Jun 2019 16:52:09 +0200
On Mon, Jun 03, 2019 at 03:22:51PM +0200, Ludovic Courtès wrote:
> > After multiple reconfigures, it happened again, my /etc/shadow has !
> > again in the password field.  My recently changed root password became
> > empty as well, like 35902.  I did not even run sudo concurrently.  The
> > password just got locked.
> 
> What were the differences between your config files when you
> reconfigured?
>

For the last reconfigure, there were no differences, although I had
rebooted into an unbootable, older generation with a different
syslog.conf and broken Udevd arguments before booting the new
generation.  I suppose the other victims of this bug have not booted
to unbootable generations?


> Did the config files describe the exact same set of user accounts?
>

Yes, they’re the same.

> Did the user accounts in the config files differ in any way?
>

No, they do not differ.

> Were the user accounts altered in any way in between reconfigures
> (‘passwd’, ‘usermod’, GNOME, etc.)?
> 
>
> Looking at ‘user+group-databases’ in (gnu build accounts), which takes a
> list of <user-account> and a list of <user-group> to produce
> /etc/{passwd,shadow,group}, the only way this could happen is if
> /etc/shadow does not exist at the time of reconfigure.
>
> In that case, ‘user+group-databases’ assumes we’re starting anew, so it
> creates /etc/shadow with the initial passwords specified in the OS
> config.  At that point, the other passwords are lost forever.
> 
> Does Shadow or gnome-control-center or something remove /etc/shadow
> altogether while it’s accessing it?
>

I did not use gnome-control-center or shadow or sudo during the last
reconfigure, except sudo for starting the reconfigure.

> At the very least, adding locking like you suggested should avoid this
> class of problems; I’ll look into that.
> 

I do not know if something somehow accessed /etc/shadow in the
background without my knowledge.  I believe locks are important anyway
to have more guarantees passwords are not lost when Guix is run on a
sensitive multi-user setup.  Thank you for looking into it.

If locks do not stop these issues, it would be nice to have more
detailed logs of shadow changes written to syslog on reconfigure
and/or on reboot.

Regards,
Florian




Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Mon, 03 Jun 2019 15:24:01 GMT) Full text and rfc822 format available.

Message #49 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: "pelzflorian \(Florian Pelz\)" <pelzflorian <at> pelzflorian.de>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Mon, 03 Jun 2019 17:22:55 +0200
[Message part 1 (text/plain, inline)]
Hello,

"pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> skribis:

> Please add more logging and/or locking.  Note that elogind has the
> following comment in its locking implementation in
> /gnu/store/dm2ri0qvjirl0iq2ndfk5z9lq9dyk4jf-elogind-241.3-checkout/src/basic/user-util.c:
>
>         /* This is roughly the same as lckpwdf(), but not as awful. We
>          * don't want to use alarm() and signals, hence we implement
>          * our own trivial version of this.

Attached is a set of patches to lock /etc/.pwd.lock when we access
/etc/{passwd,shadow,group} from the activation snippets.  It should
ensure that, when running ‘guix system reconfigure’, we’re honoring the
locking protocol that Shadow & co. follow.

It would be great if you could test again in this context.

Thanks!

Ludo’.

[0001-syscalls-Add-with-file-lock-macro.patch (text/x-patch, attachment)]
[0002-syscalls-with-file-lock-expands-to-a-call-to-call-wi.patch (text/x-patch, attachment)]
[0003-syscalls-with-lock-file-catches-ENOSYS.patch (text/x-patch, attachment)]
[0004-activation-Lock-etc-.pwd.lock-before-accessing-datab.patch (text/x-patch, attachment)]
[0005-nar-Really-lock-store-files.patch (text/x-patch, attachment)]

Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Mon, 03 Jun 2019 16:02:01 GMT) Full text and rfc822 format available.

Message #52 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: Danny Milosavljevic <dannym <at> scratchpost.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 35996 <at> debbugs.gnu.org,
 "pelzflorian \(Florian Pelz\)" <pelzflorian <at> pelzflorian.de>
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Mon, 3 Jun 2019 18:01:49 +0200
[Message part 1 (text/plain, inline)]
For debugging, I've used "chattr +i" in the past in order to make such files
(i.e. /etc/shadow) immutable.  This would hopefully expose the mutator (other
than Guix)--it should log some error somewhere.

[Message part 2 (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Mon, 03 Jun 2019 17:09:01 GMT) Full text and rfc822 format available.

Message #55 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Mon, 3 Jun 2019 19:07:57 +0200
All this is not a problem with name service cache daemon, is it?

On Mon, Jun 03, 2019 at 05:22:55PM +0200, Ludovic Courtès wrote:
> Attached is a set of patches to lock /etc/.pwd.lock when we access
> /etc/{passwd,shadow,group} from the activation snippets.  It should
> ensure that, when running ‘guix system reconfigure’, we’re honoring the
> locking protocol that Shadow & co. follow.
> 
> It would be great if you could test again in this context.
> 

Thank you!  Will do.

Regards,
Florian




Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Tue, 04 Jun 2019 09:23:01 GMT) Full text and rfc822 format available.

Message #58 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: "pelzflorian \(Florian Pelz\)" <pelzflorian <at> pelzflorian.de>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Tue, 04 Jun 2019 11:22:45 +0200
Hi,

"pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> skribis:

> On Mon, Jun 03, 2019 at 03:22:51PM +0200, Ludovic Courtès wrote:
>> > After multiple reconfigures, it happened again, my /etc/shadow has !
>> > again in the password field.  My recently changed root password became
>> > empty as well, like 35902.  I did not even run sudo concurrently.  The
>> > password just got locked.
>> 
>> What were the differences between your config files when you
>> reconfigured?
>>
>
> For the last reconfigure, there were no differences, although I had
> rebooted into an unbootable, older generation with a different
> syslog.conf and broken Udevd arguments before booting the new
> generation.

What’s the effect of this brokenness concretely?  Is the wrong root file
system mounted, or something like that?

Could it somehow lead Guix to stumble upon an empty or missing
/etc/shadow when it boots?

> I suppose the other victims of this bug have not booted to unbootable
> generations?

It’d be great if the other victims would speak up.  :-)

> If locks do not stop these issues, it would be nice to have more
> detailed logs of shadow changes written to syslog on reconfigure
> and/or on reboot.

There really isn’t much to log: the activation code reads
/etc/{shadow,passwd,group}, computes the list of shadow/passwd/group
entries as a function of that, and writes it.

Ludo’.
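
The read/compute/write step described above can be modeled roughly as
follows.  This is only a simplified sketch: ‘merge-shadow’ and its
alist representation are hypothetical and are not the code in (gnu build
accounts); the initial passwords mirror the /etc/shadow from the
original report, and the hashes are placeholders.  It shows why an
empty or missing /etc/shadow makes every account fall back to its
initial password.

```scheme
;; Hypothetical, simplified model; not the code in (gnu build accounts).
;; DECLARED: alist of (name . initial-password) from the OS configuration.
;; CURRENT:  alist of (name . password) parsed from the existing
;;           /etc/shadow, or '() when that file is missing or empty.
(define (merge-shadow declared current)
  (map (lambda (account)
         (let ((name (car account)))
           (cons name
                 (or (assoc-ref current name)  ;keep the password already set
                     (cdr account)))))         ;else use the initial password
       declared))

;; Illustrative values: empty initial root password, locked user account.
(define declared '(("root" . "") ("florian" . "!")))

;; Normal reconfigure: passwords changed with 'passwd' survive.
(merge-shadow declared '(("root" . "$6$aaa") ("florian" . "$6$bbb")))

;; /etc/shadow lost: every account is reset to its initial password.
(merge-shadow declared '())
```

With an empty CURRENT, the result is an empty root password and "!"
for the user, which matches the symptom reported in this bug.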




Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Tue, 04 Jun 2019 12:18:01 GMT) Full text and rfc822 format available.

Message #61 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Tue, 4 Jun 2019 14:17:11 +0200
On Tue, Jun 04, 2019 at 11:22:45AM +0200, Ludovic Courtès wrote:
> Hi,
> 
> "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> skribis:
> 
> > On Mon, Jun 03, 2019 at 03:22:51PM +0200, Ludovic Courtès wrote:
> >> > After multiple reconfigures, it happened again, my /etc/shadow has !
> >> > again in the password field.  My recently changed root password became
> >> > empty as well, like 35902.  I did not even run sudo concurrently.  The
> >> > password just got locked.
> >> 
> >> What were the differences between your config files when you
> >> reconfigured?
> >>
> >
> > For the last reconfigure, there were no differences, although I had
> > rebooted into an unbootable, older generation with a different
> > syslog.conf and broken Udevd arguments before booting the new
> > generation.
> 
> What’s the effect of this brokenness concretely?  Is the wrong root file
> system mounted, or something like that?
> 

I have multiple broken generations.  On one that now for a third time
(on old generations without Ludo’s patches) led to a locked
/etc/shadow after booting, I changed the line
(let ((pid (fork+exec-command (list udevd))))
in gnu/services/base.scm to, I believe, this:
(let ((pid (fork+exec-command (list udevd "--debug-trace"))))

(I am unsure if this is the same broken generation as on my first
report of the issue.  I may have gotten confused.)

This is unbootable; correct would have been --debug and not
--debug-trace.

I may also have changed my syslog configuration to the incorrect

                   (modify-services %desktop-services
                     (syslog-service-type config =>
                       (syslog-configuration
                        (inherit config)
                        (config-file
(plain-file "my-syslog.conf" "
     # Log all error messages, authentication messages of
     # level notice or higher and anything of level err or
     # higher to the console.
     # Don't log private authentication messages!
     *       /var/log/full
[…]")))))))

Correct would have been *.* instead of *, though I believe this latter
error is without relevant effect.

I will try to find the /gnu/store files for this generation.

Danny’s suggestion to `chattr +i /etc/shadow` leads to an error with
rename-file trying to rename an empty /etc/shadow.Gi… temporary file
on both this old broken and on healthy generations.


> There really isn’t much to log: the activation code reads
> /etc/{shadow,passwd,group}, computes the list of shadow/passwd/group
> entries as a function of that, and writes it.
> 

If I cannot find a more deterministic way, I will try making (gnu
build accounts) print the content of shadow.

Regards,
Florian




Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Tue, 04 Jun 2019 14:13:02 GMT) Full text and rfc822 format available.

Message #64 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Tue, 4 Jun 2019 16:12:17 +0200
On Tue, Jun 04, 2019 at 02:17:11PM +0200, pelzflorian (Florian Pelz) wrote:
> On Tue, Jun 04, 2019 at 11:22:45AM +0200, Ludovic Courtès wrote:
> > What’s the effect of this brokenness concretely?  Is the wrong root file
> > system mounted, or something like that?
> > 
> 

When removing quiet from the linux command line, shepherd complains
incessantly that it is trying to load udev.


All seems irrelevant, but:

> I have multiple broken generations.  On one that now for a third time
> (on old generations without Ludo’s patches) led to a locked
> /etc/shadow after booting, I changed the line
> (let ((pid (fork+exec-command (list udevd))))
> in gnu/services/base.scm to, I believe, this:
> (let ((pid (fork+exec-command (list udevd "--debug-trace"))))
> 
> (I am unsure if this is the same broken generation as on my first
> report of the issue.  I may have gotten confused.)
> 
> This is unbootable, correct would have been --debug and not
> --debug-trace.
>

The line in
/gnu/store/kdql26k1pxgm74d94ryzk8fb4lg5q0ra-shepherd-udev.scm
referenced from
/gnu/store/b7mrb5pzsbbvhjmi8lbm9xa4wgvqbc7g-shepherd.conf referenced
from the broken /var/guix/profiles/system-35-link/boot
is

(let ((pid (fork+exec-command (list udevd "--debug-trace" "--verbose"))))

with an unneeded --verbose.

> I may also have changed my syslog configuration to the incorrect
> 

No, the syslog.conf in
/gnu/store/y5nrfbj52vlnj77iyki9hbji8qjwk86d-syslog.conf referenced
from /gnu/store/5bp6c0p357gaqikxkmvs0idrmvdrzf7h-shepherd-syslogd.scm
referenced from
/gnu/store/b7mrb5pzsbbvhjmi8lbm9xa4wgvqbc7g-shepherd.conf referenced
from the broken /var/guix/profiles/system-35-link/boot
appears to be the default syslog.conf.

Booting this old generation and then an older working generation
sometimes leads to a broken /etc/shadow.  I do not yet know if a
broken /etc/shadow can result when booting this old generation and
then a new patched generation.

I will reconfigure some more and try getting a bad /etc/shadow even
with Ludo’s patches.

Regards,
Florian




Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Tue, 04 Jun 2019 17:18:02 GMT) Full text and rfc822 format available.

Message #67 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Tue, 4 Jun 2019 19:17:15 +0200
I got a locked /etc/shadow again now despite Ludovic’s patches (which
would nonetheless give me a better feeling when pushed).

When booting an unbootable generation with Ludovic’s patches and then
rebooting a normal generation with Ludovic’s patches, /etc/shadow is
locked.

Note that I get a message like “/dev/mapper/Guix: recovering journal”
when booting (I did not pay attention to that before).  I shut down
the unbootable generation with Ctrl+Alt+Del.  When I normally shut
down with Ctrl+Alt+Del I get no such message.

The unbootable generation was deliberately made unbootable like this:

On Tue, Jun 04, 2019 at 04:12:17PM +0200, pelzflorian (Florian Pelz) wrote:
> > I have multiple broken generations.  On one that now for a third time
> > (on old generations without Ludo’s patches) led to a locked
> > /etc/shadow after booting, I changed the line
> > (let ((pid (fork+exec-command (list udevd))))
> > in gnu/services/base.scm to, I believe, this:
> > (let ((pid (fork+exec-command (list udevd "--debug-trace"))))
> > 
> > (I am unsure if this is the same broken generation as on my first
> > report of the issue.  I may have gotten confused.)
> > 
> > This is unbootable, correct would have been --debug and not
> > --debug-trace.
> >
> 
> The line in
> /gnu/store/kdql26k1pxgm74d94ryzk8fb4lg5q0ra-shepherd-udev.scm
> referenced from
> /gnu/store/b7mrb5pzsbbvhjmi8lbm9xa4wgvqbc7g-shepherd.conf referenced
> from the broken /var/guix/profiles/system-35-link/boot
> is
> 
> (let ((pid (fork+exec-command (list udevd "--debug-trace" "--verbose"))))
> 

Regards,
Florian




Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Tue, 04 Jun 2019 21:22:02 GMT) Full text and rfc822 format available.

Message #70 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: "pelzflorian \(Florian Pelz\)" <pelzflorian <at> pelzflorian.de>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Tue, 04 Jun 2019 23:21:05 +0200
[Message part 1 (text/plain, inline)]
Hi,

"pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> skribis:

> I got a locked /etc/shadow again now despite Ludovic’s patches (which
> would nonetheless give me a better feeling when pushed).

Will do.  :-)

> When booting an unbootable generation with Ludovic’s patches and then
> rebooting a normal generation with Ludovic’s patches, /etc/shadow is
> locked.

So with this scenario, the problem is 100% reproducible, right?

> Note that I get a message like “/dev/mapper/Guix: recovering journal”
> when booting (I did not pay attention to that before).  I shut down
> the unbootable generation with Ctrl+Alt+Del.  When I normally shut
> down with Ctrl+Alt+Del I get no such message.

Indeed, ‘shepherd’ calls ‘disable-reboot-on-ctrl-alt-del’ (which
disables “hard” reboots upon ctrl-alt-del and instead notifies it) after
it has loaded its config file.

In your case, loading the config file never completes because the
‘start’ method is called from the config file for every service, and one
of them, udev, never starts.  Thus, when you press Ctrl-Alt-Del, you
perform a hard reboot.

The hard reboot happens after Guix has written to /etc/shadow.  One
possibility is that changes to this file haven’t been flushed to disk.
Thus, on the next boot, we start off with an empty or truncated
/etc/shadow, leading the activation snippet to initialize passwords.


If that theory holds, the patch below (on top of the others) should
help.  Could you give it a try?

Actually, the fact that ‘rename-file’ was called *before* ‘close-port’
could be problematic in itself; so perhaps, even without the ‘fdatasync’
call, we’d get better results…  especially since ‘fdatasync’ won’t be
available in the initrd anyway, hmm…

Thanks,
Ludo’.

[Message part 2 (text/x-patch, inline)]
diff --git a/gnu/build/accounts.scm b/gnu/build/accounts.scm
index 8687446aa6..c13e6f2e89 100644
--- a/gnu/build/accounts.scm
+++ b/gnu/build/accounts.scm
@@ -19,6 +19,7 @@
 (define-module (gnu build accounts)
   #:use-module (guix records)
   #:use-module (guix combinators)
+  #:use-module ((guix build syscalls) #:select (fdatasync))
   #:use-module (gnu system accounts)
   #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-11)
@@ -230,6 +231,14 @@ each field."
   ;; grab this lock with 'with-file-lock' when they access the databases.
   "/etc/.pwd.lock")
 
+(define-syntax-rule (catch-ENOSYS exp)
+  (catch 'system-error
+    (lambda () exp)
+    (lambda args
+      (if (= ENOSYS (system-error-errno args))
+          #f
+          (apply throw args)))))
+
 (define (database-writer file mode entry->string)
   (lambda* (entries #:optional (file-or-port file))
     "Write ENTRIES to FILE-OR-PORT.  When FILE-OR-PORT is a file name, write
@@ -249,9 +258,12 @@ to it atomically and set the appropriate permissions."
             (lambda ()
               (chmod port mode)
               (write-entries port)
-              (rename-file template file-or-port))
-            (lambda ()
+              (catch-ENOSYS (fdatasync port))
               (close-port port)
+              (rename-file template file-or-port))
+            (lambda ()
+              (unless (port-closed? port)
+                (close-port port))
               (when (file-exists? template)
                 (delete-file template))))))))
 

Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Wed, 05 Jun 2019 06:17:01 GMT) Full text and rfc822 format available.

Message #73 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Wed, 5 Jun 2019 08:16:11 +0200
On Tue, Jun 04, 2019 at 11:21:05PM +0200, Ludovic Courtès wrote:
> "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> skribis:
> > When booting an unbootable generation with Ludovic’s patches and then
> > rebooting a normal generation with Ludovic’s patches, /etc/shadow is
> > locked.
> 
> So with this scenario, the problem is 100% reproducible, right?
> 

/etc/shadow is not always locked, even if I get a recovering journal
message.


> If that theory holds, the patch below (on top of the others) should
> help.  Could you give it a try?
> 

Will do.

Regards,
Florian




Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Wed, 05 Jun 2019 09:55:02 GMT) Full text and rfc822 format available.

Message #76 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: "pelzflorian \(Florian Pelz\)" <pelzflorian <at> pelzflorian.de>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Wed, 05 Jun 2019 11:54:23 +0200
Hi,

"pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> skribis:

> On Tue, Jun 04, 2019 at 11:21:05PM +0200, Ludovic Courtès wrote:

[...]

>> If that theory holds, the patch below (on top of the others) should
>> help.  Could you give it a try?
>> 
>
> Will do.

Note that you’ll have to create a new “broken” generation with this
patch (because we already know that the old one can corrupt
/etc/shadow.)

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Wed, 05 Jun 2019 11:08:01 GMT) Full text and rfc822 format available.

Message #79 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Wed, 5 Jun 2019 13:06:58 +0200
It appears your patch fixes the issue.  I admire the speed at which
you write patches. :)  Thank you!

On Wed, Jun 05, 2019 at 11:54:23AM +0200, Ludovic Courtès wrote:
> Note that you’ll have to create a new “broken” generation with this
> patch (because we already know that the old one can corrupt
> /etc/shadow.)
>

I created a new working generation and then a new unbootable
generation with broken udevd args, both with all your patches.  I
rebooted the broken and then the working generation repeatedly twelve
times.  I waited varying amounts of time before doing Ctrl+Alt+Del in
the broken generation.  /etc/shadow is still in good health.

However:

On Tue, Jun 04, 2019 at 11:21:05PM +0200, Ludovic Courtès wrote:
> Indeed, ‘shepherd’ calls ‘disable-reboot-on-ctrl-alt-del’ (which
> disables “hard” reboots upon ctrl-alt-del and instead notifies it) after
> it has loaded its config file.

Is there a good reason shepherd calls disable-reboot-on-ctrl-alt-del
at the end?  I get recovering journal messages unless on the previous
boot I waited for the whole GDM to start (I can log in on the TTY
before GDM has fully started), which takes a long time during which
users could change their mind and decide they do not want to boot.
(The Macbook is not fast anyway and Guix is even slower when booting
compared to Debian.)

Regards,
Florian




Merged 35902 35996. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Wed, 05 Jun 2019 15:32:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Wed, 05 Jun 2019 21:14:01 GMT) Full text and rfc822 format available.

Message #84 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: "pelzflorian \(Florian Pelz\)" <pelzflorian <at> pelzflorian.de>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Wed, 05 Jun 2019 23:13:34 +0200
[Message part 1 (text/plain, inline)]
"pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> skribis:

> It appears your patch fixes the issue.  I admire the speed at which
> you write patches. :)  Thank you!

Awesome!  I must say that I’m really glad you’re putting this much
energy into reproducing issues and investigating—it’s rare for people
who report bugs to dig this deep, but it’s super helpful and motivating!

I’ve pushed the whole series:

  d088d5c484 accounts: Call 'fdatasync' when writing databases.
  ed8570dce3 accounts: Close database before renaming it.
  70a7a1b5dc nar: Really lock store files.
  d497b6ab39 activation: Lock /etc/.pwd.lock before accessing databases.
  5f0cf1df71 syscalls: 'with-lock-file' catches ENOSYS.
  89ceb86ad4 syscalls: 'with-file-lock' expands to a call to 'call-with-file-lock'.
  b7178c22bf syscalls: Add 'with-file-lock' macro.

The actual fix is ed8570dce3, AIUI.

> I created a new working generation and then a new unbootable
> generation with broken udevd args, both with all your patches.  I
> rebooted the broken and then the working generation repeatedly twelve
> times.  I waited varying amounts of time before doing Ctrl+Alt+Del in
> the broken generation.  /etc/shadow is still in good health.

Good.

> On Tue, Jun 04, 2019 at 11:21:05PM +0200, Ludovic Courtès wrote:
>> Indeed, ‘shepherd’ calls ‘disable-reboot-on-ctrl-alt-del’ (which
>> disables “hard” reboots upon ctrl-alt-del and instead notifies it) after
>> it has loaded its config file.
>
> Is there a good reason shepherd calls disable-reboot-on-ctrl-alt-del
> at the end?  I get recovering journal messages unless on the previous
> boot I waited for the whole GDM to start (I can log in on the TTY
> before GDM has fully started), which takes a long time during which
> users could change their mind and decide they do not want to boot.
> (The Macbook is not fast anyway and Guix is even slower when booting
> compared to Debian.)

I agree.

The attached patch for Shepherd moves everything before loading the
config file.  I think it will have the desired effect, though I’m not
entirely sure the signal handler would run at the right time etc.

You can test it on the metal if you want (you need to add the patch to
the ‘shepherd’ package), but I’ll see if I can test in a VM.

Thank you!

Ludo’.

[Message part 2 (text/x-patch, inline)]
diff --git a/modules/shepherd.scm b/modules/shepherd.scm
index 8b2cc1d..769085a 100644
--- a/modules/shepherd.scm
+++ b/modules/shepherd.scm
@@ -198,34 +198,6 @@ socket file at FILE-NAME upon exit of PROC.  Return the values of PROC."
       ;; Start the 'root' service.
       (start root-service)
 
-      ;; This _must_ succeed.  (We could also put the `catch' around
-      ;; `main', but it is often useful to get the backtrace, and
-      ;; `caught-error' does not do this yet.)
-      (catch #t
-        (lambda ()
-          (load-in-user-module (or config-file (default-config-file))))
-        (lambda (key . args)
-          (caught-error key args)
-          (quit 1)))
-      ;; Start what was started last time.
-      (and persistency
-           (catch 'system-error
-             (lambda ()
-               (start-in-order (read (open-input-file
-                                      persistency-state-file))))
-             (lambda (key . args)
-               (apply format #f (gettext (cadr args)) (caddr args))
-               (quit 1))))
-
-      (when (provided? 'threads)
-        ;; XXX: This terrible hack allows us to make sure that signal handlers
-        ;; get a chance to run in a timely fashion.  Without it, after an EINTR,
-        ;; we could restart the accept(2) call below before the corresponding
-        ;; async has been queued.  See the thread at
-        ;; <https://lists.gnu.org/archive/html/guile-devel/2013-07/msg00004.html>.
-        (sigaction SIGALRM (lambda _ (alarm 1)))
-        (alarm 1))
-
       (when (= 1 (getpid))
         ;; When running as PID 1, disable hard reboots upon ctrl-alt-del.
         ;; Instead, the kernel will send us SIGINT so that we can gracefully
@@ -259,6 +231,34 @@ socket file at FILE-NAME upon exit of PROC.  Return the values of PROC."
         (lambda _
           (stop root-service)))
 
+      ;; This _must_ succeed.  (We could also put the `catch' around
+      ;; `main', but it is often useful to get the backtrace, and
+      ;; `caught-error' does not do this yet.)
+      (catch #t
+        (lambda ()
+          (load-in-user-module (or config-file (default-config-file))))
+        (lambda (key . args)
+          (caught-error key args)
+          (quit 1)))
+      ;; Start what was started last time.
+      (and persistency
+           (catch 'system-error
+             (lambda ()
+               (start-in-order (read (open-input-file
+                                      persistency-state-file))))
+             (lambda (key . args)
+               (apply format #f (gettext (cadr args)) (caddr args))
+               (quit 1))))
+
+      (when (provided? 'threads)
+        ;; XXX: This terrible hack allows us to make sure that signal handlers
+        ;; get a chance to run in a timely fashion.  Without it, after an EINTR,
+        ;; we could restart the accept(2) call below before the corresponding
+        ;; async has been queued.  See the thread at
+        ;; <https://lists.gnu.org/archive/html/guile-devel/2013-07/msg00004.html>.
+        (sigaction SIGALRM (lambda _ (alarm 1)))
+        (alarm 1))
+
       ;; Ignore SIGPIPE so that we don't die if a client closes the connection
       ;; prematurely.
       (sigaction SIGPIPE SIG_IGN)

Information forwarded to bug-guix <at> gnu.org:
bug#35996; Package guix. (Thu, 06 Jun 2019 07:02:01 GMT) Full text and rfc822 format available.

Message #87 received at 35996 <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 35996 <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Thu, 6 Jun 2019 09:01:44 +0200
On Wed, Jun 05, 2019 at 11:13:34PM +0200, Ludovic Courtès wrote:
> Awesome!  I must say that I’m really glad you’re putting this much
> energy into reproducing issues and investigating—it’s rare for people
> who report bugs to dig this deep, but it’s super helpful and motivating!

:)


> The attached patch for Shepherd moves everything before loading the
> config file.  I think it will have the desired effect, though I’m not
> entirely sure the signal handler would run at the right time etc.
>

It works for me without a “recovering journal” message (while taking an
insignificantly longer time to reboot).

Regards,
Florian




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Thu, 06 Jun 2019 08:05:02 GMT) Full text and rfc822 format available.

Notification sent to "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>:
bug acknowledged by developer. (Thu, 06 Jun 2019 08:05:02 GMT) Full text and rfc822 format available.

Message #92 received at 35996-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: "pelzflorian \(Florian Pelz\)" <pelzflorian <at> pelzflorian.de>
Cc: 35996-done <at> debbugs.gnu.org
Subject: Re: bug#35996: User account password got locked when booting old
 generation
Date: Thu, 06 Jun 2019 10:04:33 +0200
Hello!

"pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> skribis:

> On Wed, Jun 05, 2019 at 11:13:34PM +0200, Ludovic Courtès wrote:

[...]

>> The attached patch for Shepherd moves everything before loading the
>> config file.  I think it will have the desired effect, though I’m not
>> entirely sure the signal handler would run at the right time etc.
>>
>
> It works for me without a “recovering journal” message (while taking an
> insignificantly longer time to reboot).

Excellent.  Pushed as Shepherd commit
c6f250d1fd1afa9ee49c8bb2414eee087b672789.

Thank you!

Ludo’.




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Thu, 06 Jun 2019 08:05:02 GMT) Full text and rfc822 format available.

Notification sent to Marlon <mbmattos1113 <at> firemail.cc>:
bug acknowledged by developer. (Thu, 06 Jun 2019 08:05:03 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 04 Jul 2019 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 4 years and 296 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.