GNU bug report logs - #30993
'ssh-daemon' fails to start

Previous Next

Package: guix;

Reported by: Leo Famulari <leo <at> famulari.name>

Date: Thu, 29 Mar 2018 20:09:01 UTC

Severity: important

Tags: fixed, unreproducible

Merged with 33299, 34580, 37309

Done: maxim.cournoyer <at> gmail.com

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 30993 in the body.
You can then email your comments to 30993 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-guix <at> gnu.org:
bug#30993; Package guix. (Thu, 29 Mar 2018 20:09:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Leo Famulari <leo <at> famulari.name>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Thu, 29 Mar 2018 20:09:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Leo Famulari <leo <at> famulari.name>
To: bug-guix <at> gnu.org
Subject: OpenSSH sshd killed by Shepherd 0.4.0
Date: Thu, 29 Mar 2018 16:08:03 -0400
[Message part 1 (text/plain, inline)]
Since the update to Shepherd 0.4.0, I've found that OpenSSH's sshd is
killed almost immediately after it starts with signal 15. I confirmed
the issue started with the Shepherd upgrade by bisecting our Git
history.

I can reproduce the issue from commit
b6beda1d6b9093a8493b5c3cde33ed522242c451 (gnu: Add botan.).

One interesting tidbit is that the PID file '/var/run/sshd.pid' is not
created anymore. And if I create an empty PID file by hand, it is
removed after trying to start the ssh-daemon service. Also, the sshd
user's home '/var/run/sshd' does not exist, and is similarly removed if
it does exist.

I ran the OpenSSH system test `make check-system TESTS=openssh` and it
failed when it could not find the PID file. It passed on another
non-GuixSD machine. The failing machine is relatively slow and lacks
KVM: a ThinkPad x200s.

After boot, trying to start the service again with `herd start
ssh-daemon` gives the same result.

I modified the sshd invocation to print some debug output ('-d -E
/tmp/sshd.log') and this is what it shows:

------
debug1: sshd version OpenSSH_7.6, OpenSSL 1.0.2o  27 Mar 2018
debug1: private host key #0: ssh-rsa SHA256:REDACTED
debug1: private host key #1: ssh-dss SHA256:REDACTED
debug1: private host key #2: ecdsa-sha2-nistp256 SHA256:REDACTED
debug1: private host key #3: ssh-ed25519 SHA256:REDACTED
debug1: rexec_argv[0]='/gnu/store/az7vib8gk16fybhshh5xpkljmgxyrs4k-openssh-7.6p1/sbin/sshd'
debug1: rexec_argv[1]='-D'
debug1: rexec_argv[2]='-d'
debug1: rexec_argv[3]='-E'
debug1: rexec_argv[4]='/tmp/sshd.log'
debug1: rexec_argv[5]='-f'
debug1: rexec_argv[6]='/gnu/store/miy7xg5j4fg3mn04mcl27awmcl6s97ss-sshd_config'
debug1: Set /proc/self/oom_score_adj from 0 to -1000
debug1: Bind to port 22 on 0.0.0.0.
Server listening on 0.0.0.0 port 22.
debug1: Bind to port 22 on ::.
Server listening on :: port 22.
Received signal 15; terminating.
------

My system configuration file, the shepherd log messages, and the OpenSSH
system test logs are attached. Any ideas?
[shepherd.log (text/plain, attachment)]
[system.scm (text/plain, attachment)]
[check.log (text/plain, attachment)]
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#30993; Package guix. (Fri, 06 Apr 2018 08:22:01 GMT) Full text and rfc822 format available.

Message #8 received at 30993 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Leo Famulari <leo <at> famulari.name>
Cc: 30993 <at> debbugs.gnu.org
Subject: Re: bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
Date: Fri, 06 Apr 2018 10:21:09 +0200
Hi Leo,

Leo Famulari <leo <at> famulari.name> skribis:

> Since the update to Shepherd 0.4.0, I've found that OpenSSH's sshd is
> killed almost immediately after it starts with signal 15. I confirmed
> the issue started with the Shepherd upgrade by bisecting our Git
> history.
>
> I can reproduce the issue from commit
> b6beda1d6b9093a8493b5c3cde33ed522242c451 (gnu: Add botan.).
>
> One interesting tidbit is that the PID file '/var/run/sshd.pid' is not
> created anymore. And if I create an empty PID file by hand, it is
> removed after trying to start the ssh-daemon service. Also, the sshd
> user's home '/var/run/sshd' does not exist, and is similarly removed if
> it does exist.

Weird.  On my laptop /var/run/sshd.pid exists and contains the right
PID.  /var/run/sshd does not exist, though.

When I run the config you posted in ‘guix system vm’, ‘ssh-daemon’ is
correctly started and I see a correct ssh.pid and /var/run/sshd exists
as well.

> I ran the OpenSSH system test `make check-system TESTS=openssh` and it
> failed when it could not find the PID file. It passed on another
> non-GuixSD machine. The failing machine is relatively slow and lacks
> KVM: a ThinkPad x200s.

FWIW on my x86_64 laptop, I’ve run it several times in a row (using
“guix build /gnu/store/…-openssh.drv --check”), and it always succeeds.

> I modified the sshd invocation to print some debug output ('-d -E
> /tmp/sshd.log') and this is what it shows:
>
> ------
> debug1: sshd version OpenSSH_7.6, OpenSSL 1.0.2o  27 Mar 2018
> debug1: private host key #0: ssh-rsa SHA256:REDACTED
> debug1: private host key #1: ssh-dss SHA256:REDACTED
> debug1: private host key #2: ecdsa-sha2-nistp256 SHA256:REDACTED
> debug1: private host key #3: ssh-ed25519 SHA256:REDACTED
> debug1: rexec_argv[0]='/gnu/store/az7vib8gk16fybhshh5xpkljmgxyrs4k-openssh-7.6p1/sbin/sshd'
> debug1: rexec_argv[1]='-D'
> debug1: rexec_argv[2]='-d'
> debug1: rexec_argv[3]='-E'
> debug1: rexec_argv[4]='/tmp/sshd.log'
> debug1: rexec_argv[5]='-f'
> debug1: rexec_argv[6]='/gnu/store/miy7xg5j4fg3mn04mcl27awmcl6s97ss-sshd_config'
> debug1: Set /proc/self/oom_score_adj from 0 to -1000
> debug1: Bind to port 22 on 0.0.0.0.
> Server listening on 0.0.0.0 port 22.
> debug1: Bind to port 22 on ::.
> Server listening on :: port 22.
> Received signal 15; terminating.
> ------

Hmm, where could that SIGTERM come from?  ‘make-kill-destructor’ uses it
when a service is stopped, and otherwise it’s sent when we’re shutting
the system down.  But here?…

Thanks,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#30993; Package guix. (Fri, 06 Apr 2018 12:42:02 GMT) Full text and rfc822 format available.

Message #11 received at 30993 <at> debbugs.gnu.org (full text, mbox):

From: Leo Famulari <leo <at> famulari.name>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 30993 <at> debbugs.gnu.org
Subject: Re: bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
Date: Fri, 6 Apr 2018 08:41:01 -0400
[Message part 1 (text/plain, inline)]
On Fri, Apr 06, 2018 at 10:21:09AM +0200, Ludovic Courtès wrote:
[...]
> > Server listening on 0.0.0.0 port 22.
> > debug1: Bind to port 22 on ::.
> > Server listening on :: port 22.
> > Received signal 15; terminating.
> > ------
> 
> Hmm, where could that SIGTERM come from?  ‘make-kill-destructor’ uses it
> when a service is stopped, and otherwise it’s sent when we’re shutting
> the system down.  But here?…

Indeed, that is the question. Does anyone have advice for debugging?

I still have this issue and so I'm continuing to use Shepherd 0.3.2.
[signature.asc (application/pgp-signature, inline)]

Severity set to 'important' from 'normal' Request was from ludo <at> gnu.org (Ludovic Courtès) to control <at> debbugs.gnu.org. (Fri, 06 Apr 2018 14:34:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#30993; Package guix. (Fri, 06 Apr 2018 14:38:01 GMT) Full text and rfc822 format available.

Message #16 received at 30993 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Leo Famulari <leo <at> famulari.name>
Cc: 30993 <at> debbugs.gnu.org
Subject: Re: bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
Date: Fri, 06 Apr 2018 16:37:24 +0200
Leo Famulari <leo <at> famulari.name> skribis:

> On Fri, Apr 06, 2018 at 10:21:09AM +0200, Ludovic Courtès wrote:
> [...]
>> > Server listening on 0.0.0.0 port 22.
>> > debug1: Bind to port 22 on ::.
>> > Server listening on :: port 22.
>> > Received signal 15; terminating.
>> > ------
>> 
>> Hmm, where could that SIGTERM come from?  ‘make-kill-destructor’ uses it
>> when a service is stopped, and otherwise it’s sent when we’re shutting
>> the system down.  But here?…
>
> Indeed, that is the question. Does anyone have advice for debugging?
>
> I still have this issue and so I'm continuing to use Shepherd 0.3.2.

For you the bug is 100% reproducible with 0.4.0 and never happens with
0.3.2?  Even in a ‘guix system vm’?

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#30993; Package guix. (Tue, 01 May 2018 13:14:01 GMT) Full text and rfc822 format available.

Message #19 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Martin Castillo <castilma <at> uni-bremen.de>
To: bug-guix <at> gnu.org
Subject: Re: bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
Date: Tue, 1 May 2018 15:13:05 +0200
(resending because I forgot to CC the list)

On 06.04.2018 14:41, Leo Famulari wrote:
>> Hmm, where could that SIGTERM come from?  ‘make-kill-destructor’ uses it
>> when a service is stopped, and otherwise it’s sent when we’re shutting
>> the system down.  But here?…
>
> Indeed, that is the question. Does anyone have advice for debugging?
>
> I still have this issue and so I'm continuing to use Shepherd 0.3.2.
>

Maybe one could somehow strace herd, or change the make-kill-destructor
to log every time it is being executed?

I'm not that fluent in guile so maybe someone else could do that?




Information forwarded to bug-guix <at> gnu.org:
bug#30993; Package guix. (Tue, 01 May 2018 20:45:01 GMT) Full text and rfc822 format available.

Message #22 received at 30993 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Martin Castillo <castilma <at> uni-bremen.de>
Cc: 30993 <at> debbugs.gnu.org
Subject: Re: bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
Date: Tue, 01 May 2018 22:43:58 +0200
Martin Castillo <castilma <at> uni-bremen.de> skribis:

> (resending because I forgot to CC the list)
>
> On 06.04.2018 14:41, Leo Famulari wrote:
>>> Hmm, where could that SIGTERM come from?  ‘make-kill-destructor’ uses it
>>> when a service is stopped, and otherwise it’s sent when we’re shutting
>>> the system down.  But here?…
>>
>> Indeed, that is the question. Does anyone have advice for debugging?
>>
>> I still have this issue and so I'm continuing to use Shepherd 0.3.2.
>>
>
> Maybe one could somehow strace herd, or change the make-kill-destructor
> to log every time it is being executed?

‘herd status sshd’ displays the last time sshd was respawned, but the
info ‘herd’ receives actually includes the dates of all the respawns,
not just the last one.  Is that what you’re asking for?

Thanks,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#30993; Package guix. (Thu, 03 May 2018 15:17:01 GMT) Full text and rfc822 format available.

Message #25 received at 30993 <at> debbugs.gnu.org (full text, mbox):

From: Martin Castillo <castilma <at> uni-bremen.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 30993 <at> debbugs.gnu.org
Subject: Re: bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
Date: Thu, 3 May 2018 17:16:32 +0200

On 01.05.2018 22:43, Ludovic Courtès wrote:
>> Maybe one could somehow strace herd, or change the make-kill-destructor
>> to log every time it is being executed?
> 
> ‘herd status sshd’ displays the last time sshd was respawned, but the
> info ‘herd’ receives actually includes the dates of all the respawns,
> not just the last one.  Is that what you’re asking for?
> 

My idea was that maybe make-kill-destructor is being called from
somewhere else. If this is being
logged, one could rule that out.

Another wild idea would be sshd killing itself for some reason. stracing
sshd would tell us, if that's the case. How would one do that? Does
shepherd provide some debugging functions?

Or does linux provide a way to log all sent signals so one could find
the sending process?

Martin

-- 
GPG: 7FDE 7190 2F73 2C50 236E  403D CC13 48F1 E644 08EC




Information forwarded to bug-guix <at> gnu.org:
bug#30993; Package guix. (Thu, 03 May 2018 16:39:01 GMT) Full text and rfc822 format available.

Message #28 received at 30993 <at> debbugs.gnu.org (full text, mbox):

From: Leo Famulari <leo <at> famulari.name>
To: Martin Castillo <castilma <at> uni-bremen.de>
Cc: 30993 <at> debbugs.gnu.org, Ludovic Courtès <ludo <at> gnu.org>
Subject: Re: bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
Date: Thu, 3 May 2018 12:38:08 -0400
[Message part 1 (text/plain, inline)]
On Thu, May 03, 2018 at 05:16:32PM +0200, Martin Castillo wrote:
> 
> 
> On 01.05.2018 22:43, Ludovic Courtès wrote:
> >> Maybe one could somehow strace herd, or change the make-kill-destructor
> >> to log every time it is being executed?
> > 
> > ‘herd status sshd’ displays the last time sshd was respawned, but the
> > info ‘herd’ receives actually includes the dates of all the respawns,
> > not just the last one.  Is that what you’re asking for?
> > 
> 
> My idea was that maybe make-kill-destructor is being called from
> somewhere else. If this is being
> logged, one could rule that out.
> 
> Another wild idea would be sshd killing itself for some reason. stracing
> sshd would tell us, if that's the case. How would one do that? Does
> shepherd provide some debugging functions?
> 
> Or does linux provide a way to log all sent signals so one could find
> the sending process?

I haven't had time to debug this yet, and Shepherd 0.3 still works.

Since nobody else can reproduce the bug, and since I expect OpenSSH to
be commonly used, I suspect some non-deterministic Guile mis-compilation
or filesystem corruption — the system in question is using ext4.
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#30993; Package guix. (Fri, 04 May 2018 02:02:02 GMT) Full text and rfc822 format available.

Message #31 received at 30993 <at> debbugs.gnu.org (full text, mbox):

From: Martin Castillo <castilma <at> uni-bremen.de>
To: Leo Famulari <leo <at> famulari.name>
Cc: 30993 <at> debbugs.gnu.org, Ludovic Courtès <ludo <at> gnu.org>
Subject: Re: bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
Date: Fri, 04 May 2018 04:01:52 +0200
Sorry,  I forgot to mention that I have the same problem. But I had it already with shepherd 0.3.




Information forwarded to bug-guix <at> gnu.org:
bug#30993; Package guix. (Sun, 06 May 2018 19:51:01 GMT) Full text and rfc822 format available.

Message #34 received at 30993 <at> debbugs.gnu.org (full text, mbox):

From: Leo Famulari <leo <at> famulari.name>
To: Martin Castillo <castilma <at> uni-bremen.de>
Cc: 30993 <at> debbugs.gnu.org, Ludovic Courtès <ludo <at> gnu.org>
Subject: Re: bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
Date: Sun, 6 May 2018 15:50:50 -0400
[Message part 1 (text/plain, inline)]
On Fri, May 04, 2018 at 04:01:52AM +0200, Martin Castillo wrote:
> Sorry,  I forgot to mention that I have the same problem. But I had it already with shepherd 0.3.

Interesting. Did it ever work for you on that system?
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#30993; Package guix. (Mon, 07 May 2018 19:35:01 GMT) Full text and rfc822 format available.

Message #37 received at 30993 <at> debbugs.gnu.org (full text, mbox):

From: Martin Castillo <castilma <at> uni-bremen.de>
To: Leo Famulari <leo <at> famulari.name>
Cc: 30993 <at> debbugs.gnu.org, Ludovic Courtès <ludo <at> gnu.org>
Subject: bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
Date: Mon, 7 May 2018 21:10:37 +0200
[Message part 1 (text/plain, inline)]

On 06.05.2018 21:50, Leo Famulari wrote:
> On Fri, May 04, 2018 at 04:01:52AM +0200, Martin Castillo wrote:
>> Sorry,  I forgot to mention that I have the same problem. But I had it already with shepherd 0.3.
> 
> Interesting. Did it ever work for you on that system?
> 
> that system?
Do you mean shepherd 0.3? Yes. And once(or so) with shepherd 0.4.

I reported that here [0]. Some of the mentioned files needed small
changes for the current guix, but ssh works with all of them, strangely.

I attached my current configuration, where I need to start the daemon
manually (herd start ssh-daemon) after each boot.

Martin



[0]: https://lists.gnu.org/archive/html/help-guix/2018-01/msg00112.html
[cur.scm (text/x-scheme, attachment)]

Added tag(s) unreproducible. Request was from ludo <at> gnu.org (Ludovic Courtès) to control <at> debbugs.gnu.org. (Fri, 15 Jun 2018 09:09:01 GMT) Full text and rfc822 format available.

Merged 30993 32197. Request was from Leo Famulari <leo <at> famulari.name> to control <at> debbugs.gnu.org. (Wed, 18 Jul 2018 18:30:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#30993; Package guix. (Thu, 19 Jul 2018 13:16:01 GMT) Full text and rfc822 format available.

Message #44 received at 30993 <at> debbugs.gnu.org (full text, mbox):

From: Clément Lassieur <clement <at> lassieur.org>
To: Martin Castillo <castilma <at> uni-bremen.de>, Leo Famulari <leo <at> famulari.name>
Cc: 30993 <at> debbugs.gnu.org, Julien Lepiller <julien <at> lepiller.eu>,
 32197 <at> debbugs.gnu.org
Subject: Re: bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
Date: Thu, 19 Jul 2018 15:15:48 +0200
Heya,

Martin Castillo <castilma <at> uni-bremen.de> writes:

> On 06.05.2018 21:50, Leo Famulari wrote:
>> On Fri, May 04, 2018 at 04:01:52AM +0200, Martin Castillo wrote:
>>> Sorry,  I forgot to mention that I have the same problem. But I had it already with shepherd 0.3.
>> 
>> Interesting. Did it ever work for you on that system?
>> 
>> that system?
> Do you mean shepherd 0.3? Yes. And once(or so) with shepherd 0.4.
>
> I reported that here [0]. Some of the mentioned files needed small
> changes for the current guix, but ssh works with all of them, strangely.
>
> I attached my current configuration, where I need to start the daemon
> manually (herd start ssh-daemon) after each boot.

I don't think you're having the same bug.  Martin can manually start the
daemon, whereas Leo can't.  So Martin seems to have
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=32197, for which a commit
has been pushed by Julien.

Martin, could you please test it?  (You just need to 'guix pull' and try
again.)  Leo, if you confirm my analysis, could you please unmerge the
bugs?

Thanks,
Clément




Disconnected #30993 from all other report(s). Request was from clement <at> lassieur.org (Clément Lassieur) to control <at> debbugs.gnu.org. (Thu, 19 Jul 2018 14:25:01 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#30993; Package guix. (Thu, 19 Jul 2018 14:28:01 GMT) Full text and rfc822 format available.

Message #49 received at 30993 <at> debbugs.gnu.org (full text, mbox):

From: Clément Lassieur <clement <at> lassieur.org>
To: Martin Castillo <castilma <at> uni-bremen.de>, Leo Famulari <leo <at> famulari.name>
Cc: 30993 <at> debbugs.gnu.org, Julien Lepiller <julien <at> lepiller.eu>,
 32197 <at> debbugs.gnu.org
Subject: Re: bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
Date: Thu, 19 Jul 2018 16:26:59 +0200
Clément Lassieur <clement <at> lassieur.org> writes:

> Leo, if you confirm my analysis, could you please unmerge the bugs?

I did it, because Eric confirmed the fix.
Clément




Information forwarded to bug-guix <at> gnu.org:
bug#30993; Package guix. (Thu, 19 Jul 2018 16:58:02 GMT) Full text and rfc822 format available.

Message #52 received at 30993 <at> debbugs.gnu.org (full text, mbox):

From: Leo Famulari <leo <at> famulari.name>
To: Clément Lassieur <clement <at> lassieur.org>
Cc: Martin Castillo <castilma <at> uni-bremen.de>,
 Julien Lepiller <julien <at> lepiller.eu>, 32197 <at> debbugs.gnu.org,
 30993 <at> debbugs.gnu.org
Subject: Re: bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
Date: Thu, 19 Jul 2018 12:57:30 -0400
[Message part 1 (text/plain, inline)]
On Thu, Jul 19, 2018 at 04:26:59PM +0200, Clément Lassieur wrote:
> Clément Lassieur <clement <at> lassieur.org> writes:
> 
> > Leo, if you confirm my analysis, could you please unmerge the bugs?
> 
> I did it, because Eric confirmed the fix.

Thanks, sorry for the confusion!
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#30993; Package guix. (Mon, 23 Jul 2018 17:09:02 GMT) Full text and rfc822 format available.

Message #55 received at 30993 <at> debbugs.gnu.org (full text, mbox):

From: Martin Castillo <castilma <at> uni-bremen.de>
To: Clément Lassieur <clement <at> lassieur.org>,
 Leo Famulari <leo <at> famulari.name>
Cc: 30993 <at> debbugs.gnu.org, Julien Lepiller <julien <at> lepiller.eu>,
 32197 <at> debbugs.gnu.org
Subject: Re: bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
Date: Mon, 23 Jul 2018 19:08:36 +0200
[Message part 1 (text/plain, inline)]
On 19.07.2018 15:15, Clément Lassieur wrote:
> [...]
> Martin, could you please test it?  (You just need to 'guix pull' and try
> again.)  Leo, if you confirm my analysis, could you please unmerge the
> bugs?
>
> Thanks,
> Clément
>

It still does not work for me.

Attached are my guix version, dmesg|grep shepherd output, config.scm and
my qemu invocation (metal).

I run that system in qemu, but it is installed directly on my harddrive.

Martin

-- 
GPG: 7FDE 7190 2F73 2C50 236E  403D CC13 48F1 E644 08EC
[config.scm (text/x-scheme, attachment)]
[shepherd.dmesg (text/plain, attachment)]
[version (text/plain, attachment)]
[metal (text/plain, attachment)]
[signature.asc (application/pgp-signature, attachment)]

Information forwarded to bug-guix <at> gnu.org:
bug#30993; Package guix. (Tue, 28 Aug 2018 09:48:02 GMT) Full text and rfc822 format available.

Message #58 received at 30993 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Leo Famulari <leo <at> famulari.name>
Cc: 30993 <at> debbugs.gnu.org
Subject: Re: bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
Date: Tue, 28 Aug 2018 11:47:12 +0200
Hi Leo,

Leo Famulari <leo <at> famulari.name> skribis:

> Since the update to Shepherd 0.4.0, I've found that OpenSSH's sshd is
> killed almost immediately after it starts with signal 15. I confirmed
> the issue started with the Shepherd upgrade by bisecting our Git
> history.
>
> I can reproduce the issue from commit
> b6beda1d6b9093a8493b5c3cde33ed522242c451 (gnu: Add botan.).

I’m “happy” to say that I experienced this on a server—not having ssh
access to a remote server is fairly annoying, I definitely sympathize…

What I see is this:

--8<---------------cut here---------------start------------->8---
Aug  6 07:56:40 localhost shepherd[1]: Service loopback has been started. 

[...]

Aug  6 07:56:51 localhost sshd[606]: Server listening on 0.0.0.0 port 22.

[...]

Aug  6 07:57:05 localhost shepherd[1]: Service ssh-daemon could not be started. 

[...]

Aug  6 07:57:46 localhost vmunix: [   10.049791] random: ssh-keygen: uninitialized urandom read (32 bytes read)
--8<---------------cut here---------------end--------------->8---

(Note that the last message was pulled from /dev/kmsg by syslogd, but
it’s about an event that actually occurred before the first message.)

It waited for ~15 seconds, although ‘%pid-file-timeout’ in (shepherd
service) is only 5 seconds.

The SIGTERM you were seeing very likely comes from this bit:

--8<---------------cut here---------------start------------->8---
      (match (read-pid-file pid-file
                            #:max-delay pid-file-timeout)
        (#f
         (catch-system-error (kill pid SIGTERM))
         #f)
        ((? integer? pid)
         pid))
--8<---------------cut here---------------end--------------->8---

On another machine:

--8<---------------cut here---------------start------------->8---
Aug 28 09:10:49 localhost sshd[435]: Server listening on 0.0.0.0 port 22.
Aug 28 09:10:49 localhost sshd[435]: Server listening on :: port 22.

[...]

Aug 28 09:10:50 localhost shepherd[1]: Service ssh-daemon has been started. 
--8<---------------cut here---------------end--------------->8---

I wonder if this has to do with IPv6 (the failing case lacks the “Server
listening on ::” line), or if it’s just sshd occasionally taking a long
time to start.

Is it easily reproducible for you?  Did you eventually gather more
details?

Thanks,
Ludo’.




Merged 30993 33299. Request was from ludo <at> gnu.org (Ludovic Courtès) to control <at> debbugs.gnu.org. (Wed, 07 Nov 2018 21:35:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#30993; Package guix. (Wed, 07 Nov 2018 21:37:02 GMT) Full text and rfc822 format available.

Message #63 received at 30993 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: swedebugia <swedebugia <at> riseup.net>
Cc: 33299 <at> debbugs.gnu.org, 30993 <at> debbugs.gnu.org
Subject: Re: bug#33299: shepherd: Service ssh-daemon could not be started.
Date: Wed, 07 Nov 2018 22:36:33 +0100
Hello,

swedebugia <swedebugia <at> riseup.net> skribis:

> This morning I started my GuixSD VM as usual.
>
> Openssh server was not running even though it was enabled and should
> have been respawned. Hmm.

This seems to be the same issue as described in
<https://issues.guix.info/issue/30993> (I’ve now merged both).

It’s a serious problem…

Thanks for reporting it,
Ludo’.





Information forwarded to bug-guix <at> gnu.org:
bug#30993; Package guix. (Sat, 17 Nov 2018 09:47:02 GMT) Full text and rfc822 format available.

Message #66 received at 30993 <at> debbugs.gnu.org (full text, mbox):

From: Taegil Bae <esrevinu <at> gmail.com>
To: 30993 <at> debbugs.gnu.org
Subject: issue: ssh-daemon could not be started
Date: Sat, 17 Nov 2018 18:46:26 +0900
Hi,

I have experienced this issue on a slow machine (Thinkpad T60). By
placing avahi-daemon after ssh-daemon, I fixed this issue. I added
ssh-daemon into the requirement of avahi-shepherd-service, and
reconfigured the system.

On the machine, openssh and dropbear both had this issue. I checked
that this trick works for the two ssh server. On another machine(Qemu
VM, faster), openssh only had the trouble. Through several
reconfiguring the system, it was fixed with the original
configuration. I think that it was because the reconfigurations
reordered the services.

Taegil




Information forwarded to bug-guix <at> gnu.org:
bug#30993; Package guix. (Mon, 19 Nov 2018 21:23:02 GMT) Full text and rfc822 format available.

Message #69 received at 30993 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Taegil Bae <esrevinu <at> gmail.com>
Cc: 30993 <at> debbugs.gnu.org
Subject: Re: bug#30993: issue: ssh-daemon could not be started
Date: Mon, 19 Nov 2018 22:22:03 +0100
Hello,

Taegil Bae <esrevinu <at> gmail.com> skribis:

> I have experienced this issue on a slow machine (Thinkpad T60). By
> placing avahi-daemon after ssh-daemon, I fixed this issue. I added
> ssh-daemon into the requirement of avahi-shepherd-service, and
> reconfigured the system.

This is very surprising.  Are you sure this is fully reproducible?
How would you explain this?

Thank you,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#30993; Package guix. (Tue, 20 Nov 2018 01:34:02 GMT) Full text and rfc822 format available.

Message #72 received at 30993 <at> debbugs.gnu.org (full text, mbox):

From: Taegil Bae <esrevinu <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 30993 <at> debbugs.gnu.org
Subject: Re: bug#30993: issue: ssh-daemon could not be started
Date: Tue, 20 Nov 2018 10:33:17 +0900
Hello,

On 11/20/18 6:22 AM, Ludovic Courtès wrote:
> Taegil Bae <esrevinu <at> gmail.com> skribis:
>
>> I have experienced this issue on a slow machine (Thinkpad T60). By
>> placing avahi-daemon after ssh-daemon, I fixed this issue. I added
>> ssh-daemon into the requirement of avahi-shepherd-service, and
>> reconfigured the system.
> This is very surprising.  Are you sure this is fully reproducible?
> How would you explain this?

At least in my machine it seems reproducible. I just reproduced that: 
reconfigured the system without my modification of avahi-daemon, checked 
ssh-daemon not started, reconfigured the system again with the 
modification, and then checked ssh-daemon started.

I am not a professional. But I guess that avahi-daemon manipulates 
network things such as the host name, and ssh-daemon waits for that to 
be completed. Look at this failing case:

Nov 20 09:37:57 localhost avahi-daemon[344]: Found user 'avahi' (UID 
985) and group 'avahi' (GID 973).
Nov 20 09:37:59 localhost avahi-daemon[344]: Successfully dropped root 
privileges.
Nov 20 09:38:00 localhost avahi-daemon[344]: avahi-daemon 0.7 starting up.
Nov 20 09:38:18 localhost shepherd[1]: Service avahi-daemon has been 
started.
Nov 20 09:38:01 localhost avahi-daemon[344]: WARNING: No NSS support for 
mDNS detected, consider installing nss-mdns!
Nov 20 09:38:05 localhost avahi-daemon[344]: Successfully called chroot().
Nov 20 09:38:12 localhost avahi-daemon[344]: Successfully dropped 
remaining capabilities.
Nov 20 09:38:32 localhost shepherd[1]: Service ssh-daemon could not be 
started.
Nov 20 09:38:15 localhost avahi-daemon[344]: Loading service file 
/services/sftp-ssh.service.
Nov 20 09:38:17 localhost avahi-daemon[344]: Loading service file 
/services/ssh.service.
Nov 20 09:38:19 localhost avahi-daemon[344]: Network interface 
enumeration completed.
Nov 20 09:38:22 localhost avahi-daemon[344]: Server startup complete. 
Host name is gravity.local. Local service cookie is 4157419020.
Nov 20 09:38:29 localhost avahi-daemon[344]: Service "gravity" 
(/services/ssh.service) successfully established.
Nov 20 09:38:32 localhost avahi-daemon[344]: Service "gravity" 
(/services/sftp-ssh.service) successfully established.
Nov 20 09:38:36 localhost avahi-daemon[344]: write() failed while 
writing return value to pipe: Broken pipe
Nov 20 09:39:12 localhost avahi-daemon[344]: Joining mDNS multicast 
group on interface wls3.IPv6 with address fe80::4c7d:9233:7845:eb88.
Nov 20 09:39:12 localhost avahi-daemon[344]: New relevant interface 
wls3.IPv6 for mDNS.
Nov 20 09:39:12 localhost avahi-daemon[344]: Registering new address 
record for fe80::4c7d:9233:7845:eb88 on wls3.*.
Nov 20 09:39:12 localhost avahi-daemon[344]: Joining mDNS multicast 
group on interface wls3.IPv4 with address 192.168.42.242.
Nov 20 09:39:12 localhost avahi-daemon[344]: New relevant interface 
wls3.IPv4 for mDNS.
Nov 20 09:39:12 localhost avahi-daemon[344]: Registering new address 
record for 192.168.42.242 on wls3.IPv4.

Regards,

Taegil





Merged 30993 33299 34580. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Sat, 09 Mar 2019 18:37:01 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#30993; Package guix. (Tue, 14 May 2019 13:35:02 GMT) Full text and rfc822 format available.

Message #77 received at 30993 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Leo Famulari <leo <at> famulari.name>
Cc: 30993 <at> debbugs.gnu.org
Subject: Re: bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
Date: Tue, 14 May 2019 15:33:59 +0200
Hi Leo,

Leo Famulari <leo <at> famulari.name> skribis:

> One interesting tidbit is that the PID file '/var/run/sshd.pid' is not
> created anymore. And if I create an empty PID file by hand, it is
> removed after trying to start the ssh-daemon service. Also, the sshd
> user's home '/var/run/sshd' does not exist, and is similarly removed if
> it does exist.

There are reasons to believe that this issue is fixed by the Shepherd 0.6.1:

  https://issues.guix.info/issue/35550

Could you check somehow if the bug shows up again?

Thanks,
Ludo’.




Reply sent to Leo Famulari <leo <at> famulari.name>:
You have taken responsibility. (Tue, 14 May 2019 18:22:02 GMT) Full text and rfc822 format available.

Notification sent to Leo Famulari <leo <at> famulari.name>:
bug acknowledged by developer. (Tue, 14 May 2019 18:22:02 GMT) Full text and rfc822 format available.

Message #82 received at 30993-done <at> debbugs.gnu.org (full text, mbox):

From: Leo Famulari <leo <at> famulari.name>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 30993-done <at> debbugs.gnu.org
Subject: Re: bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
Date: Tue, 14 May 2019 14:21:06 -0400
[Message part 1 (text/plain, inline)]
On Tue, May 14, 2019 at 03:33:59PM +0200, Ludovic Courtès wrote:
> There are reasons to believe that this issue is fixed by the Shepherd 0.6.1:
> 
>   https://issues.guix.info/issue/35550
> 
> Could you check somehow if the bug shows up again?

The bug disappeared for me a couple of reboots after reinstalling the
Guix System on my affected machine. That reinstallation provided
Shepherd 0.5, although 0.5 was also tested unsuccessfully before
reinstalling.

The issue does not manifest for me with Shepherd 0.6, 0.6.1 or Guix 1.0.

So... I think the bug was probably some kind of race condition or TOCTOU
problem that went away with a less fragmented or full filesystem (I was
really pushing the limits in that regard).

It's great that Shepherd 0.6.1 improved the PID file handling, assuming
that was the culprit.

I am closing this bug, but we can reopen it later if necessary.
[signature.asc (application/pgp-signature, inline)]

Reply sent to Leo Famulari <leo <at> famulari.name>:
You have taken responsibility. (Tue, 14 May 2019 18:22:02 GMT) Full text and rfc822 format available.

Notification sent to swedebugia <swedebugia <at> riseup.net>:
bug acknowledged by developer. (Tue, 14 May 2019 18:22:02 GMT) Full text and rfc822 format available.

Reply sent to Leo Famulari <leo <at> famulari.name>:
You have taken responsibility. (Tue, 14 May 2019 18:22:02 GMT) Full text and rfc822 format available.

Notification sent to Andreas Enge <andreas <at> enge.fr>:
bug acknowledged by developer. (Tue, 14 May 2019 18:22:03 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 12 Jun 2019 11:24:07 GMT) Full text and rfc822 format available.

bug unarchived. Request was from "Brant Gardner" <mail <at> lnklnx.com> to control <at> debbugs.gnu.org. (Fri, 20 Sep 2019 19:16:01 GMT) Full text and rfc822 format available.

Did not alter fixed versions and reopened. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 26 Sep 2019 20:29:01 GMT) Full text and rfc822 format available.

Merged 30993 33299 34580 37309. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Thu, 26 Sep 2019 20:29:03 GMT) Full text and rfc822 format available.

Changed bug title to ''ssh-daemon' fails to start' from 'OpenSSH sshd killed by Shepherd 0.4.0' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Thu, 26 Sep 2019 20:29:04 GMT) Full text and rfc822 format available.

Added tag(s) fixed. Request was from maxim.cournoyer <at> gmail.com to control <at> debbugs.gnu.org. (Tue, 18 Aug 2020 04:09:01 GMT) Full text and rfc822 format available.

bug closed, send any further explanations to 30993 <at> debbugs.gnu.org and Leo Famulari <leo <at> famulari.name> Request was from maxim.cournoyer <at> gmail.com to control <at> debbugs.gnu.org. (Tue, 18 Aug 2020 04:09:01 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 15 Sep 2020 11:24:04 GMT) Full text and rfc822 format available.

bug unarchived. Request was from Christopher Lemmer Webber <cwebber <at> dustycloud.org> to control <at> debbugs.gnu.org. (Fri, 27 Nov 2020 22:59:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 01 Jan 2021 12:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 3 years and 86 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.