GNU bug report logs - #35550
Installer: wpa_supplicant fails to start

Previous Next

Package: guix;

Reported by: Ludovic Courtès <ludo <at> gnu.org>

Date: Fri, 3 May 2019 19:32:01 UTC

Severity: important

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 35550 in the body.
You can then email your comments to 35550 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Fri, 03 May 2019 19:32:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Ludovic Courtès <ludo <at> gnu.org>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Fri, 03 May 2019 19:32:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Bug Guix <bug-guix <at> gnu.org>
Cc: Ricardo Wurmus <rekado <at> elephly.net>, sirgazil <sirgazil <at> zoho.com>
Subject: Installer: wpa_supplicant fails to start
Date: Fri, 03 May 2019 21:31:40 +0200
Hello,

Ricardo and sirgazil reported on IRC that the ‘wpa-supplicant’ would
(sometimes?) fail to start for the in the installation image on their
laptop.

As a result the ‘networking’ service (Connman) would not start either;
thus, the installer would report that it failed to establish Internet
access, and would just stop there.

Sirgazil posted /var/log/messages, which shows this:

--8<---------------cut here---------------start------------->8---
667 May  3 16:03:40 localhost vmunix: [    9.164363] ieee80211 phy0: Atheros AR9565 Rev:2 mem=0xffffa88240e80000, irq=19
668 May  3 16:03:40 localhost vmunix: [    9.172514] ath9k 0000:02:00.0 wlp2s0: renamed from wlan0
669 May  3 16:03:40 localhost vmunix: [    9.199748] shepherd[1]: Service udev has been started.
670 May  3 16:03:40 localhost vmunix: [    9.200682] shepherd[1]: Service file-system-/tmp has been started.
671 May  3 16:03:40 localhost vmunix: [    9.201560] shepherd[1]: Service file-system-/dev/pts has been started.
672 May  3 16:03:40 localhost vmunix: [    9.202423] shepherd[1]: Service file-system-/dev/shm has been started.
673 May  3 16:03:40 localhost vmunix: [    9.203232] shepherd[1]: Service file-system-/gnu/store has been started.
674 May  3 16:03:40 localhost vmunix: [    9.203891] shepherd[1]: Service file-systems has been started.
675 May  3 16:03:40 localhost vmunix: [    9.207605] shepherd[1]: Service user-processes has been started.
676 May  3 16:03:40 localhost vmunix: [    9.208993] shepherd[1]: Service host-name has been started.
677 May  3 16:03:40 localhost vmunix: [    9.217925] shepherd[1]: Service user-homes has been started.
678 May  3 16:03:40 localhost vmunix: [   10.221211] shepherd[1]: Service dbus-system has been started.
679 May  3 16:03:40 localhost vmunix: [   10.222093] shepherd[1]: Service loopback has been started.
680 May  3 16:03:40 localhost wpa_supplicant[398]: Successfully initialized wpa_supplicant
681 May  3 16:03:40 localhost shepherd[1]: Service wpa-supplicant could not be started.
682 May  3 16:03:40 localhost shepherd[1]: Service networking depends on wpa-supplicant.
683 May  3 16:03:40 localhost shepherd[1]: Service networking could not be started.
684 May  3 16:03:40 localhost wpa_supplicant[400]: dbus: Could not request service name: already registered
685 May  3 16:03:40 localhost wpa_supplicant[400]: Failed to initialize wpa_supplicant
686 May  3 16:03:45 localhost shepherd[1]: Service wpa-supplicant could not be started.
687 May  3 16:03:45 localhost nscd: 401 monitoring file `/etc/hosts` (1)
688 May  3 16:03:45 localhost nscd: 401 monitoring directory `/etc` (2)
689 May  3 16:03:45 localhost nscd: 401 monitoring file `/etc/resolv.conf` (3)
690 May  3 16:03:45 localhost nscd: 401 monitoring directory `/etc` (2)
691 May  3 16:03:46 localhost shepherd[1]: Service nscd has been started.
692 May  3 16:03:46 localhost /gnu/store/3dnww56dsrd6f9pf9p8ll3yawhfp50d8-gpm-1.20.7/sbin/gpm[409]: *** info [daemon/startup.c(136)]:
693 May  3 16:03:46 localhost /gnu/store/3dnww56dsrd6f9pf9p8ll3yawhfp50d8-gpm-1.20.7/sbin/gpm[409]: Started gpm successfully. Entered daemon mode.
694 May  3 16:03:46 localhost shepherd[1]: Service gpm has been started.
695 May  3 16:03:46 localhost shepherd[1]: Service term-tty1 has been started.
696 May  3 16:03:46 localhost shepherd[1]: Service console-font-tty1 has been started.
697 May  3 16:03:46 localhost shepherd[1]: Service virtual-terminal has been started.
698 May  3 16:03:46 localhost shepherd[1]: Service term-tty2 has been started.
699 May  3 16:03:46 localhost shepherd[1]: Service console-font-tty2 has been started.
700 May  3 16:03:46 localhost shepherd[1]: Service term-tty3 has been started.
701 May  3 16:03:46 localhost shepherd[1]: Service console-font-tty3 has been started.
702 May  3 16:03:46 localhost shepherd[1]: Service term-tty4 has been started.
703 May  3 16:03:46 localhost shepherd[1]: Service console-font-tty4 has been started.
704 May  3 16:03:46 localhost shepherd[1]: Service term-tty5 has been started.
705 May  3 16:03:46 localhost shepherd[1]: Service console-font-tty5 has been started.
706 May  3 16:03:46 localhost shepherd[1]: Service term-tty6 has been started.
707 May  3 16:03:46 localhost shepherd[1]: Service console-font-tty6 has been started.
708 May  3 16:03:46 localhost shepherd[1]: Service guix-daemon has been started.
--8<---------------cut here---------------end--------------->8---

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Fri, 03 May 2019 20:52:01 GMT) Full text and rfc822 format available.

Message #8 received at 35550 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 35550 <at> debbugs.gnu.org
Cc: Ricardo Wurmus <rekado <at> elephly.net>, sirgazil <sirgazil <at> zoho.com>
Subject: Re: bug#35550: Installer: wpa_supplicant fails to start
Date: Fri, 03 May 2019 22:51:31 +0200
Ludovic Courtès <ludo <at> gnu.org> skribis:

> 678 May  3 16:03:40 localhost vmunix: [   10.221211] shepherd[1]: Service dbus-system has been started.
> 679 May  3 16:03:40 localhost vmunix: [   10.222093] shepherd[1]: Service loopback has been started.
> 680 May  3 16:03:40 localhost wpa_supplicant[398]: Successfully initialized wpa_supplicant
> 681 May  3 16:03:40 localhost shepherd[1]: Service wpa-supplicant could not be started.
> 682 May  3 16:03:40 localhost shepherd[1]: Service networking depends on wpa-supplicant.
> 683 May  3 16:03:40 localhost shepherd[1]: Service networking could not be started.
> 684 May  3 16:03:40 localhost wpa_supplicant[400]: dbus: Could not request service name: already registered
> 685 May  3 16:03:40 localhost wpa_supplicant[400]: Failed to initialize wpa_supplicant
> 686 May  3 16:03:45 localhost shepherd[1]: Service wpa-supplicant could not be started.

My guess is that it goes like this:

  1. shepherd starts ‘networking’, which triggers the start of
     ‘wpa-supplicant’ (PID 398), which immediately “fails”.  Thus
     ‘networking’ is not started.

  2. shepherd continues and starts ‘wpa-supplicant’ directly (PID 400).
     This time it fails for good; after 5 seconds, since the PID file
     didn’t show up, shepherd says again that it could not be started.

Indeed, by looking at shepherd.conf from:

  guix gc -R $(guix system build gnu/system/install.scm) | grep shepherd.conf

one can see that ‘networking’ comes before ‘wpa-supplicant’ in the expression:

  (for-each start '(… networking … wpa-supplicant …))

So why is ‘wpa-supplicant’ marked as failing to start on the first
attempt?

The only reason I can think of is if ‘read-pid-file’ from (shepherd
service) returns immediately and returns #f instead of a number.  That
can actually happen if the PID file exists but is empty (or contains
garbage).

You would expect wpa_supplicant to create its PID file atomically: write
it under a different name, then rename(2)… but no:

--8<---------------cut here---------------start------------->8---
int os_daemonize(const char *pid_file)
{
#if defined(__uClinux__) || defined(__sun__)
	return -1;
#else /* defined(__uClinux__) || defined(__sun__) */
	if (os_daemon(0, 0)) {
		perror("daemon");
		return -1;
	}

	if (pid_file) {
		FILE *f = fopen(pid_file, "w");
		if (f) {
			fprintf(f, "%u\n", getpid());
			fclose(f);
		}
	}

	return -0;
#endif /* defined(__uClinux__) || defined(__sun__) */
}
--8<---------------cut here---------------end--------------->8---

So there is a possibility, albeit unlikely, for shepherd to see the PID
file after it’s been open but before it’s been written to.  (This
problem is not limited to the installer.)

I’m not 100% convinced that this is what’s happening there but that’s
the only lead I have.  I’m surprised we haven’t seen other reports
before.

Thoughts?

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Fri, 03 May 2019 21:28:02 GMT) Full text and rfc822 format available.

Message #11 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: sirgazil <sirgazil <at> zoho.com>, Bug Guix <bug-guix <at> gnu.org>
Subject: Re: Installer: wpa_supplicant fails to start
Date: Fri, 03 May 2019 23:26:52 +0200
Ludovic Courtès <ludo <at> gnu.org> writes:

> Ricardo and sirgazil reported on IRC that the ‘wpa-supplicant’ would
> (sometimes?) fail to start for the in the installation image on their
> laptop.

What I observed was that the wpa-supplicant *service* would not start
and starting it manually would fail because a wpa_supplicant *process*
would already be running.

Upon killing the stray process the service could be started.

-- 
Ricardo





Information forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Sat, 04 May 2019 03:28:02 GMT) Full text and rfc822 format available.

Message #14 received at 35550 <at> debbugs.gnu.org (full text, mbox):

From: sirgazil <at> zoho.com
To: "35550" <35550 <at> debbugs.gnu.org>
Subject: Installer: wpa_supplicant fails to start
Date: Fri, 03 May 2019 22:26:48 -0500
[Message part 1 (text/plain, inline)]
In my case, "wpa-supplicant" would *always* fail to start (I rebooted and tried the graphical installer around 4 times).

Killing "wpa-supplicant" and then running "herd start networking" made it possible for the graphical installer to display the steps where you select Wireless or Wired, and then continue normally.
[Message part 2 (text/html, inline)]

Severity set to 'important' from 'normal' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Sat, 04 May 2019 12:44:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Sat, 04 May 2019 12:51:01 GMT) Full text and rfc822 format available.

Message #19 received at 35550 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: sirgazil <at> zoho.com
Cc: Ricardo Wurmus <rekado <at> elephly.net>, 35550 <35550 <at> debbugs.gnu.org>
Subject: Re: bug#35550: Installer: wpa_supplicant fails to start
Date: Sat, 04 May 2019 14:49:55 +0200
Hi,

sirgazil <at> zoho.com skribis:

> In my case, "wpa-supplicant" would *always* fail to start (I rebooted and tried the graphical installer around 4 times).

Could you check whether the sequence of events in /var/log/messages
always looks like the snippet I showed?

Could you also:

  1. Let the installation image boot;

  2. Confirm with ‘herd status wpa-supplicant’ that the ‘wpa-supplicant’
     service is marked as stopped (failed to start);

  3. Show the contents /var/run/wpa_supplicant.pid, and check whether
     they correspond to the PID of a running ‘wpa_supplicant’ process.

?

Besides, do you have a Guix(SD) installation, recent or not, that uses
wpa_supplicant?  If so, does it suffer from the same problem
(‘wpa-supplicant’ service fails to start)?

Thanks,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Sat, 04 May 2019 14:09:02 GMT) Full text and rfc822 format available.

Message #22 received at 35550 <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: sirgazil <at> zoho.com, 35550 <35550 <at> debbugs.gnu.org>
Subject: Re: bug#35550: Installer: wpa_supplicant fails to start
Date: Sat, 4 May 2019 16:07:54 +0200
On Sat, May 04, 2019 at 02:49:55PM +0200, Ludovic Courtès wrote:
> Hi,
> 
> sirgazil <at> zoho.com skribis:
> 
> > In my case, "wpa-supplicant" would *always* fail to start (I rebooted and tried the graphical installer around 4 times).
> 
> Could you check whether the sequence of events in /var/log/messages
> always looks like the snippet I showed?
> 

So for me it sometimes fails on two different machines.  Actually it
started failing four times in a row now that I test this.

When it fails, I also get

On Fri, May 03, 2019 at 09:31:40PM +0200, Ludovic Courtès wrote:
> 680 May  3 16:03:40 localhost wpa_supplicant[398]: Successfully initialized wpa_supplicant
> 681 May  3 16:03:40 localhost shepherd[1]: Service wpa-supplicant could not be started.
> 682 May  3 16:03:40 localhost shepherd[1]: Service networking depends on wpa-supplicant.
> 683 May  3 16:03:40 localhost shepherd[1]: Service networking could not be started.
> 684 May  3 16:03:40 localhost wpa_supplicant[400]: dbus: Could not request service name: already registered
> 685 May  3 16:03:40 localhost wpa_supplicant[400]: Failed to initialize wpa_supplicant
> 686 May  3 16:03:45 localhost shepherd[1]: Service wpa-supplicant could not be started.



> Could you also:
> 
>   1. Let the installation image boot;
> 
>   2. Confirm with ‘herd status wpa-supplicant’ that the ‘wpa-supplicant’
>      service is marked as stopped (failed to start);
>

$ herd status wpa-supplicant
Status of wpa-supplicant:
  It is stopped.
  It is enabled.
  Provides (wpa-supplicant).
  Requires (user-processes dbus-system loopback syslogd).
  Conflicts with ().
  Will be respawned.



>   3. Show the contents /var/run/wpa_supplicant.pid, and check whether
>      they correspond to the PID of a running ‘wpa_supplicant’ process.
>

There is no such file.

Regards,
Florian




Information forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Sat, 04 May 2019 15:29:01 GMT) Full text and rfc822 format available.

Message #25 received at 35550 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 35550 <at> debbugs.gnu.org
Cc: Ricardo Wurmus <rekado <at> elephly.net>, sirgazil <sirgazil <at> zoho.com>
Subject: Re: bug#35550: Installer: wpa_supplicant fails to start
Date: Sat, 04 May 2019 17:28:04 +0200
Ludovic Courtès <ludo <at> gnu.org> skribis:

> You would expect wpa_supplicant to create its PID file atomically: write
> it under a different name, then rename(2)… but no:

Shepherd commit 72631752149d000c8c98ae0cc66e0b0c2eda94ef changes
‘read-pid-file’ to better deal with this.

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Sat, 04 May 2019 15:33:02 GMT) Full text and rfc822 format available.

Message #28 received at 35550 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: "pelzflorian \(Florian Pelz\)" <pelzflorian <at> pelzflorian.de>
Cc: sirgazil <at> zoho.com, 35550 <35550 <at> debbugs.gnu.org>
Subject: Re: bug#35550: Installer: wpa_supplicant fails to start
Date: Sat, 04 May 2019 17:32:23 +0200
Hi Florian,

"pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> skribis:

>> Could you also:
>> 
>>   1. Let the installation image boot;
>> 
>>   2. Confirm with ‘herd status wpa-supplicant’ that the ‘wpa-supplicant’
>>      service is marked as stopped (failed to start);
>>
>
> $ herd status wpa-supplicant
> Status of wpa-supplicant:
>   It is stopped.
>   It is enabled.
>   Provides (wpa-supplicant).
>   Requires (user-processes dbus-system loopback syslogd).
>   Conflicts with ().
>   Will be respawned.
>
>
>
>>   3. Show the contents /var/run/wpa_supplicant.pid, and check whether
>>      they correspond to the PID of a running ‘wpa_supplicant’ process.
>>
>
> There is no such file.

At this point, could you also check:

  3b. The output of “pgrep -fa wpa_supplicant”.

The PID file was probably removed by the second attempt to start the
‘wpa-supplicant’ service.

Could you also try running (as root):

  while herd restart wpa-supplicant; do : ; done

to see the frequency at which the service fails to start?

Thanks!

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Sat, 04 May 2019 15:48:02 GMT) Full text and rfc822 format available.

Message #31 received at 35550 <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: sirgazil <at> zoho.com, 35550 <35550 <at> debbugs.gnu.org>
Subject: Re: bug#35550: Installer: wpa_supplicant fails to start
Date: Sat, 4 May 2019 17:47:44 +0200
On Sat, May 04, 2019 at 05:32:23PM +0200, Ludovic Courtès wrote:
> At this point, could you also check:
> 
>   3b. The output of “pgrep -fa wpa_supplicant”.
> 

root <at> gnu ~# pgrep -fa wpa_supplicant
481 /gnu/store/ifqy2dr2hbqplv14k50dzprw3k2bz9m9-wpa-supplicant-2.7/sbin/wpa_supplicant -P/var/run/wpa_supplicant.pid -B -s -u


> The PID file was probably removed by the second attempt to start the
> ‘wpa-supplicant’ service.
> 
> Could you also try running (as root):
> 
>   while herd restart wpa-supplicant; do : ; done
> 
> to see the frequency at which the service fails to start?
>

It displays

Service wpa-supplicant is not running.
Service wpa-supplicant could not be started.

after and again and again every approximately 5 seconds.

Regards,
Florian




Information forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Sat, 04 May 2019 21:15:02 GMT) Full text and rfc822 format available.

Message #34 received at 35550 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: "pelzflorian \(Florian Pelz\)" <pelzflorian <at> pelzflorian.de>
Cc: sirgazil <at> zoho.com, 35550 <35550 <at> debbugs.gnu.org>
Subject: Re: bug#35550: Installer: wpa_supplicant fails to start
Date: Sat, 04 May 2019 23:14:29 +0200
Hi,

"pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> skribis:

> On Sat, May 04, 2019 at 05:32:23PM +0200, Ludovic Courtès wrote:

[...]

>> Could you also try running (as root):
>> 
>>   while herd restart wpa-supplicant; do : ; done
>> 
>> to see the frequency at which the service fails to start?
>>
>
> It displays
>
> Service wpa-supplicant is not running.
> Service wpa-supplicant could not be started.

Sorry, to be complete, I should have asked you to first kill the
“dangling” wpa_supplicant process:

  pkill -f wpa_supplicant
  pgrep -fa wpa_supplicant  # make sure it’s really dead
  while herd restart wpa-supplicant; do : ; done

Could you try this?

TIA,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Sat, 04 May 2019 21:35:01 GMT) Full text and rfc822 format available.

Message #37 received at 35550 <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: sirgazil <at> zoho.com, 35550 <35550 <at> debbugs.gnu.org>
Subject: Re: bug#35550: Installer: wpa_supplicant fails to start
Date: Sat, 4 May 2019 23:34:03 +0200
On Sat, May 04, 2019 at 11:14:29PM +0200, Ludovic Courtès wrote:
> Hi,
> 
> "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> skribis:
> 
> > On Sat, May 04, 2019 at 05:32:23PM +0200, Ludovic Courtès wrote:
> 
> [...]
> 
> >> Could you also try running (as root):
> >> 
> >>   while herd restart wpa-supplicant; do : ; done
> >> 
> >> to see the frequency at which the service fails to start?
> >>
> >
> > It displays
> >
> > Service wpa-supplicant is not running.
> > Service wpa-supplicant could not be started.
> 
> Sorry, to be complete, I should have asked you to first kill the
> “dangling” wpa_supplicant process:
> 
>   pkill -f wpa_supplicant
>   pgrep -fa wpa_supplicant  # make sure it’s really dead

pgrep does not display anything.


>   while herd restart wpa-supplicant; do : ; done
> 
> Could you try this?
> 

Now it displays

Service wpa-supplicant has been stopped.
Service wpa-supplicant has been started.

again and again incessantly, one immediately after the other with no
time in between.

By the way, the network connection only failed sometimes when I tested
Guix 1.0.0, but since I started using a Guix disk image from a more
recent Guix, wpa-supplicant apparently always fails.  Strange.

Regards,
Florian




Information forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Sun, 05 May 2019 16:30:01 GMT) Full text and rfc822 format available.

Message #40 received at 35550 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: "pelzflorian \(Florian Pelz\)" <pelzflorian <at> pelzflorian.de>
Cc: sirgazil <at> zoho.com, 35550 <35550 <at> debbugs.gnu.org>
Subject: Re: bug#35550: Installer: wpa_supplicant fails to start
Date: Sun, 05 May 2019 18:29:46 +0200
Hello,

I’ve been able to test it on the bare metal as well and I confirm: I
booted the install image a few times and the ‘wpa-supplicant’ service
would be stopped every time, with a dangling ‘wpa_supplicant’ process
running.  Given that this seems to be systematic, I’m not sure my PID
file atomicity hypothesis explains everything.

I’ve tried running:

  pkill wpa_supplicant
  while herd start wpa-supplicant ; do herd stop wpa-supplicant ; done

and that always works (i.e., no “failed to start ‘wpa-supplicant’”
error.)

If I run in parallel this:

  while touch /var/run/wpa_supplicant.pid ; do : ; done

then it occasionally “fails to start”, but not very often.

To be continued…

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Sun, 05 May 2019 20:27:01 GMT) Full text and rfc822 format available.

Message #43 received at 35550 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 35550 <at> debbugs.gnu.org
Cc: Ricardo Wurmus <rekado <at> elephly.net>, sirgazil <sirgazil <at> zoho.com>
Subject: Re: bug#35550: Installer: wpa_supplicant fails to start
Date: Sun, 05 May 2019 22:26:01 +0200
Ludovic Courtès <ludo <at> gnu.org> skribis:

> You would expect wpa_supplicant to create its PID file atomically: write
> it under a different name, then rename(2)… but no:

Regarding this PID file atomicity issue, a few other data points: Tor,
syslogd (GNU Inetutils), sshd (OpenSSH), and dbus-daemon all do the
exact same non-atomic PID file creation as wpa_supplicant.

Libdaemon (used by Avahi) does something more fancy: it acquires an
exclusive write lock (fcntl with F_SETLKW) on the PID file right after
opening it, but since shepherd does not attempt to acquire locks on PID
file, it has no effect.

systemd and GNU Pies don’t take any precautions while reading PID
files.  The assumption seems to be that if the file exists, then it must
contain a valid PID.

So perhaps the problem we see with ‘wpa_supplicant’ is elsewhere.

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Sun, 05 May 2019 22:22:02 GMT) Full text and rfc822 format available.

Message #46 received at 35550 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 35550 <at> debbugs.gnu.org
Cc: Ricardo Wurmus <rekado <at> elephly.net>, sirgazil <sirgazil <at> zoho.com>
Subject: Re: bug#35550: Installer: wpa_supplicant fails to start
Date: Mon, 06 May 2019 00:21:26 +0200
[Message part 1 (text/plain, inline)]
Ludovic Courtès <ludo <at> gnu.org> skribis:

> So why is ‘wpa-supplicant’ marked as failing to start on the first
> attempt?
>
> The only reason I can think of is if ‘read-pid-file’ from (shepherd
> service) returns immediately and returns #f instead of a number.  That
> can actually happen if the PID file exists but is empty (or contains
> garbage).

I’ve produced an ISO with the patch below and ran it on the bare metal
to get confirmation (too bad the bug doesn’t show in QEMU :-/).  Indeed,
‘read-pid-file’ for /var/run/wpa_supplicant.pid systematically reads the
empty string the first time the ‘wpa-supplicant’ service is started.

(After that, if we kill the process and try to restart the service, the
problem doesn’t show up.)

[Message part 2 (text/x-patch, inline)]
diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm
index 53437b6..e21492e 100644
--- a/modules/shepherd/service.scm
+++ b/modules/shepherd/service.scm
@@ -717,9 +717,12 @@ otherwise return the number that was read (a PID)."
   (let loop ()
     (catch 'system-error
       (lambda ()
-        (string->number
-         (string-trim-both
-          (call-with-input-file file get-string-all))))
+        (define str
+          (call-with-input-file file get-string-all))
+
+        (local-output (l10n "read-pid-file ~s -> ~s")
+                      file str)
+        (string->number (string-trim-both str)))
       (lambda args
         (let ((errno (system-error-errno args)))
           (if (= ENOENT errno)
[Message part 3 (text/plain, inline)]
With the second patch below, I confirm that the ‘wpa-supplicant’ starts
correctly.  We can see in /var/log/messages that ‘read-pid-file’ first
reads the empty string from /var/run/wpa_supplicant.pid, then tries
again, and gets a valid PID on the second attempt.

It’s surprising that the timing is always like that, and only on the
bare metal, but that’s the way it is.

It’d be great if you could do some testing with the patch below.  Then I
guess we’ll push a Shepherd release with this fix.

I wonder if this could also explain
<https://issues.guix.info/issue/30993>.

Thanks,
Ludo’.

[Message part 4 (text/x-patch, inline)]
diff --git a/gnu/packages/admin.scm b/gnu/packages/admin.scm
index dfc3467bf8..e1dd248679 100644
--- a/gnu/packages/admin.scm
+++ b/gnu/packages/admin.scm
@@ -188,7 +188,8 @@ and provides a \"top-like\" mode (monitoring).")
                                   version ".tar.gz"))
               (sha256
                (base32
-                "1ys2w83vm62spr8bx38sccfdpy9fqmj7wfywm5k8ihsy2k61da2i"))))
+                "1ys2w83vm62spr8bx38sccfdpy9fqmj7wfywm5k8ihsy2k61da2i"))
+              (patches (search-patches "shepherd-debug.patch"))))
     (build-system gnu-build-system)
     (arguments
      '(#:configure-flags '("--localstatedir=/var")))
diff --git a/gnu/packages/patches/shepherd-debug.patch b/gnu/packages/patches/shepherd-debug.patch
new file mode 100644
index 0000000000..2fd97cc578
--- /dev/null
+++ b/gnu/packages/patches/shepherd-debug.patch
@@ -0,0 +1,43 @@
+diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm
+index 53437b6..bef8f42 100644
+--- a/modules/shepherd/service.scm
++++ b/modules/shepherd/service.scm
+@@ -715,21 +715,28 @@ number.  Return #f if FILE was not created or does not contain a number;
+ otherwise return the number that was read (a PID)."
+   (define start (current-time))
+   (let loop ()
++    (define (retry)
++      (and (< (current-time) (+ start max-delay))
++           (begin
++             ;; FILE does not exist yet, so wait and try again.
++             ;; XXX: Ideally we would yield to the main event loop
++             ;; and/or use inotify.
++             (sleep 1)
++             (loop))))
++
+     (catch 'system-error
+       (lambda ()
+-        (string->number
+-         (string-trim-both
+-          (call-with-input-file file get-string-all))))
++        (define str
++          (call-with-input-file file get-string-all))
++
++        (local-output (l10n "read-pid-file ~s -> ~s")
++                      file str)
++        (or (string->number (string-trim-both str))
++            (retry)))
+       (lambda args
+         (let ((errno (system-error-errno args)))
+           (if (= ENOENT errno)
+-              (and (< (current-time) (+ start max-delay))
+-                   (begin
+-                     ;; FILE does not exist yet, so wait and try again.
+-                     ;; XXX: Ideally we would yield to the main event loop
+-                     ;; and/or use inotify.
+-                     (sleep 1)
+-                     (loop)))
++              (retry)
+               (apply throw args)))))))
+ 
+ (define* (exec-command command

Information forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Mon, 06 May 2019 19:48:01 GMT) Full text and rfc822 format available.

Message #49 received at 35550 <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: sirgazil <sirgazil <at> zoho.com>, 35550 <at> debbugs.gnu.org
Subject: Re: bug#35550: Installer: wpa_supplicant fails to start
Date: Mon, 6 May 2019 21:47:41 +0200
On Mon, May 06, 2019 at 12:21:26AM +0200, Ludovic Courtès wrote:
> It’d be great if you could do some testing with the patch below.  Then I
> guess we’ll push a Shepherd release with this fix.
> 

The second patch (the one for Guix is the only one I have tested)
works fine on the system which always failed before since sometime
after 1.0.

One time wpa-supplicant worked fine but it had an error establishing
the internet connection, but I am quite certain it was an unrelated
hardware issue.

Regards,
Florian




Information forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Mon, 06 May 2019 20:01:02 GMT) Full text and rfc822 format available.

Message #52 received at 35550 <at> debbugs.gnu.org (full text, mbox):

From: Danny Milosavljevic <dannym <at> scratchpost.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: sirgazil <sirgazil <at> zoho.com>, 35550 <at> debbugs.gnu.org
Subject: Re: bug#35550: Installer: wpa_supplicant fails to start
Date: Mon, 6 May 2019 22:00:30 +0200
[Message part 1 (text/plain, inline)]
Hi Ludo,

what happens when the loop reads the pid file when it contains just half of a
numeral?  It won't detect it, right?
[Message part 2 (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Tue, 07 May 2019 07:55:02 GMT) Full text and rfc822 format available.

Message #55 received at 35550 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: "pelzflorian \(Florian Pelz\)" <pelzflorian <at> pelzflorian.de>
Cc: sirgazil <sirgazil <at> zoho.com>, 35550 <at> debbugs.gnu.org
Subject: Re: bug#35550: Installer: wpa_supplicant fails to start
Date: Tue, 07 May 2019 09:53:54 +0200
Hi Florian,

"pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> skribis:

> On Mon, May 06, 2019 at 12:21:26AM +0200, Ludovic Courtès wrote:
>> It’d be great if you could do some testing with the patch below.  Then I
>> guess we’ll push a Shepherd release with this fix.
>> 
>
> The second patch (the one for Guix is the only one I have tested)
> works fine on the system which always failed before since sometime
> after 1.0.
>
> One time wpa-supplicant worked fine but it had an error establishing
> the internet connection, but I am quite certain it was an unrelated
> hardware issue.

Awesome, thanks for testing.

Note that I made similar changes in shepherd, which I’ll probably
release as 0.6.1 so we can use it in Guix 1.0.1:

  https://git.savannah.gnu.org/cgit/shepherd.git/log

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Tue, 07 May 2019 08:06:02 GMT) Full text and rfc822 format available.

Message #58 received at 35550 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Danny Milosavljevic <dannym <at> scratchpost.org>
Cc: sirgazil <sirgazil <at> zoho.com>, 35550 <at> debbugs.gnu.org
Subject: Re: bug#35550: Installer: wpa_supplicant fails to start
Date: Tue, 07 May 2019 10:05:08 +0200
[Message part 1 (text/plain, inline)]
Hi Danny,

Danny Milosavljevic <dannym <at> scratchpost.org> skribis:

> what happens when the loop reads the pid file when it contains just half of a
> numeral?  It won't detect it, right?

Correct.

I’m proposing the addition below to be on the verrrry safe side.
WDYT?

The weird thing, as I mentioned earlier, is that systemd and Pies do not
protect against truncated PID files, and I couldn’t find any
“documentation” of the problem on the intertubes.  For systemd it’s
maybe less of a problem since services are started in a cgroup, I think.

Thanks,
Ludo’.

[Message part 2 (text/x-patch, inline)]
diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm
index b937609..ef27d88 100644
--- a/modules/shepherd/service.scm
+++ b/modules/shepherd/service.scm
@@ -709,10 +709,16 @@ results."
 set when starting a service."
   (environ))
 
-(define* (read-pid-file file #:key (max-delay 5))
+(define* (read-pid-file file #:key (max-delay 5)
+                        (validate-pid? #f))
   "Wait for MAX-DELAY seconds for FILE to show up, and read its content as a
 number.  Return #f if FILE was not created or does not contain a number;
-otherwise return the number that was read (a PID)."
+otherwise return the number that was read (a PID).
+
+When VALIDATE-PID? is true, succeed if and only if the number that was read is
+the PID of an existing process in the current PID namespace.  This test cannot
+be used if FILE might contain a PID from another PID namespace (i.e., the
+daemon writing FILE is running in a separate PID namespace.)"
   (define start (current-time))
 
   (let loop ()
@@ -736,11 +742,13 @@ otherwise return the number that was read (a PID)."
            (try-again))
           ((? integer? pid)
            ;; It's possible, though unlikely, that PID is not a valid PID, for
-           ;; instance because writes to FILE did not complete.  However, we
-           ;; don't do (kill pid 0) because if the process lives in a separate
-           ;; PID namespace, then PID is probably invalid in our own
-           ;; namespace.
-           pid)))
+           ;; instance because writes to FILE did not complete.  When
+           ;; VALIDATE-PID? is true, check that PID is valid in the current
+           ;; PID namespace.
+           (if (or (not validate-pid?)
+                   (catch-system-error (kill pid 0) #t))
+               pid
+               (try-again)))))
       (lambda args
         (let ((errno (system-error-errno args)))
           (if (= ENOENT errno)
@@ -931,7 +939,8 @@ start."
                                         environment-variables)))
             (if pid-file
                 (match (read-pid-file pid-file
-                                      #:max-delay pid-file-timeout)
+                                      #:max-delay pid-file-timeout
+                                      #:validate-pid? #t)
                   (#f
                    (catch-system-error (kill pid SIGTERM))
                    #f)

Information forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Wed, 08 May 2019 11:34:02 GMT) Full text and rfc822 format available.

Message #61 received at 35550 <at> debbugs.gnu.org (full text, mbox):

From: Giovanni Biscuolo <g <at> xelera.eu>
To: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
Cc: guix-devel <at> gnu.org, 35630 <at> debbugs.gnu.org, 35550 <at> debbugs.gnu.org
Subject: Re: bug#35630: graphical install,
 no network device found (HP EliteDesk 800 G1)
Date: Wed, 08 May 2019 13:32:59 +0200
[Message part 1 (text/plain, inline)]
Hi Florian,

thank you for pointing me to the right bug report!

I checked the bug reports list via web [1] but did not link
wpa-supplicant to my issue since I did not realized that such an issue
also affects wired connections

"pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> writes:

> I hope this is an instance of <https://issues.guix.info/issue/35550>,
> that is “herd status wpa-supplicant” shows that wpa-supplicant is
> stopped instead of running?  I hope there will be a new release soon
> where this is fixed.

yes, this is definitely an instance of #35550

I confirm I see the very same wpa_supplicant log messages in
/var/log/messages as reported by Sirgazil and pelzflorian

the following suggested workaround (in a text console activated via
CTRL-ALT-F3) works for me:

--8<---------------cut here---------------start------------->8---
pkill wpa_supplicant
herd start networking
--8<---------------cut here---------------end--------------->8---

after that I was able to restart the "Network interface" installer step
(by choosing "Exit" and selecting it again in the proposed dialog) and
was able to select the wired interface

Thanks! Gio'


[1] I'm going to subscribe bug-guix :-)

-- 
Giovanni Biscuolo

Xelera IT Infrastructures
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Wed, 08 May 2019 14:33:02 GMT) Full text and rfc822 format available.

Message #64 received at 35550 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Danny Milosavljevic <dannym <at> scratchpost.org>
Cc: sirgazil <sirgazil <at> zoho.com>, 35550 <at> debbugs.gnu.org
Subject: Re: bug#35550: Installer: wpa_supplicant fails to start
Date: Wed, 08 May 2019 16:32:00 +0200
Ludovic Courtès <ludo <at> gnu.org> skribis:

> I’m proposing the addition below to be on the verrrry safe side.

Pushed as Shepherd commit 1ebea0a6e4c6ff11212eda348072acf9c379e7b2.

Unless something goes wrong, I’ll make Shepherd commit
fbb9c3fac745552eaf0f354bd0134cca7027bf17 version 0.6.1 in the coming
days (well, with an updated NEWS file.)

Thanks,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Wed, 08 May 2019 22:27:02 GMT) Full text and rfc822 format available.

Message #67 received at 35550 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Danny Milosavljevic <dannym <at> scratchpost.org>
Cc: sirgazil <sirgazil <at> zoho.com>, 35550 <at> debbugs.gnu.org
Subject: Re: bug#35550: Installer: wpa_supplicant fails to start
Date: Thu, 09 May 2019 00:25:59 +0200
[Message part 1 (text/plain, inline)]
Ludovic Courtès <ludo <at> gnu.org> skribis:

> Ludovic Courtès <ludo <at> gnu.org> skribis:
>
>> I’m proposing the addition below to be on the verrrry safe side.
>
> Pushed as Shepherd commit 1ebea0a6e4c6ff11212eda348072acf9c379e7b2.
>
> Unless something goes wrong, I’ll make Shepherd commit
> fbb9c3fac745552eaf0f354bd0134cca7027bf17 version 0.6.1 in the coming
> days (well, with an updated NEWS file.)

If you want to re-test that wpa_supplicant does start correctly with the
latest Shepherd, you can do so by applying the patch below to Guix.

Ludo’.

[Message part 2 (text/x-org, inline)]
diff --git a/gnu/packages/admin.scm b/gnu/packages/admin.scm
index dfc3467bf8..6cabea0fb9 100644
--- a/gnu/packages/admin.scm
+++ b/gnu/packages/admin.scm
@@ -181,22 +181,30 @@ and provides a \"top-like\" mode (monitoring).")
 (define-public shepherd
   (package
     (name "shepherd")
-    (version "0.6.0")
+    (version "0.6.1pre1")
     (source (origin
-              (method url-fetch)
-              (uri (string-append "mirror://gnu/shepherd/shepherd-"
-                                  version ".tar.gz"))
+              (method git-fetch)
+              (uri (git-reference
+                    (url "https://git.savannah.gnu.org/git/shepherd.git")
+                    (commit "fbb9c3fac745552eaf0f354bd0134cca7027bf17")))
               (sha256
                (base32
-                "1ys2w83vm62spr8bx38sccfdpy9fqmj7wfywm5k8ihsy2k61da2i"))))
+                "0w7dx2i04qgykzhz4qir3rxnrdfza3q7kzh5z4mpmgjrp08pfnrn"))))
     (build-system gnu-build-system)
     (arguments
-     '(#:configure-flags '("--localstatedir=/var")))
+     '(#:configure-flags '("--localstatedir=/var"
+                           "ac_cv_path_MSGMERGE=true")))
     (native-inputs
      `(("pkg-config" ,pkg-config)
 
        ;; This is the Guile we use as a cross-compiler...
-       ("guile" ,guile-2.2)))
+       ("guile" ,guile-2.2)
+
+       ("texinfo" ,texinfo)
+       ("help2man" ,help2man)
+       ("gettext" ,gnu-gettext)
+       ("autoconf" ,autoconf)
+       ("automake" ,automake)))
     (inputs
      ;; ... and this is the one that appears in shebangs when cross-compiling.
      `(("guile" ,guile-2.2)

Information forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Thu, 09 May 2019 06:47:02 GMT) Full text and rfc822 format available.

Message #70 received at 35550 <at> debbugs.gnu.org (full text, mbox):

From: "pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Danny Milosavljevic <dannym <at> scratchpost.org>, sirgazil <sirgazil <at> zoho.com>,
 35550 <at> debbugs.gnu.org
Subject: Re: bug#35550: Installer: wpa_supplicant fails to start
Date: Thu, 9 May 2019 08:46:13 +0200
On Thu, May 09, 2019 at 12:25:59AM +0200, Ludovic Courtès wrote:
> If you want to re-test that wpa_supplicant does start correctly with the
> latest Shepherd, you can do so by applying the patch below to Guix.
> 

Works fine too on Macbook and PC (using a non-ISO disk image).

Regards,
Florian




Information forwarded to bug-guix <at> gnu.org:
bug#35550; Package guix. (Thu, 09 May 2019 10:19:02 GMT) Full text and rfc822 format available.

Message #73 received at 35550 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: "pelzflorian \(Florian Pelz\)" <pelzflorian <at> pelzflorian.de>
Cc: Danny Milosavljevic <dannym <at> scratchpost.org>, sirgazil <sirgazil <at> zoho.com>,
 35550 <at> debbugs.gnu.org
Subject: Re: bug#35550: Installer: wpa_supplicant fails to start
Date: Thu, 09 May 2019 12:18:12 +0200
"pelzflorian (Florian Pelz)" <pelzflorian <at> pelzflorian.de> skribis:

> On Thu, May 09, 2019 at 12:25:59AM +0200, Ludovic Courtès wrote:
>> If you want to re-test that wpa_supplicant does start correctly with the
>> latest Shepherd, you can do so by applying the patch below to Guix.
>> 
>
> Works fine too on Macbook and PC (using a non-ISO disk image).

Great, thanks for testing!




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Sat, 11 May 2019 18:14:01 GMT) Full text and rfc822 format available.

Notification sent to Ludovic Courtès <ludo <at> gnu.org>:
bug acknowledged by developer. (Sat, 11 May 2019 18:14:02 GMT) Full text and rfc822 format available.

Message #78 received at 35550-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Danny Milosavljevic <dannym <at> scratchpost.org>
Cc: sirgazil <sirgazil <at> zoho.com>, 35550-done <at> debbugs.gnu.org
Subject: Re: bug#35550: Installer: wpa_supplicant fails to start
Date: Sat, 11 May 2019 20:13:25 +0200
Ludovic Courtès <ludo <at> gnu.org> skribis:

> Ludovic Courtès <ludo <at> gnu.org> skribis:
>
>> I’m proposing the addition below to be on the verrrry safe side.
>
> Pushed as Shepherd commit 1ebea0a6e4c6ff11212eda348072acf9c379e7b2.
>
> Unless something goes wrong, I’ll make Shepherd commit
> fbb9c3fac745552eaf0f354bd0134cca7027bf17 version 0.6.1 in the coming
> days (well, with an updated NEWS file.)

Commit 3b8699f9c2a9f88d44a5320398b4c968c432429d upgrades the Shepherd to
0.6.1, which fixes the PID file race condition we’ve been talking about.

Closing!

Ludo’.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 09 Jun 2019 11:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 4 years and 284 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.