GNU bug report logs - #24941
Early termination bug in grep 2.26

Previous Next

Package: grep;

Reported by: Gary Johnson <garyjohn <at> spocom.com>

Date: Mon, 14 Nov 2016 02:42:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 24941 in the body.
You can then email your comments to 24941 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-grep <at> gnu.org:
bug#24941; Package grep. (Mon, 14 Nov 2016 02:42:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Gary Johnson <garyjohn <at> spocom.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Mon, 14 Nov 2016 02:42:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Gary Johnson <garyjohn <at> spocom.com>
To: bug-grep <at> gnu.org
Subject: Early termination bug in grep 2.26
Date: Sun, 13 Nov 2016 17:56:32 -0800
There was some recent discussion on the vim_dev list of a failure to
update a Vim package which was found to be due to an update of grep
from 2.25 to 2.26.  The details of the grep behavior are discussed
here:

https://www.linuxquestions.org/questions/slackware-14/pkgtools-grep-bug-in-slackware[64]-current-4175593054/

In short, it seems to be due to the "grep: /dev/null output
speedup" commit of 2016-05-01, af6af288eac28951b5eee1eaaf373e22b2193b7b.
When grep terminates early, it closes the pipe it's reading stdin
from, which terminates the program on the other side of that pipe
early, before that program has completed its task.

Oops.

Regards,
Gary





Information forwarded to bug-grep <at> gnu.org:
bug#24941; Package grep. (Mon, 14 Nov 2016 15:33:02 GMT) Full text and rfc822 format available.

Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: bug-grep <at> gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 24941 <at> debbugs.gnu.org
Subject: Re: bug#24941: Early termination bug in grep 2.26
Date: Mon, 14 Nov 2016 07:31:36 -0800
On Sun, Nov 13, 2016 at 5:56 PM, Gary Johnson <garyjohn <at> spocom.com> wrote:
> There was some recent discussion on the vim_dev list of a failure to
> update a Vim package which was found to be due to an update of grep
> from 2.25 to 2.26.  The details of the grep behavior are discussed
> here:
>
> https://www.linuxquestions.org/questions/slackware-14/pkgtools-grep-bug-in-slackware[64]-current-4175593054/
>
> In short, it seems to be due to the "grep: /dev/null output
> speedup" commit of 2016-05-01, af6af288eac28951b5eee1eaaf373e22b2193b7b.
> When grep terminates early, it closes the pipe it's reading stdin
> from, which terminates the program on the other side of that pipe
> early, before that program has completed its task.
>
> Oops.

Thank you for the report.

Oops, indeed.

While I  see nothing in the POSIX specification
(http://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html)
that requires it to read all input in such a case, I think the current
behavior violates least-surprise.

Here is an example to demonstrate. I believe that POSIX does not
dictate what this code prints:

grep-2.25:

  $ (seq 10000000; echo $? 1>&2) | grep . > /dev/null
  0

grep-2.26:

  $ (seq 10000000; echo $? 1>&2) | grep . > /dev/null
  141

Paul, what do you think about making your heuristic apply only for non-pipes?




Information forwarded to bug-grep <at> gnu.org:
bug#24941; Package grep. (Mon, 14 Nov 2016 15:33:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-grep <at> gnu.org:
bug#24941; Package grep. (Tue, 15 Nov 2016 08:18:02 GMT) Full text and rfc822 format available.

Message #14 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>, bug-grep <at> gnu.org
Cc: Gary Johnson <garyjohn <at> spocom.com>, 24941 <at> debbugs.gnu.org
Subject: Re: bug#24941: Early termination bug in grep 2.26
Date: Tue, 15 Nov 2016 00:17:03 -0800
Jim Meyering wrote:
> Paul, what do you think about making your heuristic apply only for non-pipes?

I'm a bit dubious, as grep exits early for other reasons, and did so before the 
patch in question. For example, here:

seq 10000000000 | grep -q .

This is as of grep 2.19 (2014), due to a bug report where grep performed badly 
by not exiting early <https://bugs.gnu.org/17427>. Which suggests that we'll get 
bug reports in this area no matter what grep does....

Looking at other implementations, Solaris 11 grep is similar with -q. And 
FreeBSD-current grep exits early for this case:

seq 10000000000 | grep -f /dev/null

Come to think of it, perhaps GNU grep should do a similar optimization for -f 
/dev/null, if only to keep up with FreeBSD.

All the above being said, I am sympathetic to the bug report. Perhaps we can 
eliminate the optimization if there are no file arguments and only options in 
the set -EFivx are specified. Something like that anyway. The idea is to catch 
common cases in old (and really, broken) scripts, without hurting performance in 
general.




Information forwarded to bug-grep <at> gnu.org:
bug#24941; Package grep. (Tue, 15 Nov 2016 08:18:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-grep <at> gnu.org:
bug#24941; Package grep. (Tue, 15 Nov 2016 19:36:02 GMT) Full text and rfc822 format available.

Message #20 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Gary Johnson <garyjohn <at> spocom.com>, bug-grep <at> gnu.org, 24941 <at> debbugs.gnu.org
Subject: Re: bug#24941: Early termination bug in grep 2.26
Date: Tue, 15 Nov 2016 11:35:15 -0800
On Tue, Nov 15, 2016 at 12:17 AM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Jim Meyering wrote:
>>
>> Paul, what do you think about making your heuristic apply only for
>> non-pipes?
>
> I'm a bit dubious, as grep exits early for other reasons, and did so before
> the patch in question. For example, here:
>
> seq 10000000000 | grep -q .
>
> This is as of grep 2.19 (2014), due to a bug report where grep performed
> badly by not exiting early <https://bugs.gnu.org/17427>. Which suggests that
> we'll get bug reports in this area no matter what grep does....

The issue of -q is separate, because anyone who has invoked grep with
-q has long been exposed to this behavior already. And behavior could
even be inferred based on the POSIX description. I think it is fine
for grep -q to terminate early in all cases.

My concern is when no "exit-early"-implying option (none of at least
-l, -q, -m) is specified, say within some script that has always
worked, yet grep-2.26 makes OP's example fail most surprisingly when
at a distance (i.e., the invocation of grep was hidden) someone
unwittingly redirects standard output to /dev/null.

> Looking at other implementations, Solaris 11 grep is similar with -q. And
> FreeBSD-current grep exits early for this case:
>
> seq 10000000000 | grep -f /dev/null
>
> Come to think of it, perhaps GNU grep should do a similar optimization for
> -f /dev/null, if only to keep up with FreeBSD.
>
> All the above being said, I am sympathetic to the bug report. Perhaps we can
> eliminate the optimization if there are no file arguments and only options
> in the set -EFivx are specified. Something like that anyway. The idea is to
> catch common cases in old (and really, broken) scripts, without hurting
> performance in general.

I suppose you mean in addition to the S_ISFIFO test? That sounds good.
We should retain the optimization when reading from stdin that is a
non-pipe.




Information forwarded to bug-grep <at> gnu.org:
bug#24941; Package grep. (Tue, 15 Nov 2016 19:36:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-grep <at> gnu.org:
bug#24941; Package grep. (Tue, 15 Nov 2016 22:14:02 GMT) Full text and rfc822 format available.

Message #26 received at 24941 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Jim Meyering <jim <at> meyering.net>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 24941 <at> debbugs.gnu.org
Subject: Re: bug#24941: Early termination bug in grep 2.26
Date: Wed, 16 Nov 2016 07:13:14 +0900
On Tue, 15 Nov 2016 11:35:15 -0800
Jim Meyering <jim <at> meyering.net> wrote:

> I suppose you mean in addition to the S_ISFIFO test? That sounds good.
> We should retain the optimization when reading from stdin that is a
> non-pipe.

This can also happen in stdin.  If we redirect stdout to /dev/null,
grep-2.26 exits immediately and prompt is returned.

- grep-2.25

$ grep . >/dev/null
a  <<input a and newline
b  <<input b and newline
c  <<input c and newline

- grep-2.26

$ src/grep . >/dev/null
a  <<input a and newline
$





Information forwarded to bug-grep <at> gnu.org:
bug#24941; Package grep. (Tue, 15 Nov 2016 23:14:02 GMT) Full text and rfc822 format available.

Message #29 received at 24941 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 24941 <at> debbugs.gnu.org
Subject: Re: bug#24941: Early termination bug in grep 2.26
Date: Tue, 15 Nov 2016 15:13:27 -0800
On Tue, Nov 15, 2016 at 2:13 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
>
> On Tue, 15 Nov 2016 11:35:15 -0800
> Jim Meyering <jim <at> meyering.net> wrote:
>
>> I suppose you mean in addition to the S_ISFIFO test? That sounds good.
>> We should retain the optimization when reading from stdin that is a
>> non-pipe.
>
> This can also happen in stdin.  If we redirect stdout to /dev/null,
> grep-2.26 exits immediately and prompt is returned.
>
> - grep-2.25
>
> $ grep . >/dev/null
> a  <<input a and newline
> b  <<input b and newline
> c  <<input c and newline
>
> - grep-2.26
>
> $ src/grep . >/dev/null
> a  <<input a and newline
> $

Good point. While I suspect that would be much less likely to cause
trouble in practice, I would now rephrase:

  We should retain the optimization when reading from stdin that is
neither a pipe nor a tty.




Information forwarded to bug-grep <at> gnu.org:
bug#24941; Package grep. (Tue, 15 Nov 2016 23:34:01 GMT) Full text and rfc822 format available.

Message #32 received at 24941 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>, Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 24941 <at> debbugs.gnu.org
Subject: Re: bug#24941: Early termination bug in grep 2.26
Date: Tue, 15 Nov 2016 15:33:13 -0800
On 11/15/2016 03:13 PM, Jim Meyering wrote:
>    We should retain the optimization when reading from stdin that is
> neither a pipe nor a tty.

I am toying with the idea of retaining the optimization only if 
lseek-to-EOF succeeds, a heuristic that is a bit more restrictive. This 
arguably would conform better to the POSIX requirement that when grep 
exits "the file offset in the open file description is properly 
positioned just past the last byte processed by the utility." See the 
INPUT FILES section of 
<http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap01.html#tag_17_04>.





Information forwarded to bug-grep <at> gnu.org:
bug#24941; Package grep. (Wed, 16 Nov 2016 00:11:02 GMT) Full text and rfc822 format available.

Message #35 received at 24941 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 24941 <at> debbugs.gnu.org
Subject: Re: bug#24941: Early termination bug in grep 2.26
Date: Tue, 15 Nov 2016 16:09:52 -0800
On Tue, Nov 15, 2016 at 3:33 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 11/15/2016 03:13 PM, Jim Meyering wrote:
>>
>>    We should retain the optimization when reading from stdin that is
>> neither a pipe nor a tty.
>
> I am toying with the idea of retaining the optimization only if lseek-to-EOF
> succeeds, a heuristic that is a bit more restrictive. This arguably would
> conform better to the POSIX requirement that when grep exits "the file
> offset in the open file description is properly positioned just past the
> last byte processed by the utility." See the INPUT FILES section of
> <http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap01.html#tag_17_04>.

I like it. That would make the offset in the input file predictable in
those cases.




Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Sat, 19 Nov 2016 09:49:01 GMT) Full text and rfc822 format available.

Notification sent to Gary Johnson <garyjohn <at> spocom.com>:
bug acknowledged by developer. (Sat, 19 Nov 2016 09:49:01 GMT) Full text and rfc822 format available.

Message #40 received at 24941-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>
Cc: Gary Johnson <garyjohn <at> spocom.com>, Norihiro Tanaka <noritnk <at> kcn.ne.jp>,
 24941-done <at> debbugs.gnu.org
Subject: Re: bug#24941: Early termination bug in grep 2.26
Date: Sat, 19 Nov 2016 01:47:48 -0800
[Message part 1 (text/plain, inline)]
This turned into more work than I expected, as I kept finding performance 
glitches and/or correctness bugs in the neighborhood. I installed the attached 
set of patches. Patch 03 is the crucial one. Patch 10 trivially fixes an earlier 
test of mine and I'm too lazy to write a separate email for it.

This fixes the problem for me, so I'm taking the liberty of closing this bug report.
[0001-grep-avoid-unnecessary-isatty-calls.patch (text/x-diff, attachment)]
[0002-grep-improve-diagnostic-on-lseek-failure.patch (text/x-diff, attachment)]
[0003-grep-scale-back-dev-null-speedup.patch (text/x-diff, attachment)]
[0004-grep-drain-the-input-pipe-faster.patch (text/x-diff, attachment)]
[0005-grep-avoid-unnecessary-gettext-call.patch (text/x-diff, attachment)]
[0006-grep-avoid-O-N-2-buffer-reallocation.patch (text/x-diff, attachment)]
[0007-grep-treat-f-dev-null-like-m0.patch (text/x-diff, attachment)]
[0008-grep-tune-f-dev-null.patch (text/x-diff, attachment)]
[0009-grep-f-dev-null-L-PAT-FILE-outputs-FILE.patch (text/x-diff, attachment)]
[0010-tests-use-returns_-rather-than.patch (text/x-diff, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#24941; Package grep. (Sun, 20 Nov 2016 01:41:01 GMT) Full text and rfc822 format available.

Message #43 received at 24941-done <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Gary Johnson <garyjohn <at> spocom.com>, Norihiro Tanaka <noritnk <at> kcn.ne.jp>,
 24941-done <at> debbugs.gnu.org
Subject: Re: bug#24941: Early termination bug in grep 2.26
Date: Sat, 19 Nov 2016 17:40:23 -0800
On Sat, Nov 19, 2016 at 1:47 AM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> This turned into more work than I expected, as I kept finding performance
> glitches and/or correctness bugs in the neighborhood. I installed the
> attached set of patches. Patch 03 is the crucial one. Patch 10 trivially
> fixes an earlier test of mine and I'm too lazy to write a separate email for
> it.
>
> This fixes the problem for me, so I'm taking the liberty of closing this bug
> report.

Impressive work. Thanks a lot!




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 18 Dec 2016 12:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 7 years and 133 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.