GNU bug report logs - #17640
grep with -m reads the entire input


Package: grep

Reported by: Marc Aldorasi <m101010a <at> gmail.com>

Date: Fri, 30 May 2014 06:53:02 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 17640 in the body.
You can then email your comments to 17640 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-grep <at> gnu.org:
bug#17640; Package grep. (Fri, 30 May 2014 06:53:02 GMT)

Acknowledgement sent to Marc Aldorasi <m101010a <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Fri, 30 May 2014 06:53:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Marc Aldorasi <m101010a <at> gmail.com>
To: bug-grep <at> gnu.org
Subject: grep with -m reads the entire input
Date: Fri, 30 May 2014 01:45:19 -0400

With grep 2.18, the -m option would cause grep to stop reading input
after printing the requested number of matching lines.  With version
2.19, grep reads the entire input before exiting.  Interestingly, grep
does not read the entire input if the -c or -C0 options are added in
addition to -m, and also when using -l or -q instead of -m.  I believe
this is caused by commit 5122195.




Information forwarded to bug-grep <at> gnu.org:
bug#17640; Package grep. (Fri, 30 May 2014 07:09:01 GMT)

Message #8 received at 17640 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Marc Aldorasi <m101010a <at> gmail.com>
Cc: 17640 <at> debbugs.gnu.org
Subject: Re:  grep with -m reads the entire input
Date: Fri, 30 May 2014 00:08:09 -0700

> With grep 2.18, the -m option would cause grep to stop reading input
> after printing the requested number of matching lines.  With version
> 2.19, grep reads the entire input before exiting.

Can you give an example of the failure?  What platform are you running 
on?  I couldn't reproduce the problem on Fedora 20 x86-64.  Here's how I 
tried:

$ seq 1000000 >million
$ (grep -m1000 0 | wc -l; wc -l) <million
1000
995994

and these numbers look correct to me.




Information forwarded to bug-grep <at> gnu.org:
bug#17640; Package grep. (Fri, 30 May 2014 15:58:01 GMT)

Message #11 received at 17640 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Marc Aldorasi <m101010a <at> gmail.com>
Cc: 17640 <at> debbugs.gnu.org
Subject: Re: bug#17640: grep with -m reads the entire input
Date: Fri, 30 May 2014 08:56:32 -0700

On Thu, May 29, 2014 at 10:45 PM, Marc Aldorasi <m101010a <at> gmail.com> wrote:
> With grep 2.18, the -m option would cause grep to stop reading input
> after printing the requested number of matching lines.  With version
> 2.19, grep reads the entire input before exiting.  Interestingly, grep
> does not read the entire input if the -c or -C0 options are added in
> addition to -m, and also when using -l or -q instead of -m.  I believe
> this is caused by commit 5122195.

Thanks a lot for the report.  Just in time.
I confirm that it's a bug introduced in 2.19.
To test, run "seq 1000000 > million", then
 "strace -e read grep 0 million" first using grep-2.18
(shows just a few read syscalls), and then with 2.19,
which shows grep reading the entire million-line file.

Here's an incomplete patch.  Obviously there's a lot more
to be added, including NEWS and a nontrivial test. This
was introduced by commit v2.18-140-g6f07900.
[grep-m-patch.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17640; Package grep. (Fri, 30 May 2014 16:00:04 GMT)

Message #14 received at 17640 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Marc Aldorasi <m101010a <at> gmail.com>
Cc: 17640 <17640 <at> debbugs.gnu.org>
Subject: Re: bug#17640: grep with -m reads the entire input
Date: Fri, 30 May 2014 08:58:44 -0700

On Fri, May 30, 2014 at 8:56 AM, Jim Meyering <jim <at> meyering.net> wrote:
> On Thu, May 29, 2014 at 10:45 PM, Marc Aldorasi <m101010a <at> gmail.com> wrote:
>> With grep 2.18, the -m option would cause grep to stop reading input
>> after printing the requested number of matching lines.  With version
>> 2.19, grep reads the entire input before exiting.  Interestingly, grep
>> does not read the entire input if the -c or -C0 options are added in
>> addition to -m, and also when using -l or -q instead of -m.  I believe
>> this is caused by commit 5122195.
>
> Thanks a lot for the report.  Just in time.
> I confirm that it's a bug introduced in 2.19.
> To test, run "seq 1000000 > million", then
>  "strace -e read grep 0 million" first using grep-2.18
> (shows just a few read syscalls), and then with 2.19,
> which shows grep reading the entire million-line file.

Correction: to reproduce, you'll have to insert -m1 in that grep command.

> Here's an incomplete patch.  Obviously there's a lot more
> to be added, including NEWS and a nontrivial test. This
> was introduced by commit v2.18-140-g6f07900




Reply sent to Jim Meyering <jim <at> meyering.net>:
You have taken responsibility. (Fri, 30 May 2014 16:36:02 GMT)

Notification sent to Marc Aldorasi <m101010a <at> gmail.com>:
bug acknowledged by developer. (Fri, 30 May 2014 16:36:03 GMT)

Message #19 received at 17640-done <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Marc Aldorasi <m101010a <at> gmail.com>
Cc: 17640-done <at> debbugs.gnu.org
Subject: Re: bug#17640: grep with -m reads the entire input
Date: Fri, 30 May 2014 09:34:55 -0700

On Fri, May 30, 2014 at 8:58 AM, Jim Meyering <jim <at> meyering.net> wrote:
> On Fri, May 30, 2014 at 8:56 AM, Jim Meyering <jim <at> meyering.net> wrote:
>> On Thu, May 29, 2014 at 10:45 PM, Marc Aldorasi <m101010a <at> gmail.com> wrote:
>>> With grep 2.18, the -m option would cause grep to stop reading input
>>> after printing the requested number of matching lines.  With version
>>> 2.19, grep reads the entire input before exiting.  Interestingly, grep
>>> does not read the entire input if the -c or -C0 options are added in
>>> addition to -m, and also when using -l or -q instead of -m.  I believe
>>> this is caused by commit 5122195.
>>
>> Thanks a lot for the report.  Just in time.
>> I confirm that it's a bug introduced in 2.19.
>> To test, run "seq 1000000 > million", then
>>  "strace -e read grep 0 million" first using grep-2.18
>> (shows just a few read syscalls), and then with 2.19,
>> which shows grep reading the entire million-line file.
>
> Correction: to reproduce, you'll have to insert -m1 in that grep command.
>
>> Here's an incomplete patch.  Obviously there's a lot more
>> to be added, including NEWS and a nontrivial test. This
>> was introduced by commit v2.18-140-g6f07900

This bears some explanation.  I've attached a more complete patch
(albeit still hastily composed, so I'll wait a few hours
in case there's feedback).

Prior to grep-2.19, with --max-count=N, this first disjunct would
be true after the Nth match, because pending would be 0:

          if ((!outleft && !pending) || (nlines && done_on_match))
            goto finish_grep;

However, a seemingly unrelated change affected how "pending" is set:

      pending = out_quiet ? 0 : out_after;

We used to ensure that "out_after" was non-negative, because
default_context was always non-negative:

      if (out_after < 0)
        out_after = default_context;

But the recent context-related change invalidated that assumption:

      -  default_context = 0;
      +  default_context = -1;

Here's the patch:
[0001-grep-fix-max-count-N-m-N-to-stop-reading-after-Nth-m.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17640; Package grep. (Fri, 30 May 2014 19:08:02 GMT)

Message #22 received at 17640-done <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Marc Aldorasi <m101010a <at> gmail.com>
Cc: 17640-done <17640-done <at> debbugs.gnu.org>
Subject: Re: bug#17640: grep with -m reads the entire input
Date: Fri, 30 May 2014 12:07:12 -0700

On Fri, May 30, 2014 at 9:34 AM, Jim Meyering <jim <at> meyering.net> wrote:
...
> Here's the patch:

FYI, I've adjusted the commit log to point to the correct diff:

      This bug was introduced by commit v2.18-139-g5122195.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 28 Jun 2014 11:24:04 GMT)



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.