GNU bug report logs - #21670
surprising bug in grep -e with anchors

Package: grep;

Reported by: greg boyd <gboyd.ccsf <at> gmail.com>

Date: Sun, 11 Oct 2015 23:57:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 21670 in the body.
You can then email your comments to 21670 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-grep <at> gnu.org:
bug#21670; Package grep. (Sun, 11 Oct 2015 23:57:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to greg boyd <gboyd.ccsf <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Sun, 11 Oct 2015 23:57:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: greg boyd <gboyd.ccsf <at> gmail.com>
To: bug-grep <at> gnu.org, greg boyd <gboyd.ccsf <at> gmail.com>
Subject: surprising bug in grep -e with anchors
Date: Sun, 11 Oct 2015 14:01:07 -0700
This bug appears in GNU grep version 2.20. It is not present in the older
version I have installed on a home system (2.6.3.)

test case (single line)
abchelloabc

grep does not find the line with grep -e '^hello'  nor with grep -e 'hello$'
however, the line is output with
grep -e '^hello' -e 'hello$'
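
The test case above can be turned into a small script. This is only a sketch: the helper name `count_matches` is made up for illustration, and it assumes that whichever `grep` is first on PATH is the one under test. A correct grep should report 0 for all three runs. Going by the versions named in this thread (2.19 through 2.21), an affected grep would be expected to report 1 for the combined run.

```shell
#!/bin/sh
# Reproduce the report: each anchored pattern alone, then both together.
# 'abchelloabc' neither starts nor ends with 'hello', so a correct grep
# should select nothing in all three cases.
line='abchelloabc'

# Hypothetical helper: print how many input lines grep selects.
# grep -c prints 0 and exits with status 1 when nothing matches,
# so the exit status is deliberately ignored here.
count_matches() {
    printf '%s\n' "$line" | grep -c "$@" || true
}

start_only=$(count_matches -e '^hello')
end_only=$(count_matches -e 'hello$')
both=$(count_matches -e '^hello' -e 'hello$')

echo "start_only=$start_only end_only=$end_only both=$both"
```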

I downloaded, built and tested the bug on GNU grep 2.21 and it still
appears.

weird.

(this was found by an introductory Linux student. )

-- 
-- greg
gboyd <at> ccsf.edu
Instructor, Computer Science
http://fog.ccsf.edu/~gboyd

Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Mon, 12 Oct 2015 04:35:02 GMT) Full text and rfc822 format available.

Notification sent to greg boyd <gboyd.ccsf <at> gmail.com>:
bug acknowledged by developer. (Mon, 12 Oct 2015 04:35:02 GMT) Full text and rfc822 format available.

Message #10 received at 21670-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: greg boyd <gboyd.ccsf <at> gmail.com>
Cc: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 21670-done <at> debbugs.gnu.org
Subject: Re: bug#21670: surprising bug in grep -e with anchors
Date: Sun, 11 Oct 2015 21:34:05 -0700
greg boyd wrote:
> test case (single line)
> abchelloabc
>
> grep does not find the line with grep -e '^hello'  nor with grep -e 'hello$'
> however, the line is output with
> grep -e '^hello' -e 'hello$'

Oooo, that's a good one.  Give your student extra credit!  As it happens, the 
bug was recently fixed by this patch by Norihiro Tanaka:

http://git.savannah.gnu.org/cgit/grep.git/commit/?id=256a4b494fe1c48083ba73b4f62607234e4fefd5

and the fix should appear in the next grep release.  However, since the patch 
was supposed to affect only performance, it appears that the bug fix was due to 
luck, and I'm taking the liberty of adding your student's test case by 
installing the attached further patch, to help prevent this bug from coming back 
in a future version.
[0001-tests-add-test-case-for-Bug-21670.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#21670; Package grep. (Mon, 12 Oct 2015 08:15:02 GMT) Full text and rfc822 format available.

Message #13 received at 21670-done <at> debbugs.gnu.org (full text, mbox):

From: Shlomi Fish <shlomif <at> shlomifish.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 21670-done <at> debbugs.gnu.org, greg boyd <gboyd.ccsf <at> gmail.com>
Subject: Re: bug#21670: surprising bug in grep -e with anchors
Date: Mon, 12 Oct 2015 11:14:21 +0300
Hi all,

On Sun, 11 Oct 2015 21:34:05 -0700
Paul Eggert <eggert <at> cs.ucla.edu> wrote:

> greg boyd wrote:
> > test case (single line)
> > abchelloabc
> >
> > grep does not find the line with grep -e '^hello'  nor with grep -e 'hello$'
> > however, the line is output with
> > grep -e '^hello' -e 'hello$'  
> 
> Oooo, that's a good one.  Give your student extra credit!  As it happens, the 
> bug was recently fixed by this patch by Norihiro Tanaka:
> 
> http://git.savannah.gnu.org/cgit/grep.git/commit/?id=256a4b494fe1c48083ba73b4f62607234e4fefd5
> 
> and the fix should appear in the next grep release.  However, since the patch 
> was supposed to affect only performance, it appears that the bug fix was due
> to luck, and I'm taking the liberty of adding your student's test case by 
> installing the attached further patch, to help prevent this bug from coming
> back in a future version.

thanks to greg, to greg's student, and to Paul for their contributions!

Regards,

	Shlomi Fish

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
Perl Humour - http://perl-begin.org/humour/

The first phrase that needs to be taught when teaching a new language is how to
say “Do you speak English?”. The first thing that needs to be taught when
teaching a new computer tool is how to exit it.

Please reply to list if it's a mailing list post - http://shlom.in/reply .




Information forwarded to bug-grep <at> gnu.org:
bug#21670; Package grep. (Mon, 12 Oct 2015 16:29:02 GMT) Full text and rfc822 format available.

Message #16 received at 21670 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: greg boyd <gboyd.ccsf <at> gmail.com>
Cc: 21670 <at> debbugs.gnu.org
Subject: Re: bug#21670: surprising bug in grep -e with anchors
Date: Mon, 12 Oct 2015 09:28:03 -0700
On Sun, Oct 11, 2015 at 2:01 PM, greg boyd <gboyd.ccsf <at> gmail.com> wrote:
> This bug appears in GNU grep version 2.20. It is not present in the older
> version I have installed on a home system (2.6.3.)
>
> test case (single line)
> abchelloabc
>
> grep does not find the line with grep -e '^hello'  nor with grep -e 'hello$'
> however, the line is output with
> grep -e '^hello' -e 'hello$'
>
> I downloaded, built and tested the bug on GNU grep 2.21 and it still
> appears.

Thank you for the report.
I confirm that it affects grep-2.21 with this:

  $ echo axa |/p/p/grep-2.21/bin/grep -E '^x|x$'
  axa

However, it appears to be fixed in the version built
from the latest sources, yet there is no mention in NEWS.

The actual bug was introduced in v2.18-85-g2c94326,
so first appeared in the grep-2.19 release. I will track down
the commit that fixed it, add a test if required and update
NEWS accordingly.

With this, I will prioritize making a new release soon.




Information forwarded to bug-grep <at> gnu.org:
bug#21670; Package grep. (Mon, 12 Oct 2015 22:19:01 GMT) Full text and rfc822 format available.

Message #19 received at 21670 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: 21670 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>,
 gboyd.ccsf <at> gmail.com
Cc: 21670-done <at> debbugs.gnu.org
Subject: Re: bug#21670: surprising bug in grep -e with anchors
Date: Mon, 12 Oct 2015 15:17:42 -0700
On Sun, Oct 11, 2015 at 9:34 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> greg boyd wrote:
>>
>> test case (single line)
>> abchelloabc
>>
>> grep does not find the line with grep -e '^hello'  nor with grep -e
>> 'hello$'
>> however, the line is output with
>> grep -e '^hello' -e 'hello$'
>
>
> Oooo, that's a good one.  Give your student extra credit!  As it happens,
> the bug was recently fixed by this patch by Norihiro Tanaka:
>
> http://git.savannah.gnu.org/cgit/grep.git/commit/?id=256a4b494fe1c48083ba73b4f62607234e4fefd5
>
> and the fix should appear in the next grep release.  However, since the
> patch was supposed to affect only performance, it appears that the bug fix
> was due to luck, and I'm taking the liberty of adding your student's test
> case by installing the attached further patch, to help prevent this bug from
> coming back in a future version.

Thanks for adding that test, Paul.
However, note that the bug does not require two uses of "-e" per se.
Multiple "-e"-specified regexps get translated internally to those regexps
separated by the ERE "|" alternation/"or" operator. A smaller, perhaps
more illustrative test case is to use an explicit "|":

  $ echo axa | grep -E '^x|x$'
  axa
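
As a rough check of that equivalence, the sketch below runs both spellings over a few sample lines and compares what they select. The sample lines and variable names are invented for illustration. It compares only the selected lines, so it does not show how grep combines the patterns internally.

```shell
#!/bin/sh
# Compare two spellings of the same search on a few sample lines:
#   grep -e '^x' -e 'x$'    versus    grep -E '^x|x$'
# Both should select exactly the lines that start or end with 'x'.
samples='axa
xab
bax
x
abc'

with_two_e=$(printf '%s\n' "$samples" | grep -e '^x' -e 'x$')
with_alternation=$(printf '%s\n' "$samples" | grep -E '^x|x$')

if [ "$with_two_e" = "$with_alternation" ]; then
    echo "same selection"
else
    echo "selections differ"
fi
```

On an affected grep both forms would be expected to wrongly select "axa" as well, so the two selections should still agree with each other.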

FYI, one can demonstrate that it was a problem in the DFA
matcher without resorting to gdb by inserting a "()" in the ERE,
since that construct cannot work in a DFA and grep resorts
to using glibc's full-blown regex matcher. With that, even the
afflicted versions of grep get the desired result (no match):

  $ echo axa | grep -E '^x()|x$'; echo $?
  1
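
The comparison above can also be scripted by recording grep's exit status for both forms of the pattern. This is a sketch with made-up variable names, and it assumes a GNU grep that accepts an empty "()" group in an ERE. A status of 1 means no line was selected, which is the desired result for the input "axa". On an affected grep the plain form would be expected to exit with 0 instead.

```shell
#!/bin/sh
# Run the same logical pattern with and without an empty "()" group.
# Per the message above, the "()" makes affected versions of grep fall
# back to the regex matcher instead of the DFA matcher.
plain_status=0
printf 'axa\n' | grep -E '^x|x$' >/dev/null || plain_status=$?

grouped_status=0
printf 'axa\n' | grep -E '^x()|x$' >/dev/null || grouped_status=$?

# Exit status 1 means "no lines selected"; 0 means a line matched.
echo "plain=$plain_status grouped=$grouped_status"
```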




Information forwarded to bug-grep <at> gnu.org:
bug#21670; Package grep. (Mon, 12 Oct 2015 22:19:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-grep <at> gnu.org:
bug#21670; Package grep. (Tue, 13 Oct 2015 04:23:02 GMT) Full text and rfc822 format available.

Message #25 received at 21670 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: greg boyd <gboyd.ccsf <at> gmail.com>
Cc: 21670 <at> debbugs.gnu.org
Subject: Re: bug#21670: surprising bug in grep -e with anchors
Date: Mon, 12 Oct 2015 21:21:43 -0700
On Mon, Oct 12, 2015 at 9:28 AM, Jim Meyering <jim <at> meyering.net> wrote:
> On Sun, Oct 11, 2015 at 2:01 PM, greg boyd <gboyd.ccsf <at> gmail.com> wrote:
>> This bug appears in GNU grep version 2.20. It is not present in the older
>> version I have installed on a home system (2.6.3.)
...
> The actual bug was introduced in v2.18-85-g2c94326,
> so first appeared in the grep-2.19 release. I will track down
> the commit that fixed it, add a test if required and update
> NEWS accordingly.

Thanks again.

Paul already added a test, so I've updated NEWS
with the missing info:

  http://git.sv.gnu.org/cgit/grep.git/commit/?id=93a6d6d7bd1d68




Information forwarded to bug-grep <at> gnu.org:
bug#21670; Package grep. (Tue, 13 Oct 2015 13:32:01 GMT) Full text and rfc822 format available.

Message #28 received at 21670-done <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 21670-done <at> debbugs.gnu.org, greg boyd <gboyd.ccsf <at> gmail.com>
Subject: Re: bug#21670: surprising bug in grep -e with anchors
Date: Tue, 13 Oct 2015 22:30:03 +0900
On Sun, 11 Oct 2015 21:34:05 -0700
Paul Eggert <eggert <at> cs.ucla.edu> wrote:

> greg boyd wrote:
> > test case (single line)
> > abchelloabc
> >
> > grep does not find the line with grep -e '^hello'  nor with grep -e 'hello$'
> > however, the line is output with
> > grep -e '^hello' -e 'hello$'
> 
> Oooo, that's a good one.  Give your student extra credit!  As it happens, the bug was recently fixed by this patch by Norihiro Tanaka:
> 
> http://git.savannah.gnu.org/cgit/grep.git/commit/?id=256a4b494fe1c48083ba73b4f62607234e4fefd5
> 
> and the fix should appear in the next grep release.  However, since the patch was supposed to affect only performance, it appears that the bug fix was due to luck, and I'm taking the liberty of adding your student's test case by installing the attached further patch, to help prevent this bug from coming back in a future version.

I found that the above patch is also buggy, so it is not a real fix.  It
returns a shorter `must' than expected.  E.g., the `must' for pattern
`.hello' is `hello', but `hell' is returned because of this bug.  Likewise,
the `must' for pattern `^hello' is `hello', but `hell' is returned.  This
causes a slight performance loss and makes bug#21670 disappear.  BTW, I
guess the bug does not change external behavior.

First patch fixes the bug.  After the patch is applied, bug#21670
appears again.  And, second patch fixes bug#21670.

When the pattern has ^ and/or $, and the begline and/or endline flag of mp
is turned off, EXACT should be false.
[0001-dfa-don-t-use-DFA-for-exact-matching.patch (text/plain, attachment)]
[0002-dfa-fix-bug-in-alternate-of-sub-patterns-different-i.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#21670; Package grep. (Tue, 13 Oct 2015 22:32:02 GMT) Full text and rfc822 format available.

Message #31 received at 21670 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 21670 <at> debbugs.gnu.org, greg boyd <gboyd.ccsf <at> gmail.com>
Subject: Re: bug#21670: surprising bug in grep -e with anchors
Date: Tue, 13 Oct 2015 15:31:28 -0700
On 10/13/2015 06:30 AM, Norihiro Tanaka wrote:
> First patch fixes the bug.  After the patch is applied, bug#21670
> appears again.  And, second patch fixes bug#21670.
Thanks, I installed both of them after rewording the commit logs a bit, 
along with the attached minor further improvement.
[0001-dfa-make-the-executable-a-bit-smaller.patch (text/x-patch, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#21670; Package grep. (Wed, 14 Oct 2015 01:44:01 GMT) Full text and rfc822 format available.

Message #34 received at 21670 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: greg boyd <gboyd.ccsf <at> gmail.com>, Norihiro Tanaka <noritnk <at> kcn.ne.jp>,
 21670 <at> debbugs.gnu.org
Subject: Re: bug#21670: surprising bug in grep -e with anchors
Date: Tue, 13 Oct 2015 18:43:18 -0700
On Tue, Oct 13, 2015 at 3:31 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 10/13/2015 06:30 AM, Norihiro Tanaka wrote:
>>
>> First patch fixes the bug.  After the patch is applied, bug#21670
>> appears again.  And, second patch fixes bug#21670.
>
> Thanks, I installed both of them after rewording the commit logs a bit,
> along with the attached minor further improvement.

Thank you, Paul.




Information forwarded to bug-grep <at> gnu.org:
bug#21670; Package grep. (Wed, 14 Oct 2015 15:02:02 GMT) Full text and rfc822 format available.

Message #37 received at 21670 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 21670 <at> debbugs.gnu.org, greg boyd <gboyd.ccsf <at> gmail.com>
Subject: Re: bug#21670: surprising bug in grep -e with anchors
Date: Thu, 15 Oct 2015 00:01:03 +0900
On Tue, 13 Oct 2015 15:31:28 -0700
Paul Eggert <eggert <at> cs.ucla.edu> wrote:

> On 10/13/2015 06:30 AM, Norihiro Tanaka wrote:
> > First patch fixes the bug.  After the patch is applied, bug#21670
> > appears again.  And, second patch fixes bug#21670.
> Thanks, I installed both of them after rewording the commit logs a bit, along with the attached minor further improvement.

Thanks for review and rewording.





Information forwarded to bug-grep <at> gnu.org:
bug#21670; Package grep. (Wed, 28 Oct 2015 00:39:02 GMT) Full text and rfc822 format available.

Message #40 received at 21670 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: greg boyd <gboyd.ccsf <at> gmail.com>, drakewang <at> gmail.com
Cc: 21670 <at> debbugs.gnu.org
Subject: Re: bug#21670: surprising bug in grep -e with anchors
Date: Tue, 27 Oct 2015 17:37:46 -0700
On Sun, Oct 11, 2015 at 2:01 PM, greg boyd <gboyd.ccsf <at> gmail.com> wrote:
> This bug appears in GNU grep version 2.20. It is not present in the older
> version I have installed on a home system (2.6.3.)
>
> test case (single line)
> abchelloabc
>
> grep does not find the line with grep -e '^hello'  nor with grep -e 'hello$'
> however, the line is output with
> grep -e '^hello' -e 'hello$'
>
> I downloaded, built and tested the bug on GNU grep 2.21 and it still
> appears.
>
> weird.
>
> (this was found by an introductory Linux student. )

I asked Greg for the student's name and then added two names to
THANKS.in with this:
[0001-maint-update-THANKS.in.patch (text/x-patch, attachment)]

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 25 Nov 2015 12:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 8 years and 126 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.