GNU bug report logs -
#44351
Bug in grep v3.2 onwards in regular expression matching
Previous Next
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 44351 in the body.
You can then email your comments to 44351 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-grep <at> gnu.org
:
bug#44351
; Package
grep
.
(Sat, 31 Oct 2020 16:17:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Gonzalo Padrino <grimalg.on+gnu <at> gmail.com>
:
New bug report received and forwarded. Copy sent to
bug-grep <at> gnu.org
.
(Sat, 31 Oct 2020 16:17:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hello,
While using GNU grep v3.4 in an Ubuntu 20.04 userspace running on top of
Win10 WSL (yeah, i know... but also checked in other envs) i discovered
what seems like an obvious bug (if i'm not mistaken).
The bug:
-----
me <at> host:~$ echo 'xxxxy' |grep -E '^x+x+x+x+y$'
xxxxy
me <at> host:~$ echo 'xxxy' |grep -E '^x+x+x+x+y$'
xxxy
me <at> host:~$ echo 'xxy' |grep -E '^x+x+x+x+y$'
xxy
me <at> host:~$ echo 'xy' |grep -E '^x+x+x+x+y$'
----
...the terminal supports ansi color escapes, and what's really weird is
that only the result from the first command is colored in red. First and
fourth commands yield correct results; the second and third do not, as they
should not match it's input.
I've tested releases from v3.1 to latest v3.5 and found the anomalous
behaviour in version v3.2 through v3.5. A (quick and clunky) git bisect led
me to believe it was introduced about two years ago, possibly in commit
123620af88f55c3e0cc9f0aed7311c72f625bc82 (
https://git.savannah.gnu.org/cgit/grep.git/commit/?id=123620af88f55c3e0cc9f0aed7311c72f625bc82).
If this is true, it would mean either the bug is in gnulib, or maybe grep
needed to do some kind of extra handling on it's side.
Kind regards. Gonzalo Padrino.
P.S.: I had to patch some things in order to successfully compile the code
after checking out some problematic commits (pragmas to avoid warnings
about "pure" and "noreturn" function attributes, a missing configmake
dependency in bootstrap.conf, etc ).
[Message part 2 (text/html, inline)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#44351
; Package
grep
.
(Sat, 31 Oct 2020 16:48:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 44351 <at> debbugs.gnu.org (full text, mbox):
I can reproduce this is gawk. :-(
It's a bug somewhere in the dfa matcher. When I export GAWK_NO_DFA=1
to bypass the dfa matcher, only xxxxy matches.
Hope this helps,
Arnold
Gonzalo Padrino <grimalg.on+gnu <at> gmail.com> wrote:
> Hello,
>
> While using GNU grep v3.4 in an Ubuntu 20.04 userspace running on top of
> Win10 WSL (yeah, i know... but also checked in other envs) i discovered
> what seems like an obvious bug (if i'm not mistaken).
> The bug:
> -----
> me <at> host:~$ echo 'xxxxy' |grep -E '^x+x+x+x+y$'
> xxxxy
> me <at> host:~$ echo 'xxxy' |grep -E '^x+x+x+x+y$'
> xxxy
> me <at> host:~$ echo 'xxy' |grep -E '^x+x+x+x+y$'
> xxy
> me <at> host:~$ echo 'xy' |grep -E '^x+x+x+x+y$'
>
> ----
> ...the terminal supports ansi color escapes, and what's really weird is
> that only the result from the first command is colored in red. First and
> fourth commands yield correct results; the second and third do not, as they
> should not match it's input.
>
> I've tested releases from v3.1 to latest v3.5 and found the anomalous
> behaviour in version v3.2 through v3.5. A (quick and clunky) git bisect led
> me to believe it was introduced about two years ago, possibly in commit
> 123620af88f55c3e0cc9f0aed7311c72f625bc82 (
> https://git.savannah.gnu.org/cgit/grep.git/commit/?id=123620af88f55c3e0cc9f0aed7311c72f625bc82).
> If this is true, it would mean either the bug is in gnulib, or maybe grep
> needed to do some kind of extra handling on it's side.
>
> Kind regards. Gonzalo Padrino.
>
> P.S.: I had to patch some things in order to successfully compile the code
> after checking out some problematic commits (pragmas to avoid warnings
> about "pure" and "noreturn" function attributes, a missing configmake
> dependency in bootstrap.conf, etc ).
Information forwarded
to
bug-grep <at> gnu.org
:
bug#44351
; Package
grep
.
(Sat, 31 Oct 2020 17:31:02 GMT)
Full text and
rfc822 format available.
Message #11 received at 44351 <at> debbugs.gnu.org (full text, mbox):
On Sat, Oct 31, 2020 at 9:17 AM Gonzalo Padrino
<grimalg.on+gnu <at> gmail.com> wrote:
> While using GNU grep v3.4 in an Ubuntu 20.04 userspace running on top of
> Win10 WSL (yeah, i know... but also checked in other envs) i discovered
> what seems like an obvious bug (if i'm not mistaken).
> The bug:
> -----
> me <at> host:~$ echo 'xxxxy' |grep -E '^x+x+x+x+y$'
> xxxxy
> me <at> host:~$ echo 'xxxy' |grep -E '^x+x+x+x+y$'
> xxxy
> me <at> host:~$ echo 'xxy' |grep -E '^x+x+x+x+y$'
> xxy
> me <at> host:~$ echo 'xy' |grep -E '^x+x+x+x+y$'
>
> ----
> ...the terminal supports ansi color escapes, and what's really weird is
> that only the result from the first command is colored in red. First and
> fourth commands yield correct results; the second and third do not, as they
> should not match it's input.
>
> I've tested releases from v3.1 to latest v3.5 and found the anomalous
> behaviour in version v3.2 through v3.5. A (quick and clunky) git bisect led
> me to believe it was introduced about two years ago, possibly in commit
> 123620af88f55c3e0cc9f0aed7311c72f625bc82 (
> https://git.savannah.gnu.org/cgit/grep.git/commit/?id=123620af88f55c3e0cc9f0aed7311c72f625bc82).
> If this is true, it would mean either the bug is in gnulib, or maybe grep
> needed to do some kind of extra handling on it's side.
Thank you for reporting that. I confirm this is a bug in the very latest.
This mistakenly matches:
$ echo xxy |grep -E '^x+x+x+y$'
xxy
That regular expression requires that any match have at least three
leading 'x's.
This is indeed due to a bug in gnulib's lib/dfa.c.
So far, I've found that we can band-aid fix it by disabling part of
merge_nfa_state's optimizations with this patch, but I do not propose
to make this change. This is just to show where the problem lies. I'm
pretty sure we can retain and correct the optimization.
diff --git a/lib/dfa.c b/lib/dfa.c
index 74aafa2ee..087c266c5 100644
--- a/lib/dfa.c
+++ b/lib/dfa.c
@@ -2459,7 +2459,7 @@ merge_nfa_state (struct dfa *d, idx_t tindex, char *flags,
continue;
if (flags[sindex] & OPT_REPEAT)
- delete (sindex, &follows[sindex]);
+ continue;
merge2 (&follows[dindex], &follows[sindex], merged);
Merged 44351 44352.
Request was from
Jim Meyering <jim <at> meyering.net>
to
control <at> debbugs.gnu.org
.
(Sat, 31 Oct 2020 18:51:01 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#44351
; Package
grep
.
(Sun, 01 Nov 2020 07:43:02 GMT)
Full text and
rfc822 format available.
Message #16 received at 44351 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Example,
a+a+a
1 2 3
position 1 has a repetition of "a" and other transition with "a".
position 2 has a repetition of "a" and other transition with "a", too.
Then DFA was merging the two nodes, but it is wrong.
Now similar nodes in series are not merged.
[0001-dfa-remain-similar-nodes-in-series-in-optimization.patch (text/plain, attachment)]
[0001-tests-add-the-test-for-bugfix-in-gnulib-s-dfa.patch (text/plain, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#44351
; Package
grep
.
(Sun, 01 Nov 2020 15:00:02 GMT)
Full text and
rfc822 format available.
Message #19 received at 44351 <at> debbugs.gnu.org (full text, mbox):
Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Example,
>
> a+a+a
> 1 2 3
>
> position 1 has a repetition of "a" and other transition with "a".
> position 2 has a repetition of "a" and other transition with "a", too.
> Then DFA was merging the two nodes, but it is wrong.
>
> Now similar nodes in series are not merged.
This patch works for me in gawk.
Looking forward to seeing it in Gnulib so that I can pull it in
Thanks!
Arnold
Information forwarded
to
bug-grep <at> gnu.org
:
bug#44351
; Package
grep
.
(Sun, 01 Nov 2020 15:21:02 GMT)
Full text and
rfc822 format available.
Message #22 received at 44351 <at> debbugs.gnu.org (full text, mbox):
On Sun, Nov 1, 2020 at 12:42 AM Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Example,
>
> a+a+a
> 1 2 3
>
> position 1 has a repetition of "a" and other transition with "a".
> position 2 has a repetition of "a" and other transition with "a", too.
> Then DFA was merging the two nodes, but it is wrong.
>
> Now similar nodes in series are not merged.
Thank you for the quick work.
Would you please send a revised test patch? That one appears to be a
tiny delta added to a file that only you have locally. I.e., it
requires the new file tests/ere.tests, with 200+ lines.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#44351
; Package
grep
.
(Sun, 01 Nov 2020 15:23:02 GMT)
Full text and
rfc822 format available.
Message #25 received at 44351 <at> debbugs.gnu.org (full text, mbox):
On Sun, Nov 1, 2020 at 7:19 AM Jim Meyering <jim <at> meyering.net> wrote:
>
> On Sun, Nov 1, 2020 at 12:42 AM Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> > Example,
> >
> > a+a+a
> > 1 2 3
> >
> > position 1 has a repetition of "a" and other transition with "a".
> > position 2 has a repetition of "a" and other transition with "a", too.
> > Then DFA was merging the two nodes, but it is wrong.
> >
> > Now similar nodes in series are not merged.
>
> Thank you for the quick work.
> Would you please send a revised test patch? That one appears to be a
> tiny delta added to a file that only you have locally. I.e., it
> requires the new file tests/ere.tests, with 200+ lines.
Oops. My mistake. I thought you were adding the test in gnulib and
hadded a new framework there.
The tests/ere.tests file is in grep. No problem. Thanks again.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#44351
; Package
grep
.
(Sun, 01 Nov 2020 15:32:02 GMT)
Full text and
rfc822 format available.
Message #28 received at 44351 <at> debbugs.gnu.org (full text, mbox):
Hi,
By the way, I was wondering whether to add the test to ere.tests or
spencer1.tests or to a new file. How should they be used properly?
Information forwarded
to
bug-grep <at> gnu.org
:
bug#44351
; Package
grep
.
(Sun, 01 Nov 2020 16:04:01 GMT)
Full text and
rfc822 format available.
Message #31 received at 44351 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On Sun, Nov 1, 2020 at 7:31 AM Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Hi,
> By the way, I was wondering whether to add the test to ere.tests or
> spencer1.tests or to a new file. How should they be used properly?
Adding the new test in either place is fine, but there should be a comment.
Also, we need a NEWS entry. I'll add that separately.
More importantly, there must be a test in gnulib. I'm adding one with
the attached.
[0001-dfa-tests-test-for-invalid-merge-fix.patch (application/octet-stream, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#44351
; Package
grep
.
(Sun, 01 Nov 2020 16:06:01 GMT)
Full text and
rfc822 format available.
Message #34 received at 44351 <at> debbugs.gnu.org (full text, mbox):
On Sun, Nov 1, 2020 at 8:02 AM Jim Meyering <jim <at> meyering.net> wrote:
>
> On Sun, Nov 1, 2020 at 7:31 AM Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> > Hi,
> > By the way, I was wondering whether to add the test to ere.tests or
> > spencer1.tests or to a new file. How should they be used properly?
>
> Adding the new test in either place is fine, but there should be a comment.
>
> Also, we need a NEWS entry. I'll add that separately.
My mention of NEWS here is ambiguous.
When I wrote that, I was thinking of grep's NEWS file.
Just after sending, I realized I must also mention the fix in gnulib's
NEWS file.
Amending shortly.
> More importantly, there must be a test in gnulib. I'm adding one with
> the attached.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#44351
; Package
grep
.
(Sun, 01 Nov 2020 16:11:01 GMT)
Full text and
rfc822 format available.
Message #37 received at 44351 <at> debbugs.gnu.org (full text, mbox):
On Sun, Nov 1, 2020 at 8:05 AM Jim Meyering <jim <at> meyering.net> wrote:
> On Sun, Nov 1, 2020 at 8:02 AM Jim Meyering <jim <at> meyering.net> wrote:
> >
> > On Sun, Nov 1, 2020 at 7:31 AM Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> > > Hi,
> > > By the way, I was wondering whether to add the test to ere.tests or
> > > spencer1.tests or to a new file. How should they be used properly?
> >
> > Adding the new test in either place is fine, but there should be a comment.
> >
> > Also, we need a NEWS entry. I'll add that separately.
>
> My mention of NEWS here is ambiguous.
> When I wrote that, I was thinking of grep's NEWS file.
> Just after sending, I realized I must also mention the fix in gnulib's
> NEWS file.
> Amending shortly.
I hadn't read gnulib's NEWS file in too long. It's for things like API
changes, not bug fixes.
Reply sent
to
Paul Eggert <eggert <at> cs.ucla.edu>
:
You have taken responsibility.
(Sun, 01 Nov 2020 18:04:01 GMT)
Full text and
rfc822 format available.
Notification sent
to
Gonzalo Padrino <grimalg.on+gnu <at> gmail.com>
:
bug acknowledged by developer.
(Sun, 01 Nov 2020 18:04:02 GMT)
Full text and
rfc822 format available.
Message #42 received at 44351-done <at> debbugs.gnu.org (full text, mbox):
Thanks to all for the bug report and quick fix. Closing the bug report.
Reply sent
to
Paul Eggert <eggert <at> cs.ucla.edu>
:
You have taken responsibility.
(Sun, 01 Nov 2020 18:04:02 GMT)
Full text and
rfc822 format available.
Notification sent
to
Gonzalo Padrino <grimalgon <at> gmail.com>
:
bug acknowledged by developer.
(Sun, 01 Nov 2020 18:04:02 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#44351
; Package
grep
.
(Sun, 01 Nov 2020 18:20:02 GMT)
Full text and
rfc822 format available.
Message #50 received at 44351-done <at> debbugs.gnu.org (full text, mbox):
On Sun, Nov 1, 2020 at 10:03 AM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Thanks to all for the bug report and quick fix. Closing the bug report.
Thanks for closing that. I've pushed the gnulib changes and am about
to push those for grep, too.
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Mon, 30 Nov 2020 12:24:07 GMT)
Full text and
rfc822 format available.
This bug report was last modified 3 years and 173 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.