GNU bug report logs -
#26864
Clarification on obscure regular expressions mentioned in known bugs
Previous Next
To reply to this bug, email your comments to 26864 AT debbugs.gnu.org.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-grep <at> gnu.org
:
bug#26864
; Package
grep
.
(Wed, 10 May 2017 15:31:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Sundeep Agarwal <learnbyexample.net <at> gmail.com>
:
New bug report received and forwarded. Copy sent to
bug-grep <at> gnu.org
.
(Wed, 10 May 2017 15:31:03 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hello,
From the man page, version 'grep (GNU grep) 2.25'
--------------------------------
Known Bugs
Large repetition counts in the {n,m} construct may cause grep to use
lots of memory. In addition, certain other obscure regular expressions
require exponential time and space, and may cause grep to run out of memory.
Back-references are very slow, and may require exponential time.
--------------------------------
I was trying a regular expression to find words from dictionary that have
two different instances of repeated letters, for example the words:
misspellings, chilliness, woodcutter etc
$ # gives no output
$ grep -m5 -xiE '([a-z]*([a-z])\2[a-z]*){2}' /usr/share/dict/words
$ # works as expected with PCRE
$ grep -m5 -xiP '([a-z]*([a-z])\2[a-z]*){2}' /usr/share/dict/words
Abbott
Annabelle
Annette
Appaloosa
Appleseed
I asked regarding this on
https://stackoverflow.com/questions/43572924/ere-adding-quantifier-to-group-with-inner-group-and-back-reference
and other forums. It helped to identify some more cases like
$ echo 'aazbbycc' | grep -E '(([a-z])\2[a-z]*){2}'
aazbbycc
$ # no output
$ echo 'aazbbycc' | grep -E '(([a-z])\2[a-z]*){3}'
and
$ echo 'aazbbycc' | grep -E '(([a-z])\2[a-z]{0,3}){3}'
aazbbycc
$ # no output
$ echo 'aazbbycc' | grep -E '(([a-z])\2[a-z]{0,4}){3}'
$ # seems dependent on character class clashing with back reference
characters and quantifier count
$ # a, b and c are the characters matching back reference
$ echo 'aazbbycc' | grep -E '(([a-z])\2[abcyz]{0,4}){2}'
aazbbycc
$ echo 'aazbbycc' | grep -E '(([a-z])\2[abcyz]{0,4}){3}'
$ echo 'aazbbycc' | grep -E '(([a-z])\2[abyz]{0,4}){3}'
$ echo 'aazbbycc' | grep -E '(([a-z])\2[byz]{0,4}){3}'
$ echo 'aazbbycc' | grep -E '(([a-z])\2[acyz]{0,4}){3}'
aazbbycc
$ echo 'aazbbycc' | grep -E '(([a-z])\2[ayz]{0,4}){3}'
aazbbycc
$ echo 'aazbbycc' | grep -E '(([a-z])\2[cyz]{0,4}){3}'
aazbbycc
$ echo 'aazbbycc' | grep -E '(([a-z])\2[yz]{0,4}){3}'
aazbbycc
The same behavior is seen with 'sed (GNU sed) 4.2.2' as well. For ex:
$ echo 'aazbbycc' | sed -nE '/(([a-z])\2[a-z]{0,3}){3}/p'
aazbbycc
$ # no output
$ echo 'aazbbycc' | sed -nE '/(([a-z])\2[a-z]{0,4}){3}/p'
So, my question is whether these regular expression examples come under
'obscure regular expressions' mentioned in the man page. If so, I feel
there should be an error message displayed instead of no output
Regards,
Sundeep
[Message part 2 (text/html, inline)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#26864
; Package
grep
.
(Mon, 30 Dec 2019 09:02:01 GMT)
Full text and
rfc822 format available.
Message #8 received at 26864 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 5/10/17 1:48 AM, Sundeep Agarwal wrote:
> my question is whether these regular expression examples come under
> 'obscure regular expressions' mentioned in the man page.
No, they're bugs in grep's regular expression implementation, which is taken
from glibc. I filed a bug report against glibc here:
https://sourceware.org/bugzilla/show_bug.cgi?id=25322
and installed the attached patches to GNU grep to try to document this mess
better. The first patch is the important one; the other two merely standardize
spelling and fix a typo in the first patch. Thanks for reporting the problem.
[0001-doc-mention-back-reference-bugs.patch (text/x-patch, attachment)]
[0002-doc-spell-back-reference-more-consistently.patch (text/x-patch, attachment)]
[0003-doc-fix-bug-typo.patch (text/x-patch, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#26864
; Package
grep
.
(Mon, 30 Dec 2019 18:52:02 GMT)
Full text and
rfc822 format available.
Message #11 received at 26864 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On further thought, we shouldn't be encouraging palindromish REs in the manual,
so that naive users aren't drawn into this mess. So I installed the attached
further patch to the documentation.
Of course it would be better to fix the back-reference bugs but this is low
priority and I doubt whether anybody has the time.
[0001-doc-don-t-encourage-back-references.patch (text/x-patch, attachment)]
Merged 26864 36148.
Request was from
Paul Eggert <eggert <at> cs.ucla.edu>
to
control <at> debbugs.gnu.org
.
(Thu, 02 Jan 2020 09:35:01 GMT)
Full text and
rfc822 format available.
This bug report was last modified 4 years and 86 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.