GNU bug report logs -
#39678
'grep --ignore-case --color' does not always color the matches
Previous Next
To reply to this bug, email your comments to 39678 AT debbugs.gnu.org.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Wed, 19 Feb 2020 15:28:01 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Benno Schulenberg <bensberg <at> telfort.nl>
:
New bug report received and forwarded. Copy sent to
bug-grep <at> gnu.org
.
(Wed, 19 Feb 2020 15:28:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
echo a | grep -i --color '\a'
echo a | grep -i --color '\A'
Of the above two commands, only the second colorizes the printed "a".
(An old GNU grep on NetBSD (grep 2.5.1a nb1) does the opposite of
a modern grep: it will show color for \a, but none for \A.)
Bug was found in GNU grep 3.1 while checking whether it understands
\d as a shorthand for [0-9]. Still present in 3.4. The locale does
not appear to matter.
In a report against glibc [1] that seems to be related, a comment
says that "Unknown backslash escapes invoke undefined behaviour."
But where in the documentation does it say so?
When searching for a regular expression in GNU nano (^W M-R), nano says
it cannot find any \a. But for \A, it will find all "a"s and "A"s.
Nano's default search is case insensitive. See [2] for the original
report by Ben Addis. I suppose this is the same or a related bug in
the regex module of gnulib, which nano uses.
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=22425
[2] https://savannah.gnu.org/bugs/?57852
Benno
[signature.asc (application/pgp-signature, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Mon, 18 Oct 2021 04:00:02 GMT)
Full text and
rfc822 format available.
Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hello,
I did some investigation and it seems the problem with "\a" is not that the
output is not colored but that it's printed at all while it should not be.
It's being printed even when re_search in EGexecute doesn't return a match.
As a result the for loop in print_line_middle is not being run - this means
the "a" output that one can see is not from the matched string but from the
rest of the line. To fix this I suggest to return NULL from
print_line_middle if `b` variable equals NULL - which means the loop was
not executed.
I'm sending the patch with the fix in the attachment.
Best regards,
Tomasz Dziendzielski
[Message part 2 (text/html, inline)]
[0001-grep-Don-t-print-line-if-matcher-returned-nothing.patch (application/x-patch, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Mon, 18 Oct 2021 04:00:05 GMT)
Full text and
rfc822 format available.
Message #11 received at 39678 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hello,
I did some investigation and it seems the problem with "\a" is not that the
output is not colored but that it's printed at all while it should not be.
It's being printed even when re_search in EGexecute doesn't return a match.
As a result the for loop in print_line_middle is not being run - this means
the "a" output that you see is not from the matched string but from the
rest of the line. To fix this I suggest to return NULL from
print_line_middle if `b` variable equals NULL - which means the loop was
not executed.
I'm sending the patch with the fix in the attachment.
Best regards,
Tomasz Dziendzielski
[Message part 2 (text/html, inline)]
[0001-grep-Don-t-print-line-if-matcher-returned-nothing.patch (text/x-patch, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Mon, 18 Oct 2021 04:00:06 GMT)
Full text and
rfc822 format available.
Message #14 received at 39678 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hello,
I did some investigation and it seems the problem with "\a" is not that the
output is not colored but that it's printed at all while it should not be.
It's being printed even when re_search in EGexecute doesn't return a match.
As a result the for loop in print_line_middle is not being run - this means
the "a" output that you see is not from the matched string but from the
rest of the line. To fix this I suggest to return NULL from
print_line_middle if `b` variable equals NULL - which means the loop was
not executed.
I'm sending the patch with the fix in the attachment.
Best regards,
Tomasz Dziendzielski
[Message part 2 (text/html, inline)]
[0001-grep-Don-t-print-line-if-matcher-returned-nothing.patch (application/octet-stream, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Mon, 18 Oct 2021 04:00:06 GMT)
Full text and
rfc822 format available.
Message #17 received at 39678 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hello,
I did some investigation and it seems the problem with "\a" is not that the
output is not colored but that it's printed at all while it should not be.
It's being printed even when re_search in EGexecute doesn't return a match.
As a result the for loop in print_line_middle is not being run - this means
the "a" output that you see is not from the matched string but from the
rest of the line. To fix this I suggest to return NULL from
print_line_middle if `b` variable equals NULL - which means the loop was
not executed.
I'm sending the patch with the fix in the attachment.
Best regards,
Tomasz Dziendzielski
[Message part 2 (text/html, inline)]
[0001-grep-Don-t-print-line-if-matcher-returned-nothing.patch (application/x-patch, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Mon, 18 Oct 2021 04:00:07 GMT)
Full text and
rfc822 format available.
Message #20 received at 39678 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hello,
I did some investigation and it seems the problem with '\a' is not that the output is not colored but that it's printed at all while it should not be. It's being printed even when re_search in EGexecute doesn't return a match. As a result the for loop in print_line_middle is not being run - this means the 'a' output that you see is not from the matched string but from the rest of the line. To fix this I suggest to return NULL from print_line_middle if 'b' variable equals NULL - which means the loop was not executed.
I'm sending the patch with the fix in the attachment.
Best regards,
Tomasz Dziendzielski
[Message part 2 (text/html, inline)]
[0001-grep-Don-t-print-line-if-matcher-returned-nothing.patch (text/x-patch, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Mon, 18 Oct 2021 04:09:01 GMT)
Full text and
rfc822 format available.
Message #23 received at 39678 <at> debbugs.gnu.org (full text, mbox):
On 10/17/21 15:15, Tomasz Dziendzielski wrote:
> It's being printed even when re_search in EGexecute doesn't return a match.
Gexecute should return a match, though, right? This is because the
pattern '\a' matches the data 'a'. So it sounds like the bug is in
EGexecute somewhere, not in its caller.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Mon, 18 Oct 2021 04:46:02 GMT)
Full text and
rfc822 format available.
Message #26 received at 39678 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hi,
sorry for producing such a spam, there was huge 7h delay between my mail
and the reply from the mailing list, so I thought something was not sent
correctly on my side.
>EGexecute should return a match, though,
>right? The pattern '\a' matches
>the data 'a'. So the bug is in EGexecute >somewhere, not in its caller.
Thanks for the reply. In my opinion the bug is not in the EGexecute, since
it uses re_search (from gnulib) and the re_search is not returning a match.
So there is no problem in EGexecute. I also compared python re.search with
\a also don't return a match and I found \a can be interpreted a bell or
alarm.
It's good to mention that the lack of match happens for all lowercase
characters that are not defined in regex (for example echo "j" | grep -i
--color '\j'. Comparing to python, they don't allow us to escape any
incorrect lowercase characters, for example:
>>>re.search("\j","j", re.IGNORECASE)
re.error: bad escape \j at position 0
Due to that I think grep should also ignore the non-maching/bad character
instead of printing them.
Best regards,
Tomasz Dziendzielski
[Message part 2 (text/html, inline)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Mon, 18 Oct 2021 04:52:01 GMT)
Full text and
rfc822 format available.
Message #29 received at 39678 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
>EGexecute should return a match, though,
>right? The pattern '\a' matches
>the data 'a'. So the bug is in EGexecute >somewhere, not in its caller.
Thanks for the reply. In my opinion the bug is not in the EGexecute, since
it uses re_search (from gnulib) and the re_search is not returning a match.
So there is no problem in EGexecute. I also compared python re.search with
\a also don't return a match and I found \a can be interpreted a bell or
alarm.
It's good to mention that the lack of match happens for all lowercase
characters that are not defined in regex (for example echo "j" | grep -i
--color '\j'. Comparing to python, they don't allow us to escape any
incorrect lowercase characters, for example:
>>>re.search("\j","j", re.IGNORECASE)
re.error: bad escape \j at position 0
Due to that I think grep should also ignore the non-maching/bad character
instead of printing them.
Best regards,
Tomasz Dziendzielski
pon., 18 paź 2021 o 06:08 Paul Eggert <eggert <at> cs.ucla.edu> napisał(a):
> On 10/17/21 15:15, Tomasz Dziendzielski wrote:
> > It's being printed even when re_search in EGexecute doesn't return a
> match.
>
> Gexecute should return a match, though, right? This is because the
> pattern '\a' matches the data 'a'. So it sounds like the bug is in
> EGexecute somewhere, not in its caller.
>
[Message part 2 (text/html, inline)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Mon, 18 Oct 2021 05:12:02 GMT)
Full text and
rfc822 format available.
Message #32 received at 39678 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Also, even if we assume it should return a match (which I think it
shouldn't), then there's a second bug in print_line_middle, because why
would it print anything if there was no match?
Best regards,
Tomasz Dziendzielski
[Message part 2 (text/html, inline)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Mon, 18 Oct 2021 13:38:01 GMT)
Full text and
rfc822 format available.
Message #35 received at 39678 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Additional note, according to POSIX regex standard:
"An ordinary character is a BRE that matches itself: any character in the
supported character set, except for the BRE special characters listed in
BRE Special Characters.
The interpretation of an ordinary character preceded by an unescaped
<backslash> ( '\\' ) is undefined, except for: [...]"
which means that escaping ordinary characters (like \a) has undefined
behaviour, thus I think it should not be matched.
So having EGexecute match 'a' to '\a' is incorrect, and instead we should
remove the matching and make sure it's not printed. That would make my
patch correct.
Reference:
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_03
Looking forward to feedback.
Best regards,
Tomasz Dziendzielski
pon., 18 paź 2021 o 07:10 Tomasz Dziendzielski <
tomasz.dziendzielski <at> gmail.com> napisał(a):
> Also, even if we assume it should return a match (which I think it
> shouldn't), then there's a second bug in print_line_middle, because why
> would it print anything if there was no match?
>
> Best regards,
> Tomasz Dziendzielski
>
[Message part 2 (text/html, inline)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Mon, 18 Oct 2021 15:41:02 GMT)
Full text and
rfc822 format available.
Message #38 received at 39678 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Now I noticed that it introduces inconsistency, but due to a second problem
- that grep prints lines even when there were no matching strings. For
example if I run:
$ echo "a" | grep -E "{1,3}"
$ echo "a" | grep -i "\a"
then it returns the string "a" even though it doesn't match anything. With
my patch it correctly does not print the string but I think for consistency
it also requires adaptations in case --color was not passed. I'll try to
work on that.
Best regards,
Tomasz Dziendzielski
[Message part 2 (text/html, inline)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Mon, 18 Oct 2021 16:12:02 GMT)
Full text and
rfc822 format available.
Message #41 received at 39678 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
A different inconsistency is in gnulib regex also, which matches 'a' to
'\a' but do not match the same if case insensitive is set. I'll still try
to work on that.
Best regards,
Tomasz Dziendzielski
pon., 18 paź 2021, 17:39 użytkownik Tomasz Dziendzielski <
tomasz.dziendzielski <at> gmail.com> napisał:
> Now I noticed that it introduces inconsistency, but due to a second
> problem - that grep prints lines even when there were no matching strings.
> For example if I run:
> $ echo "a" | grep -E "{1,3}"
> $ echo "a" | grep -i "\a"
> then it returns the string "a" even though it doesn't match anything. With
> my patch it correctly does not print the string but I think for consistency
> it also requires adaptations in case --color was not passed. I'll try to
> work on that.
>
> Best regards,
> Tomasz Dziendzielski
>
[Message part 2 (text/html, inline)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Mon, 18 Oct 2021 17:00:02 GMT)
Full text and
rfc822 format available.
Message #44 received at 39678 <at> debbugs.gnu.org (full text, mbox):
On 10/17/21 21:53, Tomasz Dziendzielski wrote:
> Thanks for the reply. In my opinion the bug is not in the EGexecute, since
> it uses re_search (from gnulib) and the re_search is not returning a match.
> So there is no problem in EGexecute.
It seems to me that there is a bug, because the search in question is
done to a line for which EGexecute has already reported a match.
Therefore, if you search again with the same pattern, you should find at
least one match. However, in this buggy case, EGexecute is not finding a
match, and this messes up the caller.
> even if we assume it should return a match (which I think it
> shouldn't), then there's a second bug in print_line_middle, because why
> would it print anything if there was no match?
It should be able to assume that there is at least one match, so it
shouldn't need to worry about the case where there are zero matches.
> which means that escaping ordinary characters (like \a) has undefined
> behaviour,
Yes, and that means this bug is not a POSIX-conformance issue. However,
it's still a bug, since '\a' should match 'a' in GNU grep, at least
unless and until we decide that \a matches something else.
> So having EGexecute match 'a' to '\a' is incorrect
Not at all. Undefined behavior means 'grep' can do anything it likes, as
far as POSIX is concerned.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Tue, 19 Oct 2021 18:38:01 GMT)
Full text and
rfc822 format available.
Message #47 received at 39678 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
I found that for optimization purposes common cases in re_string_peek_byte
are missed if pstr->mbs_allocated is true, thus skipping them also for case
insensitive.
My solution would be to return re_string_peek_byte also if pstr->icase is
true. mbs_allocated is changed also depending on icase but I don't think we
want to change it globally, so I think my patch should be fine and not
affect optimization.
Please check the attachment. If the patch is fine should I also send this
to gnulib mailing list to have it submitted or is this one enough?
Best regards,
Tomasz Dziendzielski
[Message part 2 (text/html, inline)]
[0001-regex-Handle-common-easiest-cases-if-case-insensitiv.patch (text/x-patch, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Tue, 19 Oct 2021 23:24:02 GMT)
Full text and
rfc822 format available.
Message #50 received at 39678 <at> debbugs.gnu.org (full text, mbox):
On 10/19/21 11:36, Tomasz Dziendzielski wrote:
> I found that for optimization purposes common cases in re_string_peek_byte
> are missed if pstr->mbs_allocated is true, thus skipping them also for case
> insensitive.
Unfortunately I've lost context here. Is this patch being proposed to
improve performance, or to fix a bug?
I don't see how the patch preserves correctness: if pstr->icase is true,
surely in general we need to look at the translated string, not the
original.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Wed, 20 Oct 2021 01:11:02 GMT)
Full text and
rfc822 format available.
Message #53 received at 39678 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
>> I found that for optimization purposes common cases in
re_string_peek_byte
>> are missed if pstr->mbs_allocated is true, thus skipping them also for
case
>> insensitive.
>Unfortunately I've lost context here. Is this patch being proposed to
>improve performance, or to fix a bug?
>I don't see how the patch preserves correctness: if pstr->icase is true,
>surely in general we need to look at the translated string, not the
>original.
It's to fix the bug. Without the patch if we use icase this
re_string_peek_byte was not executed and as a consequence match is not
found.
If we apply the patch grep is able to match the '\a' correctly. The patch
has nothing to do with improving performance, if was just an additional
comment from my side that the fix should not affect the performance (in my
opinion). Sorry for confusion.
Best regards,
Tomasz Dziendzielski
[Message part 2 (text/html, inline)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Tue, 02 Nov 2021 19:26:01 GMT)
Full text and
rfc822 format available.
Message #56 received at 39678 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hi,
so is my patch correct? Without my patch re_string_peek_byte() is not
executed - and this is the function that is executed (and where the match
is found) when "--ignore-case" is not enabled. With my patch grep is able
to match the '\a' while still preserving the correctness for other cases.
For sure some tests would be helpful but from what I checked my patch is
not breaking anything else while adding the correct match for '\a' case.
If you still think the patch is wrong, could you please point me in the
right direction? Maybe you could point me where it's not preserving
correctness.
Best regards,
Tomasz Dziendzielski
śr., 20 paź 2021 o 03:10 Tomasz Dziendzielski <
tomasz.dziendzielski <at> gmail.com> napisał(a):
> >> I found that for optimization purposes common cases in
> re_string_peek_byte
> >> are missed if pstr->mbs_allocated is true, thus skipping them also for
> case
> >> insensitive.
>
> >Unfortunately I've lost context here. Is this patch being proposed to
> >improve performance, or to fix a bug?
>
> >I don't see how the patch preserves correctness: if pstr->icase is true,
> >surely in general we need to look at the translated string, not the
> >original.
>
> It's to fix the bug. Without the patch if we use icase this
> re_string_peek_byte was not executed and as a consequence match is not
> found.
> If we apply the patch grep is able to match the '\a' correctly. The patch
> has nothing to do with improving performance, if was just an additional
> comment from my side that the fix should not affect the performance (in my
> opinion). Sorry for confusion.
>
> Best regards,
> Tomasz Dziendzielski
>
[Message part 2 (text/html, inline)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Sat, 21 May 2022 10:06:02 GMT)
Full text and
rfc822 format available.
Message #61 received at 39678 <at> debbugs.gnu.org (full text, mbox):
Looking again at grep bug 39678 <https://bugs.gnu.org/39678> I noticed
that the bug occurs even when grep is not coloring:
echo a | grep -oi --color=never '\a'
This outputs nothing and exits with status 0, which is clearly wrong.
I tracked this down to a bug buried deep in the bowels of glibc regex, a
bug that Tomasz also spotted. It's not trivial to fix (the fix that
Tomasz sent in doesn't feel right, at least for \X where X is a
multibyte letter) and any fix would be low priority since the bug occurs
only in regular expressions like '\a' that have unspecified behavior -
which means the behavior though wrong nevertheless conforms to POSIX.
I'm inclined to address this by having GNU 'grep' diagnose unspecified
regexps like '\a' and exit with status 2, much as it already diagnoses
unspecified regexps like '[:alpha:]'. If this approach sounds too
drastic, a gentler approach would be for 'grep' to warn about '\a'
without changing the exit status for now, and escalate the warning to
exit with status 2 in a later 'grep' release.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Sat, 21 May 2022 18:42:01 GMT)
Full text and
rfc822 format available.
Message #64 received at 39678 <at> debbugs.gnu.org (full text, mbox):
On Sat, May 21, 2022 at 3:05 AM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Looking again at grep bug 39678 <https://bugs.gnu.org/39678> I noticed
> that the bug occurs even when grep is not coloring:
>
> echo a | grep -oi --color=never '\a'
>
> This outputs nothing and exits with status 0, which is clearly wrong.
Wrong, indeed!
> I tracked this down to a bug buried deep in the bowels of glibc regex, a
> bug that Tomasz also spotted. It's not trivial to fix (the fix that
> Tomasz sent in doesn't feel right, at least for \X where X is a
> multibyte letter) and any fix would be low priority since the bug occurs
> only in regular expressions like '\a' that have unspecified behavior -
> which means the behavior though wrong nevertheless conforms to POSIX.
>
> I'm inclined to address this by having GNU 'grep' diagnose unspecified
> regexps like '\a' and exit with status 2, much as it already diagnoses
> unspecified regexps like '[:alpha:]'. If this approach sounds too
> drastic, a gentler approach would be for 'grep' to warn about '\a'
> without changing the exit status for now, and escalate the warning to
> exit with status 2 in a later 'grep' release.
In my experience, there are many lurking uses of things like '\a', and
would like to ease into this gently, so I much prefer your latter
approach: warn now, and change grep's exit status later -- at least 6
months to a year later, to give people
at least a chance to fix their offending uses before they break.
Thanks for all the improvements!
Information forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Sun, 22 May 2022 22:25:02 GMT)
Full text and
rfc822 format available.
Message #67 received at 39678 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 5/21/22 11:40, Jim Meyering wrote:
> In my experience, there are many lurking uses of things like '\a', and
> would like to ease into this gently, so I much prefer your latter
> approach: warn now, and change grep's exit status later
Sounds good.
When I started looking into that, I discovered that the grep manual
doesn't cover these lurkers well. And although I installed a patch
yesterday about this, after looking at the POSIX spec again today I
discovered that I'd missed quite a few lurkers. So I just now installed
the attached documentation fix, which attempts to cover all the
remaining problem regexps, and to give us room to add warnings for some
of them soon.
We shouldn't warn about all these problems, not without a --pedantic
flag or something like that (something I'm probably too busy to add).
But I expect it'd be good to warn about areas where grep's semantics
don't match any reasonable expectation.
We've already uncovered one area, where \a doesn't work as expected and
where a warning diagnostic would be helpful. Here's another one, where
an oddly-placed '*' doesn't work as one would expect:
$ printf '*\na\n*a\n' | grep '\(*\)'
*
*a
$ printf '*\na\n*a\n' | grep -E '(*)'
grep: Unmatched ( or \(
$ printf '*\na\n*a\n' | grep '\(*a\)'
*a
$ printf '*\na\n*a\n' | grep -E '(*a)'
a
*a
Although not a POSIX violation, here 'grep -E' is "wrong" for any
reasonable definition of "wrong" that I can think of. The attached patch
changes the doc to say that this regular expression has unspecified
behavior (something that POSIX allows).
(Who would have thought regular expressions were so complicated? :-)
[0001-doc-document-regex-corner-cases-better.patch (text/x-patch, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Mon, 23 May 2022 20:04:02 GMT)
Full text and
rfc822 format available.
Message #70 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 5/22/22 15:22, Paul Eggert wrote:
> We've already uncovered one area, where \a doesn't work as expected and
> where a warning diagnostic would be helpful.
I installed the attached patch to cause 'grep' to warn about these.
Comments welcome. Most of the changes were in Gnulib's dfa module, which
see.
> Here's another one, where an oddly-placed '*' doesn't work as one would expect:
>
> $ printf '*\na\n*a\n' | grep '\(*\)'
> *
> *a
> $ printf '*\na\n*a\n' | grep -E '(*)'
> grep: Unmatched ( or \(
> $ printf '*\na\n*a\n' | grep '\(*a\)'
> *a
> $ printf '*\na\n*a\n' | grep -E '(*a)'
> a
> *a
I plan to look at this next. We shouldn't warn about BREs like '\(*\)'
and '\(*a\)' as these conform to POSIX and work fine. But it makes sense
to warn about EREs like '(*)', '(*a)', '(+)', '(+a)', '({1})', '({1}a)'
as POSIX does not specify their behavior, their semantics are
unpredictable with GNU grep, and it's plausible that people are making
mistakes in this area.
[0001-build-update-gnulib-submodule-to-latest.patch (text/x-patch, attachment)]
[0002-grep-warn-about-stray-backslashes.patch (text/x-patch, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#39678
; Package
grep
.
(Wed, 25 May 2022 00:53:02 GMT)
Full text and
rfc822 format available.
Message #73 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 5/23/22 13:03, Paul Eggert wrote:
> it makes sense to warn about EREs like '(*)', '(*a)', '(+)', '(+a)',
> '({1})', '({1}a)' as POSIX does not specify their behavior, their
> semantics are unpredictable with GNU grep, and it's plausible that
> people are making mistakes in this area.
I installed the attached to do that. As before, most of the changes were
in Gnulib's dfa module.
With all these changes, we now see behavior like this:
$ echo a | src/grep -oi '\a'
src/grep: warning: stray \ before a
which is not ideal (so I'll leave the bug report open), but at least
this gives the user a warning that the pattern is not on the up-and-up.
[0001-build-update-gnulib-submodule-to-latest.patch (text/x-patch, attachment)]
[0002-grep-warn-about-x-etc.patch (text/x-patch, attachment)]
This bug report was last modified 1 year and 338 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.