GNU bug report logs - #20974
Weird newline matching behaviour in --null-data mode

Package: grep

Reported by: Balazs Kezes <rlblaster <at> gmail.com>

Date: Fri, 3 Jul 2015 17:00:07 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.


Report forwarded to bug-grep <at> gnu.org:
bug#20974; Package grep. (Fri, 03 Jul 2015 17:00:07 GMT)

Acknowledgement sent to Balazs Kezes <rlblaster <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Fri, 03 Jul 2015 17:00:08 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Balazs Kezes <rlblaster <at> gmail.com>
To: bug-grep <at> gnu.org
Subject: Weird newline matching behaviour in --null-data mode
Date: Fri, 3 Jul 2015 17:59:19 +0100
Hello!

I'm running into issues with grep in -z mode. I've managed to minimize
it into this:

	$ seq 2 | grep --null-data --quiet '[12].2' ; echo $?
	0
	$ seq 2 | grep --null-data --quiet '[1-2].2' ; echo $?
	1

I'd expect the two expressions to mean the same. I've tried this with
the latest version built from the official sources, 2.21. I've also
found [1] which might be related but it wasn't updated for almost 2
years. Or is this expected?

Thanks!


[1] http://savannah.gnu.org/bugs/?40009

-- 
Balazs




Information forwarded to bug-grep <at> gnu.org:
bug#20974; Package grep. (Sat, 04 Jul 2015 00:37:02 GMT)

Message #8 received at 20974 <at> debbugs.gnu.org:

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Balazs Kezes <rlblaster <at> gmail.com>
Cc: 20974 <at> debbugs.gnu.org
Subject: Re: bug#20974: Weird newline matching behaviour in --null-data mode
Date: Sat, 04 Jul 2015 09:36:32 +0900
On Fri, 3 Jul 2015 17:59:19 +0100
Balazs Kezes <rlblaster <at> gmail.com> wrote:

> I'm running into issues with grep in -z mode. I've managed to minimize
> it into this:
> 
> 	$ seq 2 | grep --null-data --quiet '[12].2' ; echo $?
> 	0
> 	$ seq 2 | grep --null-data --quiet '[1-2].2' ; echo $?
> 	1
> 
> I'd expect the two expressions to mean the same. I've tried this with
> the latest version built from the official sources, 2.21. I've also
> found [1] which might be related but it wasn't updated for almost 2
> years. Or is this expected?

$ seq 2 | env LC_ALL=C grep --null-data --quiet '[12].2' ; echo $?
0
$ seq 2 | env LC_ALL=C grep --null-data --quiet '[1-2].2' ; echo $?
0
$ seq 2 | env LC_ALL=en_US.iso88591 grep --null-data --quiet '[12].2' ; echo $?
0
$ seq 2 | env LC_ALL=en_US.iso88591 grep --null-data --quiet '[1-2].2' ; echo $?
1

grep relies on regex only in the last case, in order to support collating
elements, but regex does not support substituting NUL for LF as the newline
character with --null-data.
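
The locale dependence shown above can be checked with a small loop. This is
only a sketch: `null_data_status` is a hypothetical helper name,
`printf '1\n2\n'` stands in for `seq 2`, and the en_US.iso88591 locale is
assumed to be installed (if it is not, grep falls back to the C locale and
no difference will show).

```shell
#!/bin/sh
# Hypothetical helper (not part of grep): print the exit status of
# `grep --null-data --quiet PATTERN` on the bytes "1\n2\n" in a given locale.
null_data_status() {
    # The input contains no NUL byte, so with --null-data the whole
    # input is a single record and '.' has to match the LF in the middle.
    printf '1\n2\n' | env LC_ALL="$1" grep --null-data --quiet "$2"
    echo $?
}

# Print one line per (locale, pattern) pair: locale, pattern, exit status.
for loc in C en_US.iso88591; do
    for pat in '[12].2' '[1-2].2'; do
        printf '%s\t%s\t%s\n' "$loc" "$pat" "$(null_data_status "$loc" "$pat")"
    done
done
```

On grep 2.21 the transcript above suggests that only the range pattern under
en_US.iso88591 should report 1; any row that differs from its neighbour in
the same locale points at the inconsistency.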





Information forwarded to bug-grep <at> gnu.org:
bug#20974; Package grep. (Sat, 04 Jul 2015 03:04:02 GMT)

Message #11 received at 20974 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Balazs Kezes <rlblaster <at> gmail.com>
Cc: 20974 <at> debbugs.gnu.org
Subject: Re: bug#20974: Weird newline matching behaviour in --null-data mode
Date: Fri, 3 Jul 2015 20:03:10 -0700
On Fri, Jul 3, 2015 at 9:59 AM, Balazs Kezes <rlblaster <at> gmail.com> wrote:
> Hello!
>
> I'm running into issues with grep in -z mode. I've managed to minimize
> it into this:
>
>         $ seq 2 | grep --null-data --quiet '[12].2' ; echo $?
>         0
>         $ seq 2 | grep --null-data --quiet '[1-2].2' ; echo $?
>         1

Thank you for the report.
I too would like those two commands to work the same way.
The problem is that when the regular expression contains a
bracket expression with a range, grep switches from using
its DFA matcher to relying on regex, but as Norihiro Tanaka
mentioned, grep's use of the regex matcher with the
--null-data (-z) option cannot match multi-line results.

One can demonstrate the problem in the C locale too,
by using a back-reference, since that construct also causes
grep to use regex:

  $ printf '1\n1\n' |LC_ALL=en_US.UTF-8 src/grep -Ezq '1.1'
  $ printf '1\n1\n' |LC_ALL=en_US.UTF-8 src/grep -Ezq '(1).\1'
  [Exit 1]
  $ printf '1\n1\n' |LC_ALL=C src/grep -Ezq '(1).\1'
  [Exit 1]

It'd be great to fix this, but it is not on my short-term radar,
though I will add some expected-to-fail tests.
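
The back-reference comparison above can be turned into a quick consistency
check. A minimal sketch, assuming a `grep` on PATH rather than the `src/grep`
build used above; `ezq_status` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the exit status of `grep -Ezq PATTERN`
# on the two-line input "1\n1\n" in the C locale.
ezq_status() {
    printf '1\n1\n' | env LC_ALL=C grep -Ezq "$1"
    echo $?
}

# '1.1' has no back-reference; '(1).\1' has one, which per the message
# above makes grep hand the match over to regex.
plain=$(ezq_status '1.1')
backref=$(ezq_status '(1).\1')

if [ "$plain" = "$backref" ]; then
    printf 'consistent: both patterns exit %s\n' "$plain"
else
    printf 'inconsistent: plain pattern exits %s, back-reference pattern exits %s\n' \
        "$plain" "$backref"
fi
```

Both patterns describe the same three bytes here (1, LF, 1), so a differing
exit status indicates that the two matchers disagree about whether '.'
matches LF under -z.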




Reply sent to Jim Meyering <jim <at> meyering.net>:
You have taken responsibility. (Sat, 04 Jul 2015 03:11:02 GMT)

Notification sent to Balazs Kezes <rlblaster <at> gmail.com>:
bug acknowledged by developer. (Sat, 04 Jul 2015 03:11:03 GMT)

Message #16 received at 20974-done <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Balazs Kezes <rlblaster <at> gmail.com>
Cc: 20974-done <at> debbugs.gnu.org
Subject: Re: bug#20974: Weird newline matching behaviour in --null-data mode
Date: Fri, 3 Jul 2015 20:10:08 -0700
On Fri, Jul 3, 2015 at 8:03 PM, Jim Meyering <jim <at> meyering.net> wrote:
> On Fri, Jul 3, 2015 at 9:59 AM, Balazs Kezes <rlblaster <at> gmail.com> wrote:
>> Hello!
>>
>> I'm running into issues with grep in -z mode. I've managed to minimize
>> it into this:
>>
>>         $ seq 2 | grep --null-data --quiet '[12].2' ; echo $?
>>         0
>>         $ seq 2 | grep --null-data --quiet '[1-2].2' ; echo $?
>>         1
>
> Thank you for the report.
> I too would like those two commands to work the same way.
> The problem is that when the regular expression contains a
> bracket expression with a range, grep switches from using
> its DFA matcher to relying on regex, but as Norihiro Tanaka
> mentioned, grep's use of the regex matcher with the
> --null-data (-z) option cannot match multi-line results.
>
> One can demonstrate the problem in the C locale too,
> by using a back-reference, since that construct also causes
> grep to use regex:
>
>   $ printf '1\n1\n' |LC_ALL=en_US.UTF-8 src/grep -Ezq '1.1'
>   $ printf '1\n1\n' |LC_ALL=en_US.UTF-8 src/grep -Ezq '(1).\1'
>   [Exit 1]
>   $ printf '1\n1\n' |LC_ALL=C src/grep -Ezq '(1).\1'
>   [Exit 1]
>
> It'd be great to fix this, but it is not on my short-term radar,
> though I will add some expected-to-fail tests.

Oh, nice! I see that Paul Eggert has just fixed this with
the following patch:
  http://git.sv.gnu.org/cgit/grep.git/commit/?id=0e8fda0d880cccd0

So I'm closing this ticket.




Information forwarded to bug-grep <at> gnu.org:
bug#20974; Package grep. (Sat, 04 Jul 2015 04:42:02 GMT)

Message #19 received at 20974 <at> debbugs.gnu.org:

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: 20974 <at> debbugs.gnu.org,
 jim <at> meyering.net,
 rlblaster <at> gmail.com
Subject: Re: bug#20974: Weird newline matching behaviour in --null-data mode
Date: Sat, 04 Jul 2015 13:40:55 +0900
On Fri, 3 Jul 2015 20:10:08 -0700
Jim Meyering <jim <at> meyering.net> wrote:

> Oh, nice! I see that Paul Eggert has just fixed this with
> the following patch:
>   http://git.sv.gnu.org/cgit/grep.git/commit/?id=0e8fda0d880cccd0
> 
> So I'm closing this ticket.
> 

Paul's fix is very nice; I could not find it myself.

However, the following case is not fixed yet.  Not only '.' but also a hat
list (e.g. [^a]) should match newline with -z.  So we need to clear the
RE_HAT_LISTS_NOT_NEWLINE bit.

$ seq 2 | LC_ALL=C grep --null-data '[1-2][^a][1-2]'
1
2
$ seq 2 | LC_ALL=en_US.iso88591 grep --null-data '[1-2][^a][1-2]'

[0001-grep-z-a-now-consistently-matches-newline.patch (text/plain, attachment)]
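
The hat-list case above can be compared against the '.' case side by side. A
minimal sketch, assuming the en_US.iso88591 locale is installed (otherwise
grep falls back to the C locale); `middle_status` is a hypothetical helper
name.

```shell
#!/bin/sh
# Hypothetical helper: print the exit status of
# `grep --null-data --quiet PATTERN` on "1\n2\n" in the given locale.
middle_status() {
    printf '1\n2\n' | env LC_ALL="$1" grep --null-data --quiet "$2"
    echo $?
}

# In both patterns the middle element has to match the LF between
# "1" and "2"; only the way that element is written differs.
for loc in C en_US.iso88591; do
    dot=$(middle_status "$loc" '[1-2].[1-2]')
    hat=$(middle_status "$loc" '[1-2][^a][1-2]')
    printf '%s: dot=%s hat-list=%s\n' "$loc" "$dot" "$hat"
done
```

A line where the two statuses differ would correspond to the unfixed case
described above, where '.' matches LF but [^a] does not.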

Information forwarded to bug-grep <at> gnu.org:
bug#20974; Package grep. (Sat, 04 Jul 2015 15:51:02 GMT)

Message #22 received at 20974 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 20974 <at> debbugs.gnu.org, 
 jim <at> meyering.net, rlblaster <at> gmail.com
Subject: Re: bug#20974: Weird newline matching behaviour in --null-data mode
Date: Sat, 04 Jul 2015 08:50:14 -0700
Norihiro Tanaka wrote:
> Not only '.' but also a hat
> list (e.g. [^a]) should match newline with -z.  So we need to clear the
> RE_HAT_LISTS_NOT_NEWLINE bit.

Thanks for reporting that.  I also noticed some related bugs in dfa.c that 
'grep' does not exercise (so no grep test cases, alas).  Plus, it's long been 
time that we fix RE_SYNTAX_GREP and RE_SYNTAX_EGREP to match grep's actual 
behavior.  So I installed a Gnulib patch to update RE_SYNTAX_GREP and 
RE_SYNTAX_EGREP to the fixed behavior (see 
<http://lists.gnu.org/archive/html/bug-gnulib/2015-07/msg00016.html>) and 
installed grep patches to sync to gnulib and fix the other problems.

The first attached patch I installed yesterday (and you've commented on it) but 
I didn't have time to send email about it so am attaching it now.  The other 
five attached patches fix the bugs noted above.

Here's the justification for the first attached patch.  The grep documentation 
says that '.' matches any character, and this includes both NUL and LF. 
Ordinarily, LF terminates a line and so is never part of match data, but '.' 
should still match NUL.  Conversely with -z, NUL terminates a line and so is 
never part of match data, but '.' should still match LF.
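
The symmetry described above can be sketched as two one-line checks. The
helper names are hypothetical, and the -a (--text) option in the first check
is an assumption added here so that grep treats the NUL-containing input as
text rather than as binary data.

```shell
#!/bin/sh
# Default mode: LF terminates each line, so the NUL inside "a\0b" is
# ordinary data that '.' is expected to match.
dot_vs_nul_default() {
    printf 'a\000b\n' | env LC_ALL=C grep -aq 'a.b'
    echo $?
}

# -z mode: NUL terminates each record, so the LF inside "a\nb" is
# ordinary data that '.' is expected to match.
dot_vs_lf_null_data() {
    printf 'a\nb\000' | env LC_ALL=C grep -zq 'a.b'
    echo $?
}

printf 'default mode, NUL between a and b: exit %s\n' "$(dot_vs_nul_default)"
printf -- '-z mode, LF between a and b: exit %s\n' "$(dot_vs_lf_null_data)"
```

In each mode the record terminator never reaches the matcher, so the only
question is whether '.' accepts the other of the two bytes.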

[0001-grep-z-.-now-consistently-matches-newline.patch (text/x-diff, attachment)]
[0002-grep-z-x-now-consistently-matches-newline.patch (text/x-diff, attachment)]
[0003-dfa-.-and-x-now-consistently-match-newline.patch (text/x-diff, attachment)]
[0004-build-update-gnulib-submodule-to-latest.patch (text/x-diff, attachment)]
[0005-maint-ignore-gendocs_template_min.patch (text/x-diff, attachment)]
[0006-grep-use-recent-gnulib-syntax-bits.patch (text/x-diff, attachment)]

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 02 Aug 2015 11:24:04 GMT)

This bug report was last modified 8 years and 262 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.