GNU bug report logs - #16871
problems about matching newline (with -z)
To reply to this bug, email your comments to 16871 AT debbugs.gnu.org.
Report forwarded to bug-grep <at> gnu.org: bug#16871; Package grep. (Tue, 25 Feb 2014 07:33:02 GMT)
Acknowledgement sent to Stephane Chazelas <stephane.chazelas <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Tue, 25 Feb 2014 07:33:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
The doc has a confusing statement:
> 15. How can I match across lines?
>
> Standard grep cannot do this, as it is fundamentally line-based.
> Therefore, merely using the '[:space:]' character class does not
> match newlines in the way you might expect. However, if your grep
> is compiled with Perl patterns enabled, the Perl 's' modifier
> (which makes '.' match newlines) can be used:
>
> printf 'foo\nbar\n' | grep -P '(?s)foo.*?bar'
>
> With the GNU 'grep' option '-z' (*note File and Directory
> Selection::), the input is terminated by null bytes. Thus, you can
> match newlines in the input, but the output will be the whole file,
> so this is really only useful to determine if the pattern is
> present:
>
> printf 'foo\nbar\n' | grep -z -q 'foo[[:space:]]\+bar'
>
> Failing either of those options, you need to transform the input
> before giving it to 'grep', or turn to 'awk', 'sed', 'perl', or
> many other utilities that are designed to operate across lines.
printf 'foo\nbar\n' | grep -P '(?s)foo.*?bar'
will never match, as grep is line-based even with -P. -P doesn't
help here; it makes things harder, as you need that (?s) for '.' to match a newline.
printf 'foo\nbar\n\0' | grep -z 'foo.*bar'
would match.
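A minimal sketch of the difference described above, assuming GNU grep and bash; count_matches is a hypothetical helper, not part of grep. Without -z each line is matched separately, so a pattern spanning the newline finds nothing. With -z the NUL-terminated record, newline included, is what the pattern sees. The sketch leaves out -P, since (?s) does not change the line-based result.

```bash
#!/bin/bash
# Sketch, assuming GNU grep. count_matches is a hypothetical helper that
# prints how many records match and ignores grep's exit status
# (grep -c still prints 0 but exits 1 when nothing matches).
count_matches() {
  grep -c "$@" || true
}

# Line-based: "foo" and "bar" sit on different lines, so no line matches.
printf 'foo\nbar\n' | count_matches 'foo.*bar'       # prints 0

# With -z the record is "foo\nbar\n", and '.' can match the newline.
printf 'foo\nbar\n\0' | count_matches -z 'foo.*bar'  # prints 1
```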
Same confusion in tests/pcre:
> #! /bin/sh
> # Ensure that with -P, \s*$ matches a newline.
> #
> # Copyright (C) 2001, 2006, 2009-2014 Free Software Foundation, Inc.
> #
> # Copying and distribution of this file, with or without modification,
> # are permitted in any medium without royalty provided the copyright
> # notice and this notice are preserved.
>
> . "${srcdir=.}/init.sh"; path_prepend_ ../src
> require_pcre_
>
> fail=0
>
> # See CVS revision 1.32 of "src/search.c".
> echo | grep -P '\s*$' || fail=1
>
> Exit $fail
'\s*$' doesn't match a newline; it matches the empty string.
You need echo | grep -zP '\s' to match the newline.
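The same point can be sketched with the POSIX class [[:space:]] standing in for -P's \s. That swap is made here only so the example does not depend on PCRE support. The sketch assumes GNU grep; echo writes a single newline, i.e. one empty line.

```bash
#!/bin/bash
# Sketch, assuming GNU grep; [[:space:]] is used in place of -P's \s.

# Succeeds on the empty line by matching the empty string: grep strips
# the line terminator, so there is no newline left to match.
echo | grep -c '[[:space:]]*$'           # prints 1

# Requiring at least one space character shows there is nothing there.
echo | grep -c '[[:space:]]' || true     # prints 0 (grep exits 1)

# With -z the newline is ordinary data inside the record, so it matches.
echo | grep -zc '[[:space:]]'            # prints 1
```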
Also:
We can match a newline with grep -zP 'a\nb' (or '\x0a' or '\012'
or '[\n]'...) but not easily without -P. Same for NUL
characters.
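A short sketch of why this is awkward without -P, assuming GNU grep built with PCRE support. GNU grep's -G and -E syntaxes document no \n escape. A literal newline in the pattern argument cannot stand for itself either, because grep splits the argument into separate patterns at each newline. The pattern below therefore means "a" or "b", not "a, newline, b". The variable name pat is illustrative only.

```bash
#!/bin/bash
# Sketch, assuming GNU grep built with PCRE support (needed for -P).

# With -P, the escape \n in the pattern stands for a newline character.
printf 'a\nb\0' | grep -zcP 'a\nb'      # prints 1

# Without -P, a literal newline separates two patterns, "a" and "b",
# so this record matches even though it contains no newline at all.
pat=$(printf 'a\nb')
printf 'xa\0' | grep -zc -- "$pat"      # prints 1
```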
Without -P, the only way I could think of was with
[^\0-\011\013-\377], but that would only work for single-byte
locales, and you can't pass a nul character on the command line,
so it would have to be with -f but:
$ printf 'a\nb\0' | LC_ALL=C grep -zf <(LC_ALL=C printf 'a[^\0-\011\013-\377]b')
zsh: done printf 'a\nb\0' |
zsh: segmentation fault LC_ALL=C grep -zf <(LC_ALL=C printf 'a[^\0-\011\013-\377]b')
Having said that:
grep -z $'a[^\01-\011\013-\0377]b'
would work (in single-byte locales) since nul is not in the
input since it's the delimiter.
and grep -a $'[^\01-\0377]' can match nul (in single-byte
locales).
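The single-byte workaround can be sketched in bash. It assumes GNU grep with LC_ALL=C, so that every byte is one character. Bash's $'...' reads at most three octal digits per escape, so a spelling like \0377 yields \037 followed by a literal 7. For that reason, this sketch writes the upper bound as \377. The names nl and nul are illustrative variable names only.

```bash
#!/bin/bash
# Sketch, assuming GNU grep and bash; LC_ALL=C makes the byte ranges
# below meaningful (single-byte locale only).

# Every byte except NUL and newline is excluded, so the negated list
# matches a newline (NUL cannot occur inside a -z record: it ends it).
nl=$'[^\001-\011\013-\377]'
printf 'a\nb\0' | LC_ALL=C grep -zc "a${nl}b"            # prints 1
printf 'a b\0'  | LC_ALL=C grep -zc "a${nl}b" || true    # prints 0

# Every byte except NUL is excluded; -a keeps grep from treating the
# input as binary, so the NUL inside the line is matched.
nul=$'[^\001-\377]'
printf 'a\0b\n' | LC_ALL=C grep -ac "$nul"               # prints 1
printf 'ab\n'   | LC_ALL=C grep -ac "$nul" || true       # prints 0
```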
But it would be handy to be able to do the same as with -P.
--
Stephane
Information forwarded to bug-grep <at> gnu.org: bug#16871; Package grep. (Tue, 25 Feb 2014 11:34:01 GMT)
Message #8 received at 16871 <at> debbugs.gnu.org (full text, mbox):
Also:
$ printf 'a\nb\0' | grep -z 'a$'
$ printf 'a\nb\0' | grep -zP 'a$'
a
b
$ printf 'a\nb\0' | grep -zxP a
a
b
Why use PCRE_MULTILINE here?
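For comparison, here is a sketch of how ^ and $ behave with -z under the default matcher, assuming a current GNU grep. They anchor only at the start and end of the NUL-terminated record, not at the embedded newline. It leaves -P out, since the -zP behavior questioned here may differ between grep versions.

```bash
#!/bin/bash
# Sketch, assuming GNU grep and its default (-G) matcher.
# With -z the record is "a\nb"; the embedded newline is plain data.

printf 'a\nb\0' | grep -zc 'a$' || true   # prints 0: "a" is not at the record end
printf 'a\nb\0' | grep -zc 'b$'           # prints 1: "b" ends the record
printf 'a\nb\0' | grep -zc '^b' || true   # prints 0: "b" is not at the record start
```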
Information forwarded to bug-grep <at> gnu.org: bug#16871; Package grep. (Fri, 25 Apr 2014 04:28:02 GMT)
Message #11 received at 16871 <at> debbugs.gnu.org (full text, mbox):
Stephane Chazelas wrote:
> The doc has a confusing statement ... Same confusion in tests/pcre:
Thanks, I installed the attached patch to fix those.
> We can match a newline with grep -zP 'a\nb' (or '\x0a' or '\012'
> or '[\n]'...) but not easily without -P. Same for NUL
> characters.
Yes, that's a downside of the POSIX notation, and it'd be nice to extend
POSIX to allow easy matching for newlines and/or null bytes. I'll mark
this bug report as a wishlist bug.
[0001-misc-fix-doc-and-test-bugs-re-grep-z.patch (text/plain, attachment)]
Severity set to 'wishlist' from 'normal'. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Fri, 25 Apr 2014 04:29:01 GMT)
Information forwarded to bug-grep <at> gnu.org: bug#16871; Package grep. (Fri, 18 Nov 2016 17:41:01 GMT)
Message #16 received at 16871 <at> debbugs.gnu.org (full text, mbox):
For the record, the doc/test confusion was fixed by commit
b73296ace186451b096b075461634c153d1fa525
http://git.savannah.gnu.org/cgit/grep.git/commit/?id=b73296ace186451b096b075461634c153d1fa525
See also https://debbugs.gnu.org/cgi/bugreport.cgi?bug=22655#47
and below about PCRE_MULTILINE.
This bug report was last modified 7 years and 185 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997, 2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.