GNU bug report logs - #23234
unexpected results with charset handling in GNU grep 2.23

Package: grep;

Reported by: Björn JACKE <bjoern <at> j3e.de>

Date: Wed, 6 Apr 2016 20:45:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 23234 in the body.
You can then email your comments to 23234 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-grep <at> gnu.org:
bug#23234; Package grep. (Wed, 06 Apr 2016 20:45:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Björn JACKE <bjoern <at> j3e.de>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Wed, 06 Apr 2016 20:45:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Björn JACKE <bjoern <at> j3e.de>
To: bug-grep <at> gnu.org
Subject: unexpected results with charset handling in GNU grep 2.23
Date: Wed, 6 Apr 2016 21:25:21 +0200
Hi,

this change in GNU grep 2.23 has severe consequences:

> Binary files are now less likely to generate diagnostics and more
> likely to yield text matches.  grep now reports "Binary file FOO
> matches" and suppresses further output instead of outputting a line
> containing an encoding error; hence grep can now report matching text
> before a later binary match.  Formerly, grep reported FOO to be
> binary when it found an encoding error in FOO before generating
> output for FOO, which meant it never reported both matching text and
> matching binary data; this was less useful for searching text
> containing encoding errors in non-matching lines.

I got a report that the build of the German spellcheck dictionary was broken.
It turned out that this happened after the update to GNU grep 2.23:

https://bugzilla.redhat.com/show_bug.cgi?id=1316359

Actually, the mentioned change leaves no reliable way to grep lines out of
any text file that contains non-ASCII characters.

Until now it was quite safe to use grep in the C locale, also for non-ASCII
text. Now, after that change, the locale charmap has to match the encoding of
the whole input file.  Unfortunately, the only locale that is guaranteed to
exist is the C locale. We cannot assume that any other locale definitions
exist on an unknown system. For a script that wants to use grep, this is a
big problem now.

Let's take this example using grep 2.23:

# echo -e "test\ntäst\ntest" | iconv -f utf8 -t latin1 | LC_ALL=C grep "st" ; echo $?
test
Binary file (standard input) matches
0

There are several problems here. Someone might want to assume that the locale
definitions for en_US.ISO-8859-1 exist. Unfortunately, such an assumption cannot
be made. Whatever locale is requested, its definition might not be there, and
then we fall back to the C locale anyway.

The result is that we get the first matching line in the example. The second
matching line, which contains a non-ASCII character, produces the text "Binary
file (standard input) matches" on stdout (which might even be a valid matching
line of the input file!) and the following matches are skipped. (Finally, the
return code is 0. As the grepping stopped early, a return code >1 might be
desirable, but I don't want to dive into that point right now.)


Let me draw a bigger picture: Have a look at what a POSIX-compliant grep is
expected to do:
http://pubs.opengroup.org/onlinepubs/009604499/utilities/grep.html

Read the description section, especially:

--snip--
By default, an input line shall be selected if any pattern, treated as an
entire basic regular expression (BRE) as described in the Base Definitions
volume of IEEE Std 1003.1-2001, Section 9.3, Basic Regular Expressions, matches
any part of the line excluding the terminating <newline>;
--snap--

That means a POSIX-compliant grep should not try to be too smart and tell the
user that a binary file matches the search pattern (people can use "strings" if
they want). It should just output the line. From that perspective GNU grep was
not POSIX compliant before either, but that was obviously not a big problem for
most people. With the recent change, though, and the issues described above, I
think a lot of scripts using (GNU) grep will break.

I really hope this change will be reverted as soon as possible. I would actually
prefer GNU grep to become POSIX compliant and not do any binary detection by
default.

Cheers
Björn




Information forwarded to bug-grep <at> gnu.org:
bug#23234; Package grep. (Wed, 06 Apr 2016 21:05:01 GMT) Full text and rfc822 format available.

Message #8 received at 23234 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Björn JACKE <bjoern <at> j3e.de>, 23234 <at> debbugs.gnu.org
Subject: Re: bug#23234: unexpected results with charset handling in GNU grep
 2.23
Date: Wed, 6 Apr 2016 15:04:26 -0600
[Message part 1 (text/plain, inline)]
On 04/06/2016 01:25 PM, Björn JACKE wrote:
> Let's take this example using grep 2.23:
> 
> # echo -e "test\ntäst\ntest" | iconv -f utf8 -t latin1 | LC_ALL=C grep "st" ; echo $?

[As a side point, 'echo -e' is non-portable; better is to use printf.]
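
The printf suggestion can be sketched as follows. This is a minimal illustration, not part of the original message: the helper name `sample` is hypothetical, and the octal escape \344 (byte 0xE4, "ä" in ISO-8859-1) is used so that the Latin-1 input is produced directly, without relying on iconv or on the encoding of the script itself.

```shell
# Hypothetical helper: emit the three-line Latin-1 sample from the report.
# \344 is the octal escape for byte 0xE4 ("ä" in ISO-8859-1).
sample() {
  printf 'test\nt\344st\ntest\n'
}

# Three lines of five bytes each, newline included.
bytes=$(sample | wc -c | tr -d ' ')
echo "sample size in bytes: $bytes"
```

Octal escapes in the format string are specified by POSIX for printf, which is why this form is expected to behave the same across shells where 'echo -e' may not.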

Hmm.  POSIX says that a file is binary if it does not end in newline, if
it contains embedded NUL, or if it contains an encoding error.  But it
also says that LC_ALL=C is _required_ to treat all 256 byte values as
valid characters (ASCII is only required to treat 7-bit characters as
valid, and may reject 8-bit bytes, but LC_ALL=C is _not_ ASCII).  This
indeed looks like a bug in current grep.git, as I can reproduce it:

$ git rev-parse HEAD
2ba6ab34da05d3aebc5e7e3dfaedb1cf3ddc5a73
$ printf "test\ntäst\ntest\n" | iconv -f utf8 -t latin1 |
   LC_ALL=C src/grep "st"
test
Binary file (standard input) matches

Looks like we don't have something quite right in claiming that 0xe4 is
not a valid character when in the single-byte C locale.

> I really hope this change will be reverted as soon as possible. I would actually
> prefer GNU grep to become POSIX compliant and not do any binary detection by
> default.

The change of treating encoding errors as binary files will NOT be
reverted, but here, you HAVE pointed out a bug where we are treating
something as binary that is NOT an encoding error (because by
definition, LC_ALL=C has no encoding errors - all 256 byte values are
characters).  So this is indeed a bug to be fixed.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#23234; Package grep. (Wed, 06 Apr 2016 22:24:02 GMT) Full text and rfc822 format available.

Message #11 received at 23234 <at> debbugs.gnu.org (full text, mbox):

From: Bjoern Jacke <bjoern <at> j3e.de>
To: Eric Blake <eblake <at> redhat.com>, 23234 <at> debbugs.gnu.org
Subject: Re: bug#23234: unexpected results with charset handling in GNU grep
 2.23
Date: Thu, 7 Apr 2016 00:23:53 +0200
On 06.04.2016 23:04, Eric Blake wrote:
> The change of treating encoding errors as binary files will NOT be
> reverted, but here,

hmm ... think of log files: In log files you will usually find all kinds
of encodings. If a user greps for a certain error message string in a
log file, he will not be able to find the errors because GNU grep will
stop grepping as soon as the first byte that does not fit into the
locale encoding pops up. The only way would be to advise users to use
the C locale, if that is the only one that will be fixed. I can't believe
that this is what you intended to achieve here.

And what about the output of "Binary file (standard input) matches" on
*stdout*? This is not distinguishable from a line that matched and
contains this text. How should a script catch this situation?

Björn




Information forwarded to bug-grep <at> gnu.org:
bug#23234; Package grep. (Wed, 06 Apr 2016 22:34:02 GMT) Full text and rfc822 format available.

Message #14 received at 23234 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Bjoern Jacke <bjoern <at> j3e.de>, 23234 <at> debbugs.gnu.org
Subject: Re: bug#23234: unexpected results with charset handling in GNU grep
 2.23
Date: Wed, 6 Apr 2016 16:33:24 -0600
[Message part 1 (text/plain, inline)]
On 04/06/2016 04:23 PM, Bjoern Jacke wrote:
> On 06.04.2016 23:04, Eric Blake wrote:
>> The change of treating encoding errors as binary files will NOT be
>> reverted, but here,
> 
> hmm ... think of log files: In log files you will usually find all kinds
> of encodings. If a user greps for a certain error message string in a
> log file, he will not be able to find the errors because GNU grep will
> stop grepping as soon as the first byte that does not fit into the
> locale encoding pops up.

'grep -a' is your friend.

> And what about the output of "Binary file (standard input) matches" on
> *stdout*? This is not distinguishable from a line that matched and
> contains this text. How should a script catch this situation?

That behavior complies with POSIX requirements.  Again, a script SHOULD
NOT be grepping binary files (POSIX only defines grep on text files)
without knowing the ramifications.  Meanwhile, 'grep -a' guarantees you
won't get the "Binary file" message.
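
A minimal sketch of the 'grep -a' workaround applied to the reporter's sample input. It is illustrative only: the helper name `sample` is hypothetical, the input is built with printf and an octal escape instead of 'echo -e' plus iconv, and it assumes a grep that supports the non-POSIX -a option (GNU grep does).

```shell
# Hypothetical helper: Latin-1 sample whose middle line contains byte 0xE4.
sample() {
  printf 'test\nt\344st\ntest\n'
}

# -a (--text) is a GNU extension: treat the input as text, so every
# matching line is printed and no "Binary file" notice is emitted.
matches=$(sample | LC_ALL=C grep -a 'st' | wc -l | tr -d ' ')
echo "matching lines: $matches"
```

With -a in effect, all three lines of the sample are expected to be selected, including the one containing the non-ASCII byte.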

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#23234; Package grep. (Wed, 06 Apr 2016 23:05:01 GMT) Full text and rfc822 format available.

Message #17 received at 23234 <at> debbugs.gnu.org (full text, mbox):

From: Bjoern Jacke <bjoern <at> j3e.de>
To: Eric Blake <eblake <at> redhat.com>, 23234 <at> debbugs.gnu.org
Subject: Re: bug#23234: unexpected results with charset handling in GNU grep
 2.23
Date: Thu, 7 Apr 2016 01:04:04 +0200
On 07.04.2016 00:33, Eric Blake wrote:
> That behavior complies with POSIX requirements.

can you give a quote here? One thing which is not POSIX compliant is
that the diagnostic message is written to stdout.
http://pubs.opengroup.org/onlinepubs/9699919799/ says:

--snip--
LC_MESSAGES
    Determine the locale that should be used to affect the format and
contents of diagnostic messages written to standard error.
--snap--

which implies that diagnostic messages should be given back to standard
error.

> Again, a script SHOULD
> NOT be grepping binary files (POSIX only defines grep on text files)
> without knowing the ramifications.  Meanwhile, 'grep -a' guarantees you
> won't get the "Binary file" message.

if you consider grepping text files with mixed encodings as invalid use
of grep, then you should not return 0 and/or output the "Binary file
(standard input) matches" on stdout. This makes the output of GNU grep
look like a valid match.

You say "grep -a" is your friend to all the users who want to grep log
files (because they tend to contain mixed encodings). Sure, -a is a
workaround to make GNU grep work as before again. Realistically, 99.99% of
the users will not know that though, because this is, I guess, the first
grep version ever that requires this. Also, -a is a GNU-only option,
so portable scripts will not be able to use it.

I guess you are aware, that you will break a lot of existing scripts
with that change of treating mixed encoding input files as binary like
the way you do it now with GNU grep >= 2.23 ?

Björn




Information forwarded to bug-grep <at> gnu.org:
bug#23234; Package grep. (Wed, 06 Apr 2016 23:16:01 GMT) Full text and rfc822 format available.

Message #20 received at 23234 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Bjoern Jacke <bjoern <at> j3e.de>, 23234 <at> debbugs.gnu.org
Subject: Re: bug#23234: unexpected results with charset handling in GNU grep
 2.23
Date: Wed, 6 Apr 2016 17:15:25 -0600
[Message part 1 (text/plain, inline)]
On 04/06/2016 05:04 PM, Bjoern Jacke wrote:
> On 07.04.2016 00:33, Eric Blake wrote:
>> That behavior complies with POSIX requirements.
> 
> can you give a quote here? One thing which is not POSIX compliant is
> that the diagnostic message is written to stdout.
> http://pubs.opengroup.org/onlinepubs/9699919799/ says:
> 
> --snip--
> LC_MESSAGES
>     Determine the locale that should be used to affect the format and
> contents of diagnostic messages written to standard error.
> --snap--

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html

STDIN

    The standard input shall be used if no file operands are specified,
and shall be used if a file operand is '-' and the implementation treats
the '-' as meaning standard input. Otherwise, the standard input shall
not be used. See the INPUT FILES section.

INPUT FILES

    The input files shall be text files.

As soon as you supply grep with non-text-file input, POSIX no longer
applies, and we can do WHATEVER WE WANT.  The violation is not in grep's
behavior, but in yours for passing a binary file.

We have chosen that WHATEVER WE WANT means that by default, we will tell
you (on stdout) that the binary file matches, but if you use the
(non-standard extension) -a option, we will pretend the file is text
anyways.  And it's been documented that way for basically "forever" in
GNU grep.

What's changed recently is what we've done under the hood (more
efficient recognition of binary files, treating '\0' and '\n'
identically as line terminators when -a is not in effect because of the
speed improvements it lets us gain, and attempts with heuristics to
avoid spamming terminals or downstream clients with encoding errors when
-a is not in effect).  But all of those still fall under the broad
category of WHATEVER WE WANT as it falls outside the POSIX standard.

And yes, maybe we could change grep to print the "Binary file matches"
message to stderr, but that in turn will probably break other scripts,
and lead to even more complaints from people doing non-standard things
and expecting consistent results.  That said, patches are still welcome,
if you think you have better heuristics than what we currently have, and
as long as it still falls within the realm of WHATEVER WE WANT.

> if you consider grepping text files with mixed encodings as invalid use
> of grep, then you should not return 0 and/or output the "Binary file
> (standard input) matches" on stdout. This makes the output of GNU grep
> look like a valid match.

Maybe changing the exit status when a binary file is encountered is
worth doing - but not returning status 0 when a match is detected is
more likely to do harm than good.

> 
> You say "grep -a" is your friend to all the users who want to grep log
> files (because they tend to contain mixed encodings). Sure, -a is a
> workaround to make GNU grep work as before again. Realistically, 99.99% of
> the users will not know that though, because this is, I guess, the first
> grep version ever that requires this. Also, -a is a GNU-only option,
> so portable scripts will not be able to use it.

Portable scripts are not able to grep binary files, period.  As long as
you don't mind non-portable extensions, 'grep -a' is what you want.

> 
> I guess you are aware, that you will break a lot of existing scripts
> with that change of treating mixed encoding input files as binary like
> the way you do it now with GNU grep >= 2.23 ?

Yes, we are aware that lots of users are getting an education on the
subtleties of POSIX.  But that doesn't mean it is a bug.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#23234; Package grep. (Thu, 07 Apr 2016 01:26:02 GMT) Full text and rfc822 format available.

Message #23 received at 23234 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eric Blake <eblake <at> redhat.com>, Bjoern Jacke <bjoern <at> j3e.de>,
 23234 <at> debbugs.gnu.org
Subject: Re: bug#23234: unexpected results with charset handling in GNU grep
 2.23
Date: Wed, 6 Apr 2016 18:25:16 -0700
On 04/06/2016 04:15 PM, Eric Blake wrote:
> And yes, maybe we could change grep to print the "Binary file matches"
> message to stderr, but that in turn will probably break other scripts,
> and lead to even more complaints from people doing non-standard things
> and expecting consistent results.

Yes, I'm dubious about this idea. grep's behavior was inspired by diff's 
similar behavior, and grep and diff have worked that way for many years 
and I expect people depend on it. POSIX says that diff should output its 
binary-file message to stdout, and I expect that if POSIX standardized 
grep's behavior on binary files it would do something similar.

> Maybe changing the exit status when a binary file is encountered is
> worth doing

Possibly, though I don't see the use case yet. If it's needed I suggest 
doing the change only if a new option is specified 
(--binary-files=error, say) so that it's upward-compatible with existing 
behavior.
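
For context, a minimal sketch of how a script can act on the exit statuses grep already has (0 when a line is selected, 1 when none is, greater than 1 on error). The function name `classify` is hypothetical, and the sketch deliberately does not use --binary-files=error, which is only a suggestion in this message and is not assumed to exist.

```shell
# Hypothetical helper: report how grep classified a single line of input.
classify() {
  # $? after the pipeline is the exit status of its last command, grep.
  printf '%s\n' "$1" | LC_ALL=C grep -q 'st'
  case $? in
    0) echo match ;;      # at least one line was selected
    1) echo no-match ;;   # no line was selected
    *) echo error ;;      # status greater than 1: an error occurred
  esac
}

classify test
classify foo
```

A dedicated status for binary input would have to fit into the "greater than 1" branch or be opt-in, which is the upward-compatibility concern raised above.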




Information forwarded to bug-grep <at> gnu.org:
bug#23234; Package grep. (Thu, 07 Apr 2016 01:29:01 GMT) Full text and rfc822 format available.

Message #26 received at 23234 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eric Blake <eblake <at> redhat.com>, Björn JACKE
 <bjoern <at> j3e.de>, 23234 <at> debbugs.gnu.org
Subject: Re: bug#23234: unexpected results with charset handling in GNU grep
 2.23
Date: Wed, 6 Apr 2016 18:28:33 -0700
On 04/06/2016 02:04 PM, Eric Blake wrote:
> POSIX ... says that LC_ALL=C is _required_ to treat all 256 byte 
> values as valid characters

Although that was the intent of POSIX, it's not what the current 
standard says, and it's not what many popular platforms do. Problematic 
platforms include Fedora 23, where mbrtowc reports an encoding error in 
the C locale when given a byte outside the range 0-127. This affects 
many programs other than 'grep'.

This bug in the standard is intended to be fixed in a future version of 
POSIX (see <http://austingroupbugs.net/view.php?id=663#c2738>). I 
suppose glibc and eventually Fedora will be fixed to conform to the new 
standard in due course.

Perhaps grep should work around this problem on systems like Fedora 23 
where the underlying C library does not conform to the next version of 
POSIX. It sounds like a new gnulib module or two might do the trick. 
This should fix the problems that Björn mentions.

In the meantime grep -a is the way to go. Yes, it's not portable to 
non-GNU grep, but there is no portable solution given the abovementioned 
POSIX problems, so a GNU-grep-only workaround is all one can reasonably 
ask for.




Information forwarded to bug-grep <at> gnu.org:
bug#23234; Package grep. (Sat, 09 Apr 2016 08:35:02 GMT) Full text and rfc822 format available.

Message #29 received at 23234 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eric Blake <eblake <at> redhat.com>, Björn JACKE
 <bjoern <at> j3e.de>, 23234 <at> debbugs.gnu.org
Subject: Re: bug#23234: unexpected results with charset handling in GNU grep
 2.23
Date: Sat, 9 Apr 2016 01:34:26 -0700
Paul Eggert wrote:
> Perhaps grep should work around this problem on systems like Fedora 23 where the
> underlying C library does not conform to the next version of POSIX. It sounds
> like a new gnulib module or two might do the trick. This should fix the problems
> that Björn mentions.

I've started on this by changing the mbrtowc module in gnulib to work around the 
future-POSIX incompatibility of mbrtowc in glibc. See:

http://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=b7bc3c1a4e78add4cbad39ae1a0c4fb0747b483f

I plan to change GNU grep to use this new facility, and to add some grep test 
cases for this issue.




Information forwarded to bug-grep <at> gnu.org:
bug#23234; Package grep. (Sat, 09 Apr 2016 09:30:02 GMT) Full text and rfc822 format available.

Message #32 received at 23234 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Bjoern Jacke <bjoern <at> j3e.de>, Eric Blake <eblake <at> redhat.com>,
 23234 <at> debbugs.gnu.org
Subject: Re: bug#23234: unexpected results with charset handling in GNU grep
 2.23
Date: Sat, 09 Apr 2016 18:29:01 +0900
On Wed, 6 Apr 2016 18:25:16 -0700
Paul Eggert <eggert <at> cs.ucla.edu> wrote:

> On 04/06/2016 04:15 PM, Eric Blake wrote:
> > And yes, maybe we could change grep to print the "Binary file matches"
> > message to stderr, but that in turn will probably break other scripts,
> > and lead to even more complaints from people doing non-standard things
> > and expecting consistent results.
> 
> Yes, I'm dubious about this idea. grep's behavior was inspired by
> diff's similar behavior, and grep and diff have worked that way for
> many years and I expect people depend on it. POSIX says that diff
> should output its binary-file message to stdout, and I expect that if
> POSIX standardized grep's behavior on binary files it would do
> something similar.

Hmm, diff does not output "Binary file matches" between text files, but
grep does it.

$ cp src/grep grep.bin
$ LC_ALL=en_US.utf8 src/grep g grep.bin
Binary file grep.bin matches
$ cat >grep.bin <<EOF
Binary file grep.bin matches
EOF
$ LC_ALL=en_US.utf8 src/grep g grep.bin
Binary file grep.bin matches

When a user gets "Binary file matches" from grep, he cannot distinguish,
from this result alone, whether grep matched a binary file or a line of a
text file that contains "Binary file matches".





Information forwarded to bug-grep <at> gnu.org:
bug#23234; Package grep. (Sun, 10 Apr 2016 03:08:02 GMT) Full text and rfc822 format available.

Message #35 received at 23234 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: Bjoern Jacke <bjoern <at> j3e.de>, Eric Blake <eblake <at> redhat.com>,
 23234 <at> debbugs.gnu.org
Subject: Re: bug#23234: unexpected results with charset handling in GNU grep
 2.23
Date: Sat, 9 Apr 2016 20:07:18 -0700
Norihiro Tanaka wrote:
> Hmm, diff does not output "Binary file matches" between text files, but
> grep does it.

I wasn't referring to the exact string "Binary file matches", merely to the idea 
that diff outputs a message to stdout saying that there was a binary file, 
rather than to stderr. Something like this:

$ diff /usr/bin/diff /usr/bin/emacs 2>/dev/null
Binary files /usr/bin/diff and /usr/bin/emacs differ

> When a user gets "Binary file matches" from grep, he cannot distinguish,
> from this result alone, whether grep matched a binary file or a line of a
> text file that contains "Binary file matches".

Although that's a problem, it's not a serious one, as one can easily work around 
it by using -n or -H. If there were need, I suppose we could add another operand 
to the --binary option to cause it to do something else with a match.
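
The -n workaround can be sketched like this. It is a minimal illustration under stated assumptions: the temporary file stands in for the text file in the example above, and mktemp, although widely available, is not specified by POSIX.

```shell
# A text file whose only line reads like grep's binary-file notice.
tmp=$(mktemp)
printf 'Binary file grep.bin matches\n' > "$tmp"

# With -n, a genuinely matched line is prefixed with its line number;
# the binary-file notice itself carries no such prefix.
out=$(LC_ALL=C grep -n 'g' "$tmp")
echo "$out"

rm -f "$tmp"
```

A script can therefore treat output lines that lack the "N:" prefix as the notice rather than as matched text.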




Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Sun, 10 Apr 2016 08:44:01 GMT) Full text and rfc822 format available.

Notification sent to Björn JACKE <bjoern <at> j3e.de>:
bug acknowledged by developer. (Sun, 10 Apr 2016 08:44:01 GMT) Full text and rfc822 format available.

Message #40 received at 23234-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eric Blake <eblake <at> redhat.com>, Björn JACKE
 <bjoern <at> j3e.de>, 23234-done <at> debbugs.gnu.org
Subject: Re: bug#23234: unexpected results with charset handling in GNU grep
 2.23
Date: Sun, 10 Apr 2016 01:43:10 -0700
[Message part 1 (text/plain, inline)]
Paul Eggert wrote:
> I plan to change GNU grep to use this new facility, and to add some grep test
> cases for this issue.

I did that by installing the attached patches into the grep master. This fixes 
the bug for me, so I'm closing the bug report.

These patches mostly just report the fix and add test cases. The actual fix was 
in gnulib, here:

http://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=b7bc3c1a4e78add4cbad39ae1a0c4fb0747b483f

This gnulib fix works around the underlying glibc facility which caused the 
problem, for which I've filed a bug report here:

https://sourceware.org/bugzilla/show_bug.cgi?id=19932

It's not clear when the glibc bug will be fixed. Until it is, one should expect 
similar problems to crop up in applications other than 'grep'.
[0001-build-update-gnulib-submodule-to-latest.txt (text/plain, attachment)]
[0002-grep-in-C-locale-all-bytes-are-valid-characters.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#23234; Package grep. (Sun, 10 Apr 2016 21:11:02 GMT) Full text and rfc822 format available.

Message #43 received at 23234 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: 23234 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>,
 Bjoern Jacke <bjoern <at> j3e.de>
Cc: 23234-done <at> debbugs.gnu.org, Eric Blake <eblake <at> redhat.com>
Subject: Re: bug#23234: unexpected results with charset handling in GNU grep
 2.23
Date: Sun, 10 Apr 2016 14:10:08 -0700
[Message part 1 (text/plain, inline)]
On Sun, Apr 10, 2016 at 1:43 AM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Paul Eggert wrote:
>>
>> I plan to change GNU grep to use this new facility, and to add some grep
>> test
>> cases for this issue.
>
>
> I did that by installing the attached patches into the grep master. This
> fixes the bug for me, so I'm closing the bug report.
>
> These patches mostly just report the fix and add test cases. The actual fix
> was in gnulib, here:
>
> http://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=b7bc3c1a4e78add4cbad39ae1a0c4fb0747b483f
>
> This gnulib fix works around the underlying glibc facility which caused the
> problem, for which I've filed a bug report here:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=19932
>
> It's not clear when the glibc bug will be fixed. Until it is, one should
> expect similar problems to crop up in applications other than 'grep'.

Thanks for the fine work, Paul.
With this fix, I would like to make yet another grep release.
Does anyone have any pending changes we should consider?

Incidentally, looking at mbrtowc uses, I found an unused
function and removed it with this patch:
[0001-maint-remove-unused-mbtoupper-function.patch (text/x-patch, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#23234; Package grep. (Sun, 10 Apr 2016 22:00:02 GMT) Full text and rfc822 format available.

Message #49 received at 23234 <at> debbugs.gnu.org (full text, mbox):

From: Zev Weiss <zev <at> bewilderbeest.net>
To: Jim Meyering <jim <at> meyering.net>
Cc: Bjoern Jacke <bjoern <at> j3e.de>, 23234-done <at> debbugs.gnu.org,
 Paul Eggert <eggert <at> cs.ucla.edu>, 23234 <at> debbugs.gnu.org
Subject: Re: bug#23234: unexpected results with charset handling in GNU grep
 2.23
Date: Sun, 10 Apr 2016 16:59:08 -0500
On Sun, Apr 10, 2016 at 02:10:08PM -0700, Jim Meyering wrote:
>On Sun, Apr 10, 2016 at 1:43 AM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>> Paul Eggert wrote:
>>>
>>> I plan to change GNU grep to use this new facility, and to add some grep
>>> test
>>> cases for this issue.
>>
>>
>> I did that by installing the attached patches into the grep master. This
>> fixes the bug for me, so I'm closing the bug report.
>>
>> These patches mostly just report the fix and add test cases. The actual fix
>> was in gnulib, here:
>>
>> http://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=b7bc3c1a4e78add4cbad39ae1a0c4fb0747b483f
>>
>> This gnulib fix works around the underlying glibc facility which caused the
>> problem, for which I've filed a bug report here:
>>
>> https://sourceware.org/bugzilla/show_bug.cgi?id=19932
>>
>> It's not clear when the glibc bug will be fixed. Until it is, one should
>> expect similar problems to crop up in applications other than 'grep'.
>
>Thanks for the fine work, Paul.
>With this fix, I would like to make yet another grep release.
>Does anyone have any pending changes we should consider?
>
>Incidentally, looking at mbrtowc uses, I found an unused
>function and removed it with this patch:

Well, I still have my multithreading patch series 
(https://github.com/zevweiss/grep/) awaiting review, which I'd hope to 
get applied at some point, though I'd guess it's enough of a review task 
that delaying an impending release for it isn't likely (the 
mbtoupper()-removal patch made that series one patch shorter though, 
since one was to deal with that function's thread-unsafety).  I've been 
rebasing it periodically and running it on my own system in /usr/local 
without any problems for a while now, for what that's worth.

With current HEAD from savannah though, all check-very-expensive tests 
pass for me on Debian stretch with gcc 5.3, glibc 2.22, and Linux kernel 
4.3.


Zev Weiss





Information forwarded to bug-grep <at> gnu.org:
bug#23234; Package grep. (Sun, 10 Apr 2016 22:00:03 GMT) Full text and rfc822 format available.

Information forwarded to bug-grep <at> gnu.org:
bug#23234; Package grep. (Sun, 10 Apr 2016 22:10:01 GMT) Full text and rfc822 format available.

Message #55 received at 23234 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Zev Weiss <zev <at> bewilderbeest.net>, Jim Meyering <jim <at> meyering.net>
Cc: Bjoern Jacke <bjoern <at> j3e.de>, 23234-done <at> debbugs.gnu.org,
 20768 <at> debbugs.gnu.org, 23234 <at> debbugs.gnu.org
Subject: Re: bug#23234: unexpected results with charset handling in GNU grep
 2.23
Date: Sun, 10 Apr 2016 15:09:17 -0700
On 04/10/2016 02:59 PM, Zev Weiss wrote:
> I still have my multithreading patch series 
> (https://github.com/zevweiss/grep/) awaiting review, which I'd hope to 
> get applied at some point, though I'd guess it's enough of a review 
> task that delaying an impending release for it isn't likely (the 
> mbtoupper()-removal patch made that series one patch shorter though, 
> since one was to deal with that function's thread-unsafety).  I've 
> been rebasing it periodically and running it on my own system in 
> /usr/local without any problems for a while now, for what that's worth.
>
> With current HEAD from savannah though, all check-very-expensive tests 
> pass for me on Debian stretch with gcc 5.3, glibc 2.22, and Linux 
> kernel 4.3.

Thanks for pinging us about this. Sorry, I kind of dropped the ball on 
this one. I will try to bump its priority. There are some other 
long-pending patches that also need review. I agree that these shouldn't 
delay the next release, but perhaps it could delay the release after 
that....




Information forwarded to bug-grep <at> gnu.org:
bug#23234; Package grep. (Sun, 10 Apr 2016 22:10:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-grep <at> gnu.org:
bug#23234; Package grep. (Mon, 11 Apr 2016 04:47:02 GMT) Full text and rfc822 format available.

Message #61 received at 23234 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Zev Weiss <zev <at> bewilderbeest.net>
Cc: Bjoern Jacke <bjoern <at> j3e.de>, 23234-done <at> debbugs.gnu.org,
 Paul Eggert <eggert <at> cs.ucla.edu>, 23234 <at> debbugs.gnu.org
Subject: Re: bug#23234: unexpected results with charset handling in GNU grep
 2.23
Date: Sun, 10 Apr 2016 21:46:06 -0700
On Sun, Apr 10, 2016 at 2:59 PM, Zev Weiss <zev <at> bewilderbeest.net> wrote:
...
> Well, I still have my multithreading patch series
> (https://github.com/zevweiss/grep/) awaiting review, which I'd hope to get
> applied at some point, though I'd guess it's enough of a review task that
> delaying an impending release for it isn't likely (the mbtoupper()-removal
> patch made that series one patch shorter though, since one was to deal with
> that function's thread-unsafety).  I've been rebasing it periodically and
> running it on my own system in /usr/local without any problems for a while
> now, for what that's worth.
>
> With current HEAD from savannah though, all check-very-expensive tests pass
> for me on Debian stretch with gcc 5.3, glibc 2.22, and Linux kernel 4.3.

Thanks for your patience.
Definitely a worthwhile feature. You're right: I want to ensure
the core functionality is in a very solid state before making a
release including multithreading.




Information forwarded to bug-grep <at> gnu.org:
bug#23234; Package grep. (Mon, 11 Apr 2016 05:20:01 GMT) Full text and rfc822 format available.

Message #67 received at 23234 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>, 23234 <at> debbugs.gnu.org,
 Bjoern Jacke <bjoern <at> j3e.de>
Cc: Eric Blake <eblake <at> redhat.com>
Subject: Re: bug#23234: unexpected results with charset handling in GNU grep
 2.23
Date: Sun, 10 Apr 2016 22:19:34 -0700
Jim Meyering wrote:
> With this fix, I would like to make yet another grep release.
> Does anyone have any pending changes we should consider?

I did a bit of bug triage, closing some bug reports and fixing one minor 
documentation bug (Bug#22911). I didn't see any pending changes that can't wait 
until after the next release.

Thanks for volunteering to make these releases.




Information forwarded to bug-grep <at> gnu.org:
bug#23234; Package grep. (Mon, 11 Apr 2016 15:36:02 GMT) Full text and rfc822 format available.

Message #70 received at 23234 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Bjoern Jacke <bjoern <at> j3e.de>, Eric Blake <eblake <at> redhat.com>,
 23234 <at> debbugs.gnu.org
Subject: Re: bug#23234: unexpected results with charset handling in GNU grep
 2.23
Date: Mon, 11 Apr 2016 08:34:45 -0700
On Sun, Apr 10, 2016 at 10:19 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Jim Meyering wrote:
>>
>> With this fix, I would like to make yet another grep release.
>> Does anyone have any pending changes we should consider?
>
>
> I did a bit of bug triage, closing some bug reports and fixing one minor
> documentation bug (Bug#22911). I didn't see any pending changes that can't
> wait until after the next release.

Thank *you*.

> Thanks for volunteering to make these releases.

It's the least I can do, when you're fixing so many bugs.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 10 May 2016 11:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 7 years and 323 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.