GNU bug report logs - #19388
grep 2.21-1 identifies iso encoded text files as binary


Package: grep

Reported by: Martin Hoch <hoch <at> fidion.de>

Date: Mon, 15 Dec 2014 16:49:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 19388 in the body.
You can then email your comments to 19388 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-grep <at> gnu.org:
bug#19388; Package grep. (Mon, 15 Dec 2014 16:49:02 GMT)

Acknowledgement sent to Martin Hoch <hoch <at> fidion.de>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Mon, 15 Dec 2014 16:49:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Martin Hoch <hoch <at> fidion.de>
To: bug-grep <at> gnu.org
Subject: grep 2.21-1 identifies iso encoded text files as binary
Date: Mon, 15 Dec 2014 15:22:00 +0100
Hi,

I noticed that grep 2.21-1 regards ISO-8859-15 encoded files as binary if
LC_ALL is set to en_US.UTF8.

I am not sure if this is a bug or an expected behaviour change in 2.21-1, but
since I could not find anything in the changelog that directly mentions it, I am
reporting it. (I could not find anything on http://debbugs.gnu.org either.)

How to reproduce:

Create an ISO-8859-15 encoded test file named testfile containing: test ä ö ü

export LC_ALL=en_US.UTF8

grep test testfile

Binary file test matches

export LC_ALL=en_US

(grep works as expected)
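
The reproduction steps above can be sketched in Python for anyone who wants to script them. This is only an illustration: the helper names `testfile_bytes`, `write_testfile`, and `run_grep` are hypothetical and not part of the original report, and the result of `run_grep` depends on the installed grep version and on which locales exist on the system.

```python
import os
import subprocess


def testfile_bytes() -> bytes:
    """Return 'test ä ö ü' plus a newline, encoded as ISO-8859-15.

    The umlauts are written as escapes so this source stays pure ASCII.
    """
    return "test \u00e4 \u00f6 \u00fc\n".encode("iso-8859-15")


def write_testfile(path: str) -> None:
    """Write the ISO-8859-15 encoded sample to `path` (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(testfile_bytes())


def run_grep(path: str, locale_name: str) -> bytes:
    """Run `grep test <path>` with LC_ALL set to `locale_name`.

    Returns grep's raw standard output. Assumes a `grep` executable is on
    PATH; what it prints depends on the grep version and available locales.
    """
    env = dict(os.environ, LC_ALL=locale_name)
    result = subprocess.run(["grep", "test", path], env=env, capture_output=True)
    return result.stdout
```

After calling `write_testfile("testfile")`, comparing `run_grep("testfile", "en_US.UTF8")` with `run_grep("testfile", "en_US")` should show whether a given grep build behaves as described in this report.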

The behaviour for LC_ALL=en_US.UTF8 changed in 2.21-1; it worked correctly
in 2.20-1.

I am testing this on Arch Linux with glibc 2.20-4 (if that is relevant).

Please let me know if you need more information.

Regards,

    Martin

--
Martin Hoch                        Friedrich-Bergius-Ring 15
fidion GmbH                                   97076 Würzburg




Information forwarded to bug-grep <at> gnu.org:
bug#19388; Package grep. (Mon, 15 Dec 2014 16:50:02 GMT)

Message #8 received at 19388 <at> debbugs.gnu.org (full text, mbox):

From: Martin Hoch <hoch <at> fidion.de>
To: 19388 <at> debbugs.gnu.org
Subject: Re: bug#19388: Acknowledgement (grep 2.21-1 identifies iso encoded
 text files as binary)
Date: 15 Dec 2014 16:49:04 -0000
GNU bug Tracking System writes:

> Thank you for filing a new bug report with debbugs.gnu.org.
>
> This is an automatically generated reply to let you know your message
> has been received.
>
> Your message is being forwarded to the package maintainers and other
> interested parties for their attention; they will reply in due course.
>
> Your message has been sent to the package maintainer(s):
>  bug-grep <at> gnu.org
>
> If you wish to submit further information on this problem, please
> send it to 19388 <at> debbugs.gnu.org.
>
> Please do not send mail to help-debbugs <at> gnu.org unless you wish
> to report a problem with the Bug-tracking system.
>
> --
> 19388: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19388
> GNU Bug Tracking System
> Contact help-debbugs <at> gnu.org with problems

Thank you for your e-mail. I am currently out sick. Your e-mail
will not be forwarded. In urgent cases, please contact
support <at> fidion.de.






Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Tue, 16 Dec 2014 07:13:02 GMT)

Notification sent to Martin Hoch <hoch <at> fidion.de>:
bug acknowledged by developer. (Tue, 16 Dec 2014 07:13:02 GMT)

Message #13 received at 19388-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Martin Hoch <hoch <at> fidion.de>, 19388-done <at> debbugs.gnu.org
Subject: Re: bug#19388: grep 2.21-1 identifies iso encoded text files as binary
Date: Mon, 15 Dec 2014 23:12:10 -0800
Martin Hoch wrote:
> I noticed that grep 2.21-1 regards ISO-8859-15 encoded files as binary if
> LC_ALL is set to en_US.UTF8.
>
> I am not sure if this is a bug or an expected behaviour change in 2.21-1

It's an expected change.  Although this was documented in NEWS:

  If a file contains data improperly encoded for the current locale,
  and this is discovered before any of the file's contents are output,
  grep now treats the file as binary.

the grep manual is not so clear about it.  I installed the attached patch to try 
to fix that.
[0001-doc-document-binary-data-heuristic-better.patch (text/x-diff, attachment)]
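
The NEWS entry quoted above turns on whether a file's bytes are validly encoded for the current locale. As a rough illustration of that idea only (this is not grep's actual implementation, and the function name `is_improperly_encoded` is hypothetical), the check can be sketched in Python by attempting to decode the data with the locale's character encoding:

```python
def is_improperly_encoded(data: bytes, encoding: str = "utf-8") -> bool:
    """Return True if `data` cannot be decoded with `encoding`.

    A simplified stand-in for "improperly encoded for the current locale";
    grep's real heuristic is implemented in C and may differ in detail.
    """
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return True
    return False


# The sample line from the report, encoded as ISO-8859-15 (one byte per
# character, so the umlauts become the single bytes 0xE4, 0xF6, 0xFC).
sample = "test \u00e4 \u00f6 \u00fc\n".encode("iso-8859-15")

print(is_improperly_encoded(sample, "utf-8"))        # invalid as UTF-8
print(is_improperly_encoded(sample, "iso-8859-15"))  # valid in its own encoding
```

In a UTF-8 locale the byte 0xE4 announces a three-byte sequence, but the next byte is a plain space, so the data is not valid UTF-8. Under the sketch's assumption, that is the situation in which grep 2.21 reports the file as binary.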

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 13 Jan 2015 12:24:04 GMT)

This bug report was last modified 9 years and 77 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.