GNU bug report logs - #19230
Help! grepV2.21 treats ISO-8859 text files as if they are binary

Package: grep;

Reported by: Hans Pelleboer <hanspelleboer <at> online.nl>

Date: Sun, 30 Nov 2014 19:10:02 UTC

Severity: normal

Merged with 19985, 20526, 21558

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 19230 in the body.
You can then email your comments to 19230 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-grep <at> gnu.org:
bug#19230; Package grep. (Sun, 30 Nov 2014 19:10:02 GMT)

Acknowledgement sent to Hans Pelleboer <hanspelleboer <at> online.nl>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Sun, 30 Nov 2014 19:10:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Hans Pelleboer <hanspelleboer <at> online.nl>
To: bug-grep <at> gnu.org
Subject: Help! grepV2.21 treats ISO-8859 text files as if they are binary
Date: Sun, 30 Nov 2014 14:40:31 +0100

Hello,

After upgrading to V2.21 I discovered that, for certain text files, a
grep search yielded no output except the single line:

Binary file <NAME_FILE> matches

Further tests showed, that grep only behaved this way with text
files that were encoded according to ISO-8859 (There may be more!).
The presence or absence of diacritical characters didn't matter, nor
variants of formatting; another file with DOS type <CR><LF>'s parsed
just the same.

The only way to overcome this was the `roughshod' approach of
using the -a flag.

None of grep's versions that I have used before ever showed
this peculiarity.

I hope you can shed some light on this issue,

hansp






Information forwarded to bug-grep <at> gnu.org:
bug#19230; Package grep. (Sun, 30 Nov 2014 22:03:02 GMT)

Message #8 received at 19230 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Hans Pelleboer <hanspelleboer <at> online.nl>, 19230 <at> debbugs.gnu.org
Subject: Re: bug#19230: Help! grepV2.21 treats ISO-8859 text files as if they
 are binary
Date: Sun, 30 Nov 2014 14:02:13 -0800

Hans Pelleboer wrote:

> Binary file <NAME_FILE> matches
>
> Further tests showed, that grep only behaved this way with text
> files that were encoded according to ISO-8859 (There may be more!).

What operating system are you running on, and how did you build or import grep?

Also, what's your locale?  What is the output of the shell command 'locale'?

I can see this happening if you are using an UTF-8 locale, as in general 
ISO-8859 is not valid UTF-8 text.  Older versions of 'grep' were less picky in 
this area, and that might explain the symptoms you observed.  With newer 
versions it's more important for the locale to be compatible with the text 
file's encoding.
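
Paul's diagnosis can be checked by hand. The following is a minimal sketch and not part of the thread: it assumes `iconv` is installed and uses a made-up sample line ("café au lait" stored as ISO-8859-1) to show that such bytes are rejected when read as UTF-8 but convert cleanly when read as ISO-8859-1.

```shell
# Hypothetical sample: "café au lait" in ISO-8859-1, where é is the single byte 0xE9 (octal 351).
sample=$(mktemp)
printf 'caf\351 au lait\n' > "$sample"

# Show the active locale settings, as requested above (skipped if the command is missing).
locale 2>/dev/null || echo "locale command not available"

# iconv exits with a non-zero status when the input is not valid in the source encoding.
if iconv -f UTF-8 -t UTF-8 "$sample" >/dev/null 2>&1; then
    utf8_ok=yes
else
    utf8_ok=no
fi
echo "valid as UTF-8: $utf8_ok"

# Reading the same bytes as ISO-8859-1 succeeds and produces UTF-8 output.
converted=$(iconv -f ISO-8859-1 -t UTF-8 "$sample")
echo "$converted"

rm -f "$sample"
```

The byte 0xE9 opens a three-byte sequence in UTF-8, so when a space follows it the sequence is malformed. That kind of input is what makes grep 2.21 report a file as binary under a UTF-8 locale.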




Information forwarded to bug-grep <at> gnu.org:
bug#19230; Package grep. (Mon, 01 Dec 2014 07:58:02 GMT)

Message #11 received at 19230 <at> debbugs.gnu.org:

From: Hans Pelleboer <hanspelleboer <at> online.nl>
To: Paul Eggert <eggert <at> cs.ucla.edu>, 19230 <at> debbugs.gnu.org
Subject: Re: bug#19230: Help! grepV2.21 treats ISO-8859 text files as if they
 are binary
Date: Mon, 01 Dec 2014 08:57:52 +0100

I think you nailed it, Paul:

OS: Arch Linux / kernel 3.17.4 / x86_64, locale is set to UTF-8
grep came straight from the Arch repository.

As grepV2.20 still showed the `old', more forgiving behaviour,
I was wondering what can be done to compile grep in such a way,
that it processes all text files, no matter what way they are encoded.
After all sed, vi, emacs, the works, do just that.

Yours,

hansp

On 11/30/2014 11:02 PM, Paul Eggert wrote:
> Hans Pelleboer wrote:
>
>> Binary file <NAME_FILE> matches
>>
>> Further tests showed, that grep only behaved this way with text
>> files that were encoded according to ISO-8859 (There may be more!).
>
> What operating system are you running on, and how did you build or 
> import grep?
>
> Also, what's your locale?  What is the output of the shell command 
> 'locale'?
>
> I can see this happening if you are using an UTF-8 locale, as in 
> general ISO-8859 is not valid UTF-8 text.  Older versions of 'grep' 
> were less picky in this area, and that might explain the symptoms you 
> observed.  With newer versions it's more important for the locale to 
> be compatible with the text file's encoding.





Information forwarded to bug-grep <at> gnu.org:
bug#19230; Package grep. (Mon, 01 Dec 2014 18:53:02 GMT)

Message #14 received at 19230 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Hans Pelleboer <hanspelleboer <at> online.nl>, 19230 <at> debbugs.gnu.org
Subject: Re: bug#19230: Help! grepV2.21 treats ISO-8859 text files as if they
 are binary
Date: Mon, 01 Dec 2014 10:52:33 -0800

On 11/30/2014 11:57 PM, Hans Pelleboer wrote:
> I was wondering what can be done to compile grep in such a way,
> that it processes all text files, no matter what way they are encoded. 

You don't need to recompile; you can just use 'grep -a'.
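
As an illustration of this suggestion, here is a small sketch on a hypothetical sample file (the file contents and the pattern are invented for the example). `-a` is the short form of `--binary-files=text`, which tells grep to print matching lines even when the file contains bytes that are not valid in the current locale.

```shell
# Hypothetical sample: one ISO-8859-1 line (0xE9 = é) and one plain ASCII line.
sample=$(mktemp)
printf 'caf\351 au lait\nplain ascii line\n' > "$sample"

# With -a, grep prints the matching line instead of "Binary file ... matches".
matches=$(grep -a 'lait' "$sample")
printf '%s\n' "$matches"

# Count the matching lines, again forcing text mode.
count=$(grep -a -c 'lait' "$sample")
echo "matching lines: $count"

rm -f "$sample"
```

The matched line is printed as raw ISO-8859-1 bytes, so a UTF-8 terminal may display the accented character incorrectly. Note also that `-a` applies to every input file, so genuinely binary files will have their bytes written to the terminal as well.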




Information forwarded to bug-grep <at> gnu.org:
bug#19230; Package grep. (Mon, 01 Dec 2014 18:55:02 GMT)

Message #17 received at 19230 <at> debbugs.gnu.org:

From: Hans Pelleboer <hanspelleboer <at> online.nl>
To: Paul Eggert <eggert <at> cs.ucla.edu>, 19230 <at> debbugs.gnu.org
Subject: Re: bug#19230: Help! grepV2.21 treats ISO-8859 text files as if they
 are binary
Date: Mon, 01 Dec 2014 19:54:13 +0100

On 12/01/2014 07:52 PM, Paul Eggert wrote:
> On 11/30/2014 11:57 PM, Hans Pelleboer wrote:
>> I was wondering what can be done to compile grep in such a way,
>> that it processes all text files, no matter what way they are encoded. 
>
> You don't need to recompile; you can just use 'grep -a'.

I was referring to the default behaviour, as it used to be, without
extra flags.




Information forwarded to bug-grep <at> gnu.org:
bug#19230; Package grep. (Mon, 01 Dec 2014 19:46:01 GMT)

Message #20 received at 19230 <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: Hans Pelleboer <hanspelleboer <at> online.nl>
Cc: 19230 <at> debbugs.gnu.org
Subject: Re: bug#19230: Help! grepV2.21 treats ISO-8859 text files as if they
 are binary
Date: Mon, 1 Dec 2014 12:45:14 -0700

Hans Pelleboer wrote:
> Paul Eggert wrote:
> >Hans Pelleboer wrote:
> > > I was wondering what can be done to compile grep in such a way,
> > > that it processes all text files, no matter what way they are encoded.
> >
> >You don't need to recompile; you can just use 'grep -a'.
>
> I was referring to the default behaviour, as it used to be, without extra
> flags

I apologize for having no time to check myself.  What is the behavior
if you set the locale back to C/POSIX for grep?

  env LC_ALL=C grep PATTERN someencodedfile

Bob
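
Bob's test can be tried on a throwaway file. This is only a sketch: the sample contents and the pattern `lait` are invented for illustration, and the result depends on the grep and C library versions installed.

```shell
# Hypothetical sample: "café au lait" in ISO-8859-1 (0xE9 = é).
sample=$(mktemp)
printf 'caf\351 au lait\n' > "$sample"

# Override the locale for this one command only; the shell's own settings stay untouched.
count=$(env LC_ALL=C grep -c 'lait' "$sample")
echo "matching lines in the C locale: $count"

# With the behaviour reported in this thread, this prints the matching line
# (as raw ISO-8859-1 bytes) rather than "Binary file ... matches".
env LC_ALL=C grep 'lait' "$sample"

rm -f "$sample"
```

In the C locale grep matches byte by byte, so an ASCII pattern such as the one above works regardless of the file's encoding. The trade-off is that locale-aware features, for example case-insensitive matching of accented letters, are not available.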




Information forwarded to bug-grep <at> gnu.org:
bug#19230; Package grep. (Mon, 01 Dec 2014 19:55:02 GMT)

Message #23 received at 19230 <at> debbugs.gnu.org:

From: Hans Pelleboer <hanspelleboer <at> online.nl>
To: 19230 <at> debbugs.gnu.org
Subject: Fwd: Re: bug#19230: Help! grepV2.21 treats ISO-8859 text files as
 if they are binary
Date: Mon, 01 Dec 2014 20:54:46 +0100

-------- Forwarded Message --------
Subject: 	Re: bug#19230: Help! grepV2.21 treats ISO-8859 text files as 
if they are binary
Date: 	Mon, 01 Dec 2014 20:50:07 +0100
From: 	Hans Pelleboer <hanspelleboer <at> online.nl>
To: 	Bob Proulx <bob <at> proulx.com>



On 12/01/2014 08:45 PM, Bob Proulx wrote:
> Hans Pelleboer wrote:
>> Paul Eggert wrote:
>>> Hans Pelleboer wrote:
>>>> I was wondering what can be done to compile grep in such a way,
>>>> that it processes all text files, no matter what way they are encoded.
>>> You don't need to recompile; you can just use 'grep -a'.
>> I was referring to the default behaviour, as it used to be, without extra
>> flags
> I apologize for having no time to check myself.  What is the behavior
> if you set the locale back to C/POSIX for grep?
>
>    env LC_ALL=C grep PATTERN someencodedfile
>
> Bob

Then everything is hunky dory as it used to be with V2.20 and before
(since 1986 in my case!).

Hans



Merged 19230 19985 20526. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Sat, 30 May 2015 20:05:06 GMT)

Merged 19230 19985 20526 21558. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Fri, 25 Sep 2015 18:05:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 06 Feb 2016 12:24:04 GMT)

This bug report was last modified 8 years and 103 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.