GNU bug report logs - #23844
broken: randomly detects text files as binary


Package: grep

Reported by: php fan <php4fan <at> gmail.com>

Date: Fri, 24 Jun 2016 22:22:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 23844 in the body.
You can then email your comments to 23844 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-grep <at> gnu.org:
bug#23844; Package grep. (Fri, 24 Jun 2016 22:22:02 GMT)

Acknowledgement sent to php fan <php4fan <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Fri, 24 Jun 2016 22:22:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: php fan <php4fan <at> gmail.com>
To: bug-grep <at> gnu.org
Subject: broken: randomly detexts textf files as binary
Date: Sat, 25 Jun 2016 00:09:09 +0200
What is the status of this bug??

https://bugs.launchpad.net/ubuntu/+source/grep/+bug/1535458

This is a critical issue that renders grep completely unusable.
It was said to be fixed upstream but then they say it regressed again...

Information forwarded to bug-grep <at> gnu.org:
bug#23844; Package grep. (Sat, 25 Jun 2016 07:44:02 GMT)

Message #8 received at 23844 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: php fan <php4fan <at> gmail.com>, 23844 <at> debbugs.gnu.org
Subject: Re: bug#23844: broken: randomly detexts textf files as binary
Date: Sat, 25 Jun 2016 09:43:15 +0200
On 06/25/2016 12:09 AM, php fan wrote:
> What is the status of this bug??
>
> https://bugs.launchpad.net/ubuntu/+source/grep/+bug/1535458
>

You can use either 'grep -a FOO' or 'LC_ALL=C grep FOO' to search 
through arbitrarily encoded files. You may get a 'binary files' message 
if your locale specifies some other encoding (e.g., UTF-8) and if grep 
would otherwise output a line with an encoding error, but that's not a 
bad thing to do, as encoding errors are not text.
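
The two workarounds above can be sketched as follows. This is a minimal illustration, assuming GNU grep; the temporary file and its one-line content are hypothetical and only stand in for a file that is not valid UTF-8.

```shell
# Hypothetical sample file: one line containing the ISO-8859-15 byte 0xE9
# (octal 351, "é"), which is an encoding error when read as UTF-8.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.txt"
printf 'caf\351 function\n' > "$sample"

# Workaround 1: -a makes grep treat the file as text whatever bytes it contains.
# -c prints the number of matching lines instead of the lines themselves.
count_a=$(grep -a -c function "$sample")

# Workaround 2: run grep in the C locale, as suggested above.
count_c=$(LC_ALL=C grep -c function "$sample")

echo "matching lines with -a:       $count_a"
echo "matching lines with LC_ALL=C: $count_c"

# Print the matching line itself; cat -v renders the non-ASCII byte visibly.
grep -a function "$sample" | cat -v

rm -rf "$tmpdir"
```

Without either workaround, and in a UTF-8 locale, the same search may print the 'binary files' message instead of the matching line.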




Information forwarded to bug-grep <at> gnu.org:
bug#23844; Package grep. (Sat, 25 Jun 2016 08:34:02 GMT)

Message #11 received at 23844 <at> debbugs.gnu.org:

From: Shlomi Fish <shlomif <at> shlomifish.org>
To: php fan <php4fan <at> gmail.com>
Cc: 23844 <at> debbugs.gnu.org
Subject: Re: bug#23844: broken: randomly detexts textf files as binary
Date: Sat, 25 Jun 2016 11:12:47 +0300
Hi php fan,

On Sat, 25 Jun 2016 00:09:09 +0200
php fan <php4fan <at> gmail.com> wrote:

> What is the status of this bug??
> 
> https://bugs.launchpad.net/ubuntu/+source/grep/+bug/1535458
> 

I think I ran into a similar problem when trying to search through HexChat
logs. Anyway, in the meantime, you can try passing the -a (--text) flag, or using
one of http://beyondgrep.com/ or http://beyondgrep.com/more-tools/ .

Regards,

	Shlomi Fish

> This is a critical issue that renders grep completely unusable.
> It was said to be fixed upstream but then they say it regressed again...







Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Thu, 08 Sep 2016 05:51:02 GMT)

Notification sent to php fan <php4fan <at> gmail.com>:
bug acknowledged by developer. (Thu, 08 Sep 2016 05:51:02 GMT)

Message #16 received at 23844-done <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: php fan <php4fan <at> gmail.com>, 23844-done <at> debbugs.gnu.org
Cc: Shlomi Fish <shlomif <at> shlomifish.org>
Subject: Re: bug#23844: broken: randomly detexts textf files as binary
Date: Wed, 7 Sep 2016 22:50:28 -0700
php fan wrote:
> What is the status of this bug??
>
> https://bugs.launchpad.net/ubuntu/+source/grep/+bug/1535458

No further comment at http://bugs.gnu.org/23844 so I am closing Bug#23844.

I just now re-read https://bugs.launchpad.net/ubuntu/+source/grep/+bug/1535458 
and it's not clear to me what bug is being reported downstream. There are 
several messages saying it's fixed or it's not, and no test cases. If you can 
clarify this by supplying a complete self-contained test case we can reopen 
Bug#23844.




Information forwarded to bug-grep <at> gnu.org:
bug#23844; Package grep. (Thu, 08 Sep 2016 20:30:03 GMT)

Message #19 received at 23844 <at> debbugs.gnu.org:

From: Matteo Sisti Sette <matteosistisette <at> gmail.com>
To: 23844 <at> debbugs.gnu.org
Subject: Re: bug#23844: closed (Re: bug#23844: broken: randomly detexts textf
 files as binary)
Date: Thu, 8 Sep 2016 22:17:14 +0200
"because there has been no further comment" doesn't seem to me a good 
reason to close a bug.

> I just now re-read https://bugs.launchpad.net/ubuntu/+source/grep/+bug/1535458
> and it's not clear to me what bug is being reported downstream.

Just read the description in the original report, it should be pretty 
clear what the problem is.
Also see comment 7:
https://bugs.launchpad.net/ubuntu/+source/grep/+bug/1535458/comments/7
I don't know which existing upstream bug, if any, tracks the issue.


There's no test case because I've never been able to create one, but I 
keep observing the issue as described every day. It appeared as a 
regression in Ubuntu around the time Ubuntu bug 1535458 was reported.


> There are several messages saying it's fixed or it's not,

The issue described in the original report has never been fixed in 
Ubuntu. Some people said it was fixed upstream, and later people 
referring to the same upstream bug said it regressed again upstream. 
However, I don't know whether they are right and whether they are 
linking the Ubuntu bug to the right upstream bug. Also, different people 
have different opinions about which upstream bug is the real cause of the 
downstream one.

Some of the latest comments, I seem to remember, seemed to imply that 
the new behavior would be expected, which is just idiotic.

This might be a partial test case:
Download and gunzip the following file:
http://matteosistisette.com/test/bug_grep/profile.php.gz

then try:
# grep function profile.php

Expected output: -----------
require_once(DIR . '/includes/functions_user.php');
	require_once(DIR . '/includes/functions_databuild.php');
	require_once(DIR . '/includes/functions_databuild.php');
(....)
------------  /Expected output

Actual output:
Binary file profile.php matches


This is an ISO-8859-15-encoded file (or that's what Gedit says) and I'm 
testing on a UTF-8 locale, but that should be irrelevant.


Output of
$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=es_ES.UTF-8
LC_TIME=es_ES.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=es_ES.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=es_ES.UTF-8
LC_NAME=es_ES.UTF-8
LC_ADDRESS=es_ES.UTF-8
LC_TELEPHONE=es_ES.UTF-8
LC_MEASUREMENT=es_ES.UTF-8
LC_IDENTIFICATION=es_ES.UTF-8
LC_ALL=
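
The combination reported here, an ISO-8859-15 file searched in a UTF-8 locale, can also be handled by converting the input before searching. This is a minimal sketch, assuming iconv is installed; it is an alternative not proposed anywhere in this thread, and the sample line is hypothetical, not taken from profile.php.

```shell
# Hypothetical ISO-8859-15 input line: "é" is the single byte 0xE9 (octal 351).
# iconv re-encodes it as UTF-8.
utf8_line=$(printf 'caf\351 function\n' | iconv -f ISO-8859-15 -t UTF-8)

# After conversion "é" is the two-byte UTF-8 sequence 0xC3 0xA9, so the line
# no longer contains an encoding error in a UTF-8 locale.
# -c prints the number of matching lines.
match_count=$(printf '%s\n' "$utf8_line" | grep -c function)

echo "matching lines after conversion: $match_count"
```

For a real file the same idea would read `iconv -f ISO-8859-15 -t UTF-8 profile.php | grep function`, provided the file really is ISO-8859-15 as Gedit reports.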




On 08/09/16 07:51, GNU bug Tracking System wrote:
> Your bug report
>
> #23844: broken: randomly detexts textf files as binary
>
> which was filed against the grep package, has been closed.
>
> The explanation is attached below, along with your original report.
> If you require more details, please reply to 23844 <at> debbugs.gnu.org.
>





Information forwarded to bug-grep <at> gnu.org:
bug#23844; Package grep. (Thu, 08 Sep 2016 20:56:01 GMT)

Message #22 received at 23844 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Matteo Sisti Sette <matteosistisette <at> gmail.com>, 23844 <at> debbugs.gnu.org
Subject: Re: bug#23844: closed (Re: bug#23844: broken: randomly detexts textf
 files as binary)
Date: Thu, 8 Sep 2016 13:54:57 -0700
On 09/08/2016 01:17 PM, Matteo Sisti Sette wrote:

> > I just now re-read
> > https://bugs.launchpad.net/ubuntu/+source/grep/+bug/1535458 and it's
> > not clear to me what bug is being reported downstream.
>
> Just read the description in the original report, it should be pretty 
> clear what the problem is.

Sorry, it is not clear to me.

>
> Also see comment 7:
> https://bugs.launchpad.net/ubuntu/+source/grep/+bug/1535458/comments/7
> Then I don't know what existing upstream bug tracks the issue, if 
> there's any.

Comment 7 does not provide a reproducible test case, or hint at how to 
provide one.

> This might be a partial test case:
> Download and gunzip the following file:
> http://matteosistisette.com/test/bug_grep/profile.php.gz

Sorry, that URL does not work for me:

$ wget http://matteosistisette.com/test/bug_grep/profile.php.gz
--2016-09-08 13:53:03--  http://matteosistisette.com/test/bug_grep/profile.php.gz
Resolving matteosistisette.com (matteosistisette.com)... 178.62.237.38
Connecting to matteosistisette.com (matteosistisette.com)|178.62.237.38|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2016-09-08 13:53:03 ERROR 404: Not Found.





Information forwarded to bug-grep <at> gnu.org:
bug#23844; Package grep. (Thu, 08 Sep 2016 20:59:02 GMT)

Message #25 received at 23844 <at> debbugs.gnu.org:

From: Matteo Sisti Sette <matteosistisette <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, 23844 <at> debbugs.gnu.org
Subject: Re: bug#23844: closed (Re: bug#23844: broken: randomly detexts textf
 files as binary)
Date: Thu, 8 Sep 2016 22:58:05 +0200
Sorry, the right url is:
http://matteosistisette.com/test/grep_bug/profile.php.gz

Whether or not the original description in the Ubuntu bug was clear 
enough, this test case should be, right?








Information forwarded to bug-grep <at> gnu.org:
bug#23844; Package grep. (Thu, 08 Sep 2016 21:09:01 GMT)

Message #28 received at 23844 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Matteo Sisti Sette <matteosistisette <at> gmail.com>, 23844 <at> debbugs.gnu.org
Subject: Re: bug#23844: broken: randomly detexts textf files as binary
Date: Thu, 8 Sep 2016 14:08:19 -0700
On 09/08/2016 01:58 PM, Matteo Sisti Sette wrote:
> Sorry, the right url is:
> http://matteosistisette.com/test/grep_bug/profile.php.gz
>
> Whether or not the original description in the ubuntu bug was clear 
> enough, this test case should be, right?
>
Yes, it's a clear test case. And it works for me with grep 2.25 (see 
attached transcript). So it appears that your particular bug (whatever 
it was) is fixed.

If you want 'grep' to output arbitrary binary data, you need to use 
grep's -a option. That has been true for many years (though the details 
about what constitutes "binary" have changed). I.e., the -a option is 
the only way to avoid that "Binary file matches" message in all cases.

[transcript.txt (text/plain, attachment)]

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 07 Oct 2016 11:24:04 GMT)

This bug report was last modified 7 years and 228 days ago.
