GNU bug report logs - #16911
[PATCH] grep: fix bugs with -i and titlecase


Package: grep

Reported by: Paul Eggert <eggert <at> cs.ucla.edu>

Date: Sat, 1 Mar 2014 06:54:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 16911 in the body.
You can then email your comments to 16911 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-grep <at> gnu.org:
bug#16911; Package grep. (Sat, 01 Mar 2014 06:54:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Paul Eggert <eggert <at> cs.ucla.edu>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Sat, 01 Mar 2014 06:54:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: grep mailing list <bug-grep <at> gnu.org>
Cc: Aharon Robbins <arnold <at> skeeve.com>
Subject: [PATCH] grep: fix bugs with -i and titlecase
Date: Fri, 28 Feb 2014 22:53:08 -0800
[Message part 1 (text/plain, inline)]
Tags: patch

The attached patch, which I've pushed, fixes a problem with grep -i and 
titlecase that's been bugging me ever since someone pointed out some 
titlecase issues on the grep mailing list a few weeks ago.  It affects 
dfa.c, so I expect it'll fix a similar problem with gawk.
[0001-grep-fix-bugs-with-i-and-titlecase.patch (text/plain, attachment)]

bug closed, send any further explanations to 16911 <at> debbugs.gnu.org and Paul Eggert <eggert <at> cs.ucla.edu> Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Sat, 01 Mar 2014 06:56:03 GMT) Full text and rfc822 format available.

Information forwarded to bug-grep <at> gnu.org:
bug#16911; Package grep. (Sat, 01 Mar 2014 13:32:02 GMT) Full text and rfc822 format available.

Message #10 received at 16911 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, 16911 <at> debbugs.gnu.org
Subject: Re: bug#16911: [PATCH] grep: fix bugs with -i and titlecase
Date: Sat, 01 Mar 2014 06:31:05 -0700
[Message part 1 (text/plain, inline)]
On 02/28/2014 11:53 PM, Paul Eggert wrote:
> Tags: patch
> 
> The attached patch, which I've pushed, fixes a problem with grep -i and
> titlecase that's been bugging me ever since someone pointed out some
> titlecase issues on the grep mailing list a few weeks ago.  It affects
> dfa.c, so I expect it'll fix a similar problem with gawk.
> 

>  
> +  grep -i no longer mishandles patterns containing titlecase characters.
> +  For example, in a locale containing the titlecase character
> +  'Lj' (U+01C8 LATIN CAPITAL LETTER L WITH SMALL LETTER J),
> +  'grep -i Lj' now matches 'LJ' (U+01C7 LATIN CAPITAL LETTER LJ).

Does it also match the lower case version?  In other words, are all
three cases of this character treated as equivalent?  It might help to
mention all three characters in the NEWS blurb.
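
One way to probe the question above empirically is to run each of the
three case forms as a pattern against input containing all three.  The
following is only a sketch: the helper name count_matches is
hypothetical, it assumes the en_US.UTF-8 locale is installed, and the
counts it prints depend on the grep version and the platform's locale
data, so no expected output is shown.

```shell
#!/bin/sh
# Sketch only: probe whether 'grep -i' treats the three case forms of
# the LJ digraph as equivalent.  Assumes en_US.UTF-8 is installed; if it
# is not, grep falls back to the C locale and the counts mean little.
LC_ALL=en_US.UTF-8
export LC_ALL

# UTF-8 encodings written as octal escapes for printf.
LJ='\307\207'   # U+01C7 LATIN CAPITAL LETTER LJ (uppercase)
Lj='\307\210'   # U+01C8 LATIN CAPITAL LETTER L WITH SMALL LETTER J (titlecase)
lj='\307\211'   # U+01C9 LATIN SMALL LETTER LJ (lowercase)

# count_matches ESCAPES (hypothetical helper): print how many of the
# three input lines 'grep -i' matches for the given pattern.
# The argument is deliberately used as a printf format so that the
# octal escapes are expanded into bytes.
count_matches () {
  pattern=$(printf "$1")
  printf "$LJ\n$Lj\n$lj\n" | grep -ic -- "$pattern"
}

echo "pattern LJ (U+01C7): $(count_matches "$LJ") of 3 lines match"
echo "pattern Lj (U+01C8): $(count_matches "$Lj") of 3 lines match"
echo "pattern lj (U+01C9): $(count_matches "$lj") of 3 lines match"
```

A count of 3 for every pattern would mean that all three forms are
treated as equivalent under -i on that system.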


-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#16911; Package grep. (Sat, 01 Mar 2014 23:08:02 GMT) Full text and rfc822 format available.

Message #13 received at 16911 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eric Blake <eblake <at> redhat.com>, 16911 <at> debbugs.gnu.org
Subject: Re: bug#16911: [PATCH] grep: fix bugs with -i and titlecase
Date: Sat, 01 Mar 2014 15:07:25 -0800
[Message part 1 (text/plain, inline)]
Eric Blake wrote:
> It might help to mention all three characters in the NEWS blurb.

Thanks, I pushed the attached patch.
[0001-doc-describe-titlecase-fix-better.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#16911; Package grep. (Sun, 02 Mar 2014 00:50:02 GMT) Full text and rfc822 format available.

Message #16 received at 16911 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eric Blake <eblake <at> redhat.com>, 16911 <at> debbugs.gnu.org
Subject: Re: bug#16911: [PATCH] grep: fix bugs with -i and titlecase
Date: Sat, 1 Mar 2014 16:49:06 -0800
Thanks for those patches.
I'm seeing that new test fail on OS X 10.8.5 and don't have
time to pursue it right away, so in case someone else does, ...

[using the same "in" file created by the test]
$ src/grep -Ei '(Lj)\1' in
LjLj
$ src/grep -Ei '(Lj)' in
ljlj
LjLj
LJLJ
$ src/grep -Ei '(Lj)\1' in
LjLj
$ src/grep -Ei '(lj)\1' in
ljlj
LJLJ
$ src/grep -Ei '(LJ)\1' in
ljlj
LJLJ

Here's the relevant part of the test-suite.log file:

+ LC_ALL=en_US.UTF-8
+ export LC_ALL
+ fail=0
+ LJ='\307\207'
+ Lj='\307\210'
+ lj='\307\211'
++ printf '\307\210\n'
+ pattern=$'\307\210'
+ printf '\307\211\307\211\n\307\210\307\210\n\307\207\307\207\n'
+ grep -i $'\307\210' in
+ compare in out
+ compare_dev_null_ in out
+ test 2 = 2
+ test xin = x/dev/null
+ test xout = x/dev/null
+ return 2
+ case $? in
+ compare_ in out
+ diff -u in out
+ pattern='(Lj)\1'
+ grep -Ei '(Lj)\1' in
+ compare in out
+ compare_dev_null_ in out
+ test 2 = 2
+ test xin = x/dev/null
+ test xout = x/dev/null
+ return 2
+ case $? in
+ compare_ in out
+ diff -u in out
--- in  2014-03-01 16:22:38.000000000 -0800
+++ out 2014-03-01 16:22:38.000000000 -0800
@@ -1,3 +1 @@
-ljlj
 LjLj
-LJLJ
+ fail=1
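
For readers decoding the trace above: the octal escapes are the UTF-8
encodings of the three case forms (\307 is byte 0xC7; \207, \210 and
\211 are 0x87, 0x88 and 0x89).  The sketch below rebuilds the "in" file
with the same printf call that appears in the trace and dumps the bytes
as hex.  The helper name hex_of and the temporary directory are
illustrative and are not part of the grep test suite.

```shell
#!/bin/sh
# Sketch: rebuild the test's input with the printf call from the trace,
# then show each case form as hex bytes.
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT
in_file=$tmpdir/in

# Three lines: ljlj, LjLj, LJLJ (each digraph is a 2-byte character).
printf '\307\211\307\211\n\307\210\307\210\n\307\207\307\207\n' > "$in_file"

# hex_of ESCAPES (hypothetical helper): print the bytes produced by a
# printf escape string as lowercase hex with no separators.
hex_of () {
  printf "$1" | od -An -tx1 | tr -d ' \t\n'
}

echo "lj U+01C9 -> $(hex_of '\307\211')"
echo "Lj U+01C8 -> $(hex_of '\307\210')"
echo "LJ U+01C7 -> $(hex_of '\307\207')"
echo "lines in input: $(wc -l < "$in_file" | tr -d ' ')"
```

The diff in the log then reads as follows: for the pattern '(Lj)\1',
only the titlecase line LjLj was output, while the test expected all
three lines.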




Did not alter fixed versions and reopened. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 03 Mar 2014 07:11:01 GMT) Full text and rfc822 format available.

Information forwarded to bug-grep <at> gnu.org:
bug#16911; Package grep. (Mon, 03 Mar 2014 07:28:01 GMT) Full text and rfc822 format available.

Message #21 received at 16911 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>
Cc: Eric Blake <eblake <at> redhat.com>, 16911 <at> debbugs.gnu.org
Subject: Re: bug#16911: [PATCH] grep: fix bugs with -i and titlecase
Date: Sun, 02 Mar 2014 23:27:25 -0800
[Message part 1 (text/plain, inline)]
[I've reopened 16911 since the bug's not fixed on OS X.]

Here's my guess.  In glibc's en_US locale, 'Lj' is considered to be both 
uppercase and lowercase; but in OS X's en_US locale, it's considered to 
be neither uppercase nor lowercase.  If so, the attached gnulib patch 
should fix the problem (though I can't easily test this).  Could you 
please give it a try?
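
A rough way to check the guess above on a given system is to ask grep
whether 'Lj' falls in the [:upper:] and [:lower:] bracket classes.
This is a sketch under assumptions: it assumes en_US.UTF-8 is
installed and that grep's bracket classes reflect the C library's
classification in that locale; the helper name in_class is
hypothetical.  If the locale is missing, grep falls back to the C
locale and both answers will likely be "no" regardless of the
platform.

```shell
#!/bin/sh
# Sketch: ask the platform's locale whether titlecase 'Lj' (U+01C8)
# belongs to the [:upper:] and [:lower:] character classes.
LC_ALL=en_US.UTF-8
export LC_ALL

# in_class CLASS (hypothetical helper): print "yes" if U+01C8 matches
# [[:CLASS:]] in the current locale, else "no".
in_class () {
  if printf '\307\210\n' | grep -q "[[:$1:]]"; then
    echo yes
  else
    echo no
  fi
}

echo "Lj is upper: $(in_class upper)"
echo "Lj is lower: $(in_class lower)"
```

If the guess holds, a glibc system would report "yes" for both classes
and OS X would report "no" for both.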

By the way, I'd like to remove the need for grep's local differences 
from the glibc regex code.  I assume it's there only to pacify GCC's 
warnings flags, and we can do that with pragmas in gnulib.  One fix at a 
time, though.

[regex-osx.diff (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#16911; Package grep. (Tue, 04 Mar 2014 03:19:02 GMT) Full text and rfc822 format available.

Message #24 received at 16911 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eric Blake <eblake <at> redhat.com>, 16911 <16911 <at> debbugs.gnu.org>
Subject: Re: bug#16911: [PATCH] grep: fix bugs with -i and titlecase
Date: Mon, 3 Mar 2014 19:18:20 -0800
On Sun, Mar 2, 2014 at 11:27 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> [I've reopened 16911 since the bug's not fixed on OS X.]
>
> Here's my guess.  In glibc's en_US locale, 'Lj' is considered to be both
> uppercase and lowercase; but in OS X's en_US locale, it's considered to be
> neither uppercase nor lowercase.  If so, the attached gnulib patch should
> fix the problem (though I can't easily test this).  Could you please give it
> a try?

Hi Paul,

That patch does indeed solve the problem.

> By the way, I'd like to remove the need for grep's local differences from
> the glibc regex code.  I assume it's there only to pacify GCC's warnings
> flags, and we can do that with pragmas in gnulib.  One fix at a time,
> though.

You're right.  It was only to avoid warnings from gcc, and using #pragmas
is a better approach, in a project like grep where we rarely modify that code.

Thanks!
Jim




Information forwarded to bug-grep <at> gnu.org:
bug#16911; Package grep. (Wed, 05 Mar 2014 19:38:02 GMT) Full text and rfc822 format available.

Message #27 received at 16911 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>
Cc: 16911 <16911 <at> debbugs.gnu.org>
Subject: Re: bug#16911: [PATCH] grep: fix bugs with -i and titlecase
Date: Wed, 05 Mar 2014 11:37:08 -0800
[Message part 1 (text/plain, inline)]
On 03/03/2014 07:18 PM, Jim Meyering wrote:
> You're right.  It was only to avoid warnings from gcc, and using #pragmas
> is a better approach, in a project like grep where we rarely modify that code.

I just now checked, and without the grep diffs there are no warnings 
when I configure with grep's 'configure --enable-gcc-warnings' on Fedora 
20 (gcc (GCC) 4.8.2 20131212 (Red Hat 4.8.2-7)).  Possibly GCC got 
smarter, or possibly the pragmas in gnulib regex now suffice.  So I've 
removed the grep diffs with the attached patch for now; if warnings come 
back (older compilers maybe?) we can add more pragmas to the gnulib copy.
[0001-maint-remove-differences-from-gnulib-regex-code.patch (text/x-patch, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#16911; Package grep. (Thu, 06 Mar 2014 21:21:02 GMT) Full text and rfc822 format available.

Message #30 received at 16911 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eric Blake <eblake <at> redhat.com>, 16911 <at> debbugs.gnu.org
Subject: Re: bug#16911: [PATCH] grep: fix bugs with -i and titlecase
Date: Thu, 06 Mar 2014 13:20:05 -0800
[Message part 1 (text/plain, inline)]
On 03/01/2014 03:07 PM, Paul Eggert wrote:
> Eric Blake wrote:
>> It might help to mention all three characters in the NEWS blurb.
>
> Thanks, I pushed the attached patch.

I see now that my documentation fix went too far, as it promised 
behavior that the regex code does not in fact implement.  The plan is to 
fix the DFA code to match what the regex code does, and the first step 
is to remove the promises that aren't being kept now (when the regex 
code is used).  I pushed the attached documentation patch.

[0001-doc-do-not-overpromise-ignore-case-s-behavior.patch (text/x-patch, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#16911; Package grep. (Fri, 07 Mar 2014 05:58:02 GMT) Full text and rfc822 format available.

Message #33 received at 16911 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>
Cc: Gnulib bugs <bug-gnulib <at> gnu.org>, Eric Blake <eblake <at> redhat.com>,
 16911 <16911 <at> debbugs.gnu.org>
Subject: Re: bug#16911: [PATCH] grep: fix bugs with -i and titlecase
Date: Thu, 06 Mar 2014 21:57:34 -0800
[Message part 1 (text/plain, inline)]
Jim Meyering wrote:
> That patch does indeed solve the problem.

OK, thanks.  I think only part of the patch is actually needed and I see 
potential problems with the other part, so I installed the former into 
gnulib (see attached) and will leave the latter for later.
[0001-regex-port-to-OS-X-10.8.5-en_US.UTF-8-locale.patch (text/plain, attachment)]

Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Sat, 08 Mar 2014 02:43:01 GMT) Full text and rfc822 format available.

Notification sent to Paul Eggert <eggert <at> cs.ucla.edu>:
bug acknowledged by developer. (Sat, 08 Mar 2014 02:43:02 GMT) Full text and rfc822 format available.

Message #38 received at 16911-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: 16911-done <at> debbugs.gnu.org
Subject: Re: grep: fix bugs with -i and titlecase
Date: Fri, 07 Mar 2014 18:42:53 -0800
I think this bug should be fixed on OS X now, so I'm marking it as done. 
 We can reopen it later if I'm wrong.




Information forwarded to bug-grep <at> gnu.org:
bug#16911; Package grep. (Sat, 08 Mar 2014 03:12:01 GMT) Full text and rfc822 format available.

Message #41 received at 16911 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: 16911 <16911 <at> debbugs.gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 16911-done <at> debbugs.gnu.org
Subject: Re: bug#16911: grep: fix bugs with -i and titlecase
Date: Fri, 7 Mar 2014 19:11:32 -0800
On Fri, Mar 7, 2014 at 6:42 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> I think this bug should be fixed on OS X now, so I'm marking it as done.  We
> can reopen it later if I'm wrong.

Confirmed: it's still fixed. Thanks again.




Information forwarded to bug-grep <at> gnu.org:
bug#16911; Package grep. (Sat, 08 Mar 2014 03:12:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 05 Apr 2014 11:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 10 years and 38 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.