GNU bug report logs - #16421
Speed-up for case-insensitive matching in multibyte locales


Package: grep

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Sun, 12 Jan 2014 07:18:02 UTC

Severity: normal

Tags: patch

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 16421 in the body.
You can then email your comments to 16421 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-grep <at> gnu.org:
bug#16421; Package grep. (Sun, 12 Jan 2014 07:18:02 GMT)

Acknowledgement sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Sun, 12 Jan 2014 07:18:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: submit <at> debbugs.gnu.org
Subject: Speed-up for case-insensitive matching in multibyte locales
Date: Sun, 12 Jan 2014 16:09:20 +0900
Package: grep
Tags: patch

Case-insensitive matching is expensive in multi-byte locales because the
target text is converted to lower case.

However, it seems that awk, which uses dfa.c just as grep does, doesn't
convert the target text to lower case.  It seems that if grep doesn't use
kwset, it doesn't have to convert either.

If this patch is applied, when no parenthesis and/or backslash is
included in the keywords (*), use of kwset and conversion of the target
text are avoided for case-insensitive matching in multi-byte locales, and
ignore-case processing is done in dfaexec and regex.

(*) When a parenthesis and/or backslash is included in the keywords,
    it's converted to case-sensitive matching. (bug#16232)
[grep-ignore-icase.txt (application/octet-stream, attachment)]
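
As a rough illustration of the idea in the message above (this is not code
from the attached patch): when the matcher folds case itself, the original
text can be searched directly and no lower-cased copy of the input is
needed.  The sketch below uses POSIX regcomp with REG_ICASE as a stand-in
for grep's dfaexec/regex path; icase_match is a hypothetical helper name,
not a function in grep.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of grep): report whether LINE matches
   the POSIX extended regex PATTERN, ignoring case.  Case folding is
   left to the regex matcher via REG_ICASE, so LINE is searched as-is
   and is never copied or converted to lower case.  */
bool
icase_match (const char *pattern, const char *line)
{
  regex_t re;

  /* REG_NOSUB: only a yes/no answer is needed, not match offsets.  */
  if (regcomp (&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
    return false;               /* treat an invalid pattern as "no match" */

  bool matched = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}
```

For multibyte input, the caller would be expected to call
setlocale (LC_ALL, "") first so that the matcher interprets the bytes in
the user's locale; that step is omitted here to keep the sketch short.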

Information forwarded to bug-grep <at> gnu.org:
bug#16421; Package grep. (Sun, 12 Jan 2014 11:19:01 GMT)

Message #8 received at 16421 <at> debbugs.gnu.org:

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: 16421 <at> debbugs.gnu.org
Subject: Re: bug#16421: Speed-up for case-insensitive matching in multibyte
 locales
Date: Sun, 12 Jan 2014 20:17:48 +0900
I'm sorry, the content of the attachment was incorrect.
I am sending the correct file.
[grep-ignore-icase.txt (application/octet-stream, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#16421; Package grep. (Mon, 13 Jan 2014 16:45:02 GMT)

Message #11 received at 16421 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 16421 <at> debbugs.gnu.org
Subject: Re: bug#16421: Speed-up for case-insensitive matching in multibyte
 locales
Date: Mon, 13 Jan 2014 08:43:58 -0800
Thank you for the patch.  I will take a look in the next day or two.




Information forwarded to bug-grep <at> gnu.org:
bug#16421; Package grep. (Tue, 21 Jan 2014 21:51:02 GMT)

Message #14 received at 16421 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>, Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: Aharon Robbins <arnold <at> skeeve.com>, 16481 <at> debbugs.gnu.org,
 16421 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Tue, 21 Jan 2014 13:50:26 -0800
On 01/21/2014 08:50 AM, Jim Meyering wrote:
> I was expecting
> to apply it, along with another small change and a test, but now, feel
> like I'll have to justify it with some performance data as well.

Ouch, I wasn't intending to make work for you!  Even if the patch in 
<http://bugs.gnu.org/16481#14> didn't improve performance, it makes grep 
simpler and that should be a win.  Norihiro Tanaka's patch (which I'd 
forgotten about, but which is presumably better) also simplifies grep, 
so you shouldn't need to do a performance analysis to verify that it's a 
good idea.




Information forwarded to bug-grep <at> gnu.org:
bug#16421; Package grep. (Sun, 26 Jan 2014 01:46:03 GMT)

Message #17 received at 16421 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 16421 <at> debbugs.gnu.org
Subject: Re: bug#16421: Speed-up for case-insensitive matching in multibyte
 locales
Date: Sat, 25 Jan 2014 17:44:46 -0800
On Mon, Jan 13, 2014 at 8:43 AM, Jim Meyering <jim <at> meyering.net> wrote:
> Thank you for the patch.  I will take a look in the next day or two.

Sorry about the delay.
I have divided your patch into two separate commits: one modifies
dfa.c and the other modifies dfasearch.c.  I've included 5 commits
below.  The first two are those.  The next one is "also remove call to
mb_case_map_apply", which I will merge into your dfasearch.c commit.
I left it separate solely to ease review.  The fourth one adds a
little more test coverage, and will also be merged into your
dfasearch.c commit.  The 5th one is incidental, and just happened to
be on this branch.  Please inspect the modified comments and commit
log messages on your two commits and let me know if you would like to
make any changes before I push.

Thanks again,
Jim
[k.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#16421; Package grep. (Sun, 26 Jan 2014 04:46:02 GMT)

Message #20 received at 16421 <at> debbugs.gnu.org:

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Jim Meyering <jim <at> meyering.net>
Cc: 16421 <at> debbugs.gnu.org
Subject: Re: bug#16421: Speed-up for case-insensitive matching in multibyte
 locales
Date: Sun, 26 Jan 2014 13:45:13 +0900
Hi Jim,

Thank you for reviewing the patch.

I have no requests for changes to the modified comments and commit
log.

However, could you merge an additional patch, attached to this
mail, into the commit?  With the two commits, `kwsincr_case' is no
longer called for case-insensitive matching in a multi-byte locale, so
it can be removed.

Norihiro
[remove_kwsincr_case.patch (application/octet-stream, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#16421; Package grep. (Sun, 26 Jan 2014 04:59:02 GMT)

Message #23 received at 16421 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 16421 <at> debbugs.gnu.org
Subject: Re: bug#16421: Speed-up for case-insensitive matching in multibyte
 locales
Date: Sat, 25 Jan 2014 20:57:54 -0800
On Sat, Jan 25, 2014 at 8:45 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Hi Jim,
>
> Thank you for reviewing the patch.
>
> I have no requests for changes to the modified comments and commit
> log.
>
> However, could you merge an additional patch, attached to this
> mail, into the commit?  With the two commits, `kwsincr_case' is no
> longer called for case-insensitive matching in a multi-byte locale, so
> it can be removed.

Good catch.  I have applied most of that patch and will merge it.
However, I had to omit the part that removed the declaration of
kwset_exact_matches, since it is still used.




Information forwarded to bug-grep <at> gnu.org:
bug#16421; Package grep. (Sun, 26 Jan 2014 07:20:02 GMT)

Message #26 received at 16421 <at> debbugs.gnu.org:

From: Norihiro TANAKA <noritnk <at> kcn.ne.jp>
To: Jim Meyering <jim <at> meyering.net>
Cc: 16421 <at> debbugs.gnu.org
Subject: Re: bug#16421: Speed-up for case-insensitive matching in multibyte
 locales
Date: Sun, 26 Jan 2014 16:19:19 +0900
Sorry, you are right.  The declaration of kwset_exact_matches shouldn't
be removed.





Reply sent to Jim Meyering <jim <at> meyering.net>:
You have taken responsibility. (Sun, 26 Jan 2014 16:56:01 GMT)

Notification sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
bug acknowledged by developer. (Sun, 26 Jan 2014 16:56:02 GMT)

Message #31 received at 16421-done <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro TANAKA <noritnk <at> kcn.ne.jp>
Cc: 16421-done <at> debbugs.gnu.org
Subject: Re: bug#16421: Speed-up for case-insensitive matching in multibyte
 locales
Date: Sun, 26 Jan 2014 08:55:01 -0800
I've pushed those three commits, with a small change to the second one
(removing the leading ^ and trailing '\$' in a regexp) to make that
test succeed also with -F.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 24 Feb 2014 12:24:03 GMT)
