GNU bug report logs - #17229
[PATCH 2/2] grep: speed-up by using memchr() in Boyer-Moore searching

Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Wed, 9 Apr 2014 13:56:02 UTC

Severity: normal

Tags: moreinfo, patch

Done: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 17229 in the body.
You can then email your comments to 17229 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox

Report forwarded to bug-grep <at> gnu.org:
bug#17229; Package grep. (Wed, 09 Apr 2014 13:56:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Wed, 09 Apr 2014 13:56:04 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: bug-grep <at> gnu.org
Subject: [PATCH 2/2] grep: speed-up by using memchr() in Boyer-Moore searching
Date: Wed, 09 Apr 2014 22:54:43 +0900

[Message part 1 (text/plain, inline)]

memchr() of glibc is faster than seeking by delta1 on some platforms.
So, when there is no chance to match for a while, use it on them.

Speed-up about 3x in best case, 10% in normal case by this patch.

It's still available only for x86 and x86-64 platform.

Norihiro

[patch.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17229; Package grep. (Thu, 10 Apr 2014 00:36:02 GMT) Full text and rfc822 format available.

Message #8 received at 17229 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 17229 <at> debbugs.gnu.org
Subject: Re: bug#17229: [PATCH 2/2] grep: speed-up by using memchr() in
 Boyer-Moore searching
Date: Wed, 09 Apr 2014 17:35:31 -0700

[Message part 1 (text/plain, inline)]

On 04/09/2014 06:54 AM, Norihiro Tanaka wrote:
> memchr() of glibc is faster than seeking by delta1 on some platforms.
> So, when there is no chance to match for a while, use it on them.
>
> Speed-up about 3x in best case, 10% in normal case by this patch.
>
> It's still available only for x86 and x86-64 platform.
>

Thanks, I have some questions about this one.

What benchmark did you use to time this?

THe patch refers to a variable called 'trans' that is not in the current 
savannah git master.  Is there some other patch that establishes this 
variable, a patch that is a prerequisite for this one?  If trans is a 
cached copy of kwset->trans, isn't that guaranteed to be NULL here?

What is delta1?  It's mentioned in a comment but not in the code.

It'd be simpler to use memchr on all platforms; is there a major 
performance downside to that?  For example, what about the attached 
patch instead?  It also simplifies things a bit to avoid a label and its 
associated gotos, under the assumption that trans is NULL here.

[memchr.diff (text/x-patch, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17229; Package grep. (Thu, 10 Apr 2014 11:38:02 GMT) Full text and rfc822 format available.

Message #11 received at 17229 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17229 <at> debbugs.gnu.org
Subject: bug#17229: [PATCH 2/2] grep: speed-up by using memchr() in
 Boyer-Moore searching
Date: Thu, 10 Apr 2014 20:37:45 +0900

Hi Paul,
Hi Paul,

Thanks for the review for the patch.

> What benchmark did you use to time this?

I measured below.

$ yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 >k

Before:
$ time -p env LANG=C src/grep jk k
real 1.53
user 1.21
sys 0.31

After:
$ time -p env LANG=C src/grep jk k
real 0.33
user 0.03
sys 0.29

> Is there some other patch that establishes this variable, a patch that
> is a prerequisite for this one?

The patch requires below.

bug#17230
[PATCH 1/2] grep: may also use Boyer-Moore algorithm for case-insensitive matching

> What is delta1?  It's mentioned in a comment but not in the code.

It means `d1' in kwset.c:bmexec.

> It'd be simpler to use memchr on all platforms;
> is there a major performance downside to that?

Yes.  As far as I was confirmed, it's slow in HP-UX on Itanium and
Solaris on SPARC.  I think that that it depends on the implementation of
memchr() and the rate of the increment instruction for the `add'
instruction on the platform.

Norihiro

Information forwarded to bug-grep <at> gnu.org:
bug#17229; Package grep. (Thu, 10 Apr 2014 21:12:01 GMT) Full text and rfc822 format available.

Message #14 received at 17229 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 17229 <at> debbugs.gnu.org
Subject: Re: bug#17229: [PATCH 2/2] grep: speed-up by using memchr() in
 Boyer-Moore searching
Date: Thu, 10 Apr 2014 14:11:18 -0700

On 04/10/2014 04:37 AM, Norihiro Tanaka wrote:
> The patch requires below. bug#17230 [PATCH 1/2] grep: may also use 
> Boyer-Moore algorithm for case-insensitive matching

OK, thanks, I'll work on Bug #17230 first, before worrying about 17229.

> It'd be simpler to use memchr on all platforms;
> is there a major performance downside to that?
> Yes.  As far as I was confirmed, it's slow in HP-UX on Itanium and
> Solaris on SPARC.  I think that that it depends on the implementation of
> memchr() and the rate of the increment instruction for the `add'
> instruction on the platform.

If it's *way* slower with memchr and HP-UX/Itanium or Solaris/SPARC we 
may need to complicate the code, but if it's just a bit slower then it's 
probably better to use the simpler implementation. There's a tradeoff 
between simplified maintenance and performance on non-GNU hosts; 
sometimes the former is more important than the latter, and this may be 
one of those times.

Information forwarded to bug-grep <at> gnu.org:
bug#17229; Package grep. (Thu, 10 Apr 2014 21:34:01 GMT) Full text and rfc822 format available.

Message #17 received at 17229 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 17229 <at> debbugs.gnu.org
Subject: Re: bug#17229: [PATCH 2/2] grep: speed-up by using memchr() in
 Boyer-Moore searching
Date: Thu, 10 Apr 2014 15:33:10 -0600

[Message part 1 (text/plain, inline)]

On 04/10/2014 03:11 PM, Paul Eggert wrote:
> On 04/10/2014 04:37 AM, Norihiro Tanaka wrote:
>> The patch requires below. bug#17230 [PATCH 1/2] grep: may also use
>> Boyer-Moore algorithm for case-insensitive matching
> 
> OK, thanks, I'll work on Bug #17230 first, before worrying about 17229.
> 
>> It'd be simpler to use memchr on all platforms;
>> is there a major performance downside to that?
>> Yes.  As far as I was confirmed, it's slow in HP-UX on Itanium and
>> Solaris on SPARC.  I think that that it depends on the implementation of
>> memchr() and the rate of the increment instruction for the `add'
>> instruction on the platform.
> 
> If it's *way* slower with memchr and HP-UX/Itanium or Solaris/SPARC we
> may need to complicate the code, but if it's just a bit slower then it's
> probably better to use the simpler implementation. There's a tradeoff
> between simplified maintenance and performance on non-GNU hosts;
> sometimes the former is more important than the latter, and this may be
> one of those times.

In the past, we've used gnulib to work around painfully slow
implementations.  Remember that many platforms have a quadratic
strstr(); gnulib has the ability to replace it with a guaranteed-linear
version written in straight C code (side note - glibc has since taken
gnulib's implementation and further optimized it; maybe we should
consider feeding the glibc improvements back to gnulib).  If the native
memchr can be demonstrated to be slower on certain platforms in
comparison to an implementation that we can write in straight C, then
having gnulib supply the faster replacement is so much easier to
maintain than to special-case non-GNU hosts in the main body of grep code.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17229; Package grep. (Thu, 10 Apr 2014 23:26:01 GMT) Full text and rfc822 format available.

Message #20 received at 17229 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17229 <at> debbugs.gnu.org
Subject: bug#17229: [PATCH 2/2] grep: speed-up by using memchr() in
 Boyer-Moore searching
Date: Fri, 11 Apr 2014 08:25:25 +0900

[Message part 1 (text/plain, inline)]

I believe that memchr() isn't slower on HP-UX and Solaris but more
faster on x86 processors.

Now I wrote the additional patch.

It replaces `tp += d' to the other code in bmexec() and cwexec() for x86
processors.

I see that the code is more complex and slower theoretically, but it's
faster actually when `d' is small.  (We can test it by `grep -i jk k'
after patch#17230 and this patch.)

Therefore, I think that memchr() is faster than `tp += d' because the
increment instruction is more faster than the add instruction on x86
processors.  KWSet may be slower than DFA on them in the worst case,
even if doesn't use delta2.  Try to run and compare below and compare
without this patches.  The former uses only KWSet and the later uses only
DFA.  Later is faster on Linux x86, because DFA uses increment instrument
for shift of text pointer.  It may generate a wrong usage among users.
I think that it should be avoided.

  $ yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 >k
  $ env LANG=C time -p src/grep jk k
  $ env LANG=C time -p src/grep -i jk k

> sometimes the former is more important than the latter, and this may be
> one of those times.

I see that grep attaches a high value to performance.  It uses not only
regex but also DFA.  Further more, DFA has the specific codes for utf-8
locale, which is used most frequently.  I think that grep may also have
specific codes for x86 processors, which are also used most frequently.

Norihiro

[patch.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17229; Package grep. (Fri, 11 Apr 2014 00:00:02 GMT) Full text and rfc822 format available.

Message #23 received at 17229 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: 17229 <at> debbugs.gnu.org
Cc: Paul Eggert <eggert <at> cs.ucla.edu>
Subject: bug#17229: [PATCH 2/2] grep: speed-up by using memchr() in
 Boyer-Moore searching
Date: Fri, 11 Apr 2014 08:59:41 +0900

Paul Eggert wrote:
> sometimes the former is more important than the latter, and this may be one of those times.

I also like simple, and I don't like so much platform specific optimization.

However, I confirmed 10% speed-up with wikipedia database and the simple
word `Wikipedia'.  I think that 10% speed-up cannot ignorable on most
frequently used platform and in most simple and frequently used usage.

$ env LANG=C time -p src/grep Wikipedia pages-articles.xml

http://dumps.wikimedia.org/jawiki/latest/

Norihiro

Information forwarded to bug-grep <at> gnu.org:
bug#17229; Package grep. (Wed, 23 Apr 2014 07:10:03 GMT) Full text and rfc822 format available.

Message #26 received at 17229 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 17229 <at> debbugs.gnu.org
Subject: Re: bug#17229: [PATCH 2/2] grep: speed-up by using memchr() in
 Boyer-Moore searching
Date: Wed, 23 Apr 2014 00:08:52 -0700

[Message part 1 (text/plain, inline)]

Norihiro Tanaka wrote:
> Now I wrote the additional patch.

Sorry, I don't understand that patch.  I applied your earlier patch 
(attachment 1) and then merged that patch (attachment 2), but the 
resulting 'grep' failed 'make check'; I got a "FAIL: fedora" and the 
test after "file" never finished.  This was on Fedora 20 x86-64 with my 
own installation of GCC 4.9.0.

Instead of trying to understand that patch, I wrote a different patch 
(attachment 3) that simplifies the code and removes the part that is 
x86-specific.  This improves the performance quite a bit for the test 
case given in the ChangeLog entry.  I also tested performance on Sparc 
Solaris 10, and memchr was a big win there.  Perhaps there are platforms 
where memchr is a big loss, but we can cross that bridge if we come to it.

Perhaps I merged that patch incorrectly, so I'll leave this bug open for 
now.  To be honest I hope that patch doesn't improve performance 
significantly if we get it to work correctly, as it looks tricky.

[0001-grep-speed-up-by-using-memchr-in-Boyer-Moore-searchi.patch (text/plain, attachment)]

[0001-grep-speed-up-by-replacing-incr-to-add-in-x86-and-x8.patch (text/plain, attachment)]

[0002-kwset-simplify-and-speed-up-Boyer-Moore-unibyte-i-in.patch (text/plain, attachment)]

Added tag(s) moreinfo. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Wed, 23 Apr 2014 07:13:01 GMT) Full text and rfc822 format available.

Information forwarded to bug-grep <at> gnu.org:
bug#17229; Package grep. (Wed, 23 Apr 2014 17:52:01 GMT) Full text and rfc822 format available.

Message #31 received at 17229 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17229 <at> debbugs.gnu.org
Subject: bug#17229: [PATCH 2/2] grep: speed-up by using memchr() in
 Boyer-Moore searching
Date: Thu, 24 Apr 2014 02:51:43 +0900

Paul Eggert wrote:
> This improves the performance quite a bit for the test case given in
> the ChangeLog entry.  I also tested performance on Sparc Solaris 10,
> and memchr was a big win there.

You are right.  I also ran the tests on Solaris 10 and HP-UX 11iv2.
memchr() was win on both machines.  Especially on HP-UX 11iv2, memchr was
10x faster than delta1 searching.

By the way, could you also test below for master and original grep-2.18?

$ yes abcdabc | head -50000000 >../k
$ env LANG=C time -p src/grep abcd.bd ../k

Perhaps, later will be faster.

0001-grep-speed-up-by-replacing-incr-to-add-in-x86-and-x8.patch will fix
it.  delta2 searching is higher cost than mind2 searching in original
grep-2.18.  We need to reduce it for delta2 searching.

Norihiro

Information forwarded to bug-grep <at> gnu.org:
bug#17229; Package grep. (Thu, 24 Apr 2014 06:34:02 GMT) Full text and rfc822 format available.

Message #34 received at 17229 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 17229 <at> debbugs.gnu.org
Subject: Re: bug#17229: [PATCH 2/2] grep: speed-up by using memchr() in
 Boyer-Moore searching
Date: Wed, 23 Apr 2014 23:33:13 -0700

Norihiro Tanaka wrote:
> could you also test below for master and original grep-2.18?
>
> $ yes abcdabc | head -50000000 >../k
> $ env LANG=C time -p src/grep abcd.bd ../k
>
> Perhaps, later will be faster.

Yes, grep 2.18 is a bit faster on that benchmark for me; it's about 3.6s 
real-time, whereas the master is about 4.2s.

> 0001-grep-speed-up-by-replacing-incr-to-add-in-x86-and-x8.patch will fix
> it.  delta2 searching is higher cost than mind2 searching in original
> grep-2.18.  We need to reduce it for delta2 searching.

Unfortunately it doesn't help for me; it causes the same benchmark to 
take about 4.3s real-time.  Here, I am talking about the version 
resulting from applying the patch in 
<http://debbugs.gnu.org/cgi/bugreport.cgi?msg=22;filename=17230.diff;att=1;bug=17230> 
to the master.

Information forwarded to bug-grep <at> gnu.org:
bug#17229; Package grep. (Thu, 24 Apr 2014 12:31:02 GMT) Full text and rfc822 format available.

Message #37 received at 17229 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17229 <at> debbugs.gnu.org
Subject: bug#17229: [PATCH 2/2] grep: speed-up by using memchr() in
 Boyer-Moore searching
Date: Thu, 24 Apr 2014 21:30:04 +0900

Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Unfortunately it doesn't help for me; it causes the same benchmark to
> take about 4.3s real-time.  Here, I am talking about the version
> resulting from applying the patch in
> <http://debbugs.gnu.org/cgi/bugreport.cgi?msg=22;filename=17230.diff;att=1;bug=17230> to the master.

Sorry, the improvement for `abcd.bd' is
not 0001-grep-speed-up-by-replacing-incr-to-add-in-x86-and-x8.patch
but 0001-kwset-simplify-Boyer-Moore-with-unibyte-i.patch.
Therefore, Ignore this case.

Information forwarded to bug-grep <at> gnu.org:
bug#17229; Package grep. (Fri, 25 Apr 2014 16:15:02 GMT) Full text and rfc822 format available.

Message #40 received at 17229 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17229 <at> debbugs.gnu.org
Subject: bug#17229: [PATCH 2/2] grep: speed-up by using memchr() in
 Boyer-Moore searching
Date: Sat, 26 Apr 2014 01:14:22 +0900

grep 2.18 is slow for below, because always d == 1.

  $ env LANG=C src/grep jk k

Therefore, I wrote 0001-grep-speed-up-by-using-memchr-in-Boyer-Moore-searchi.patch.

When `d' is small, speeds up.  memchr() is faster than or as fast as
delta1 search even when `d' is sufficiently large.

However, this patch can't apply to case-insensitive matching.
In fast, when `d' is large as following case, memchr_trans() imitated
memchr() will occur slowdown.

  $ env LANG=C src/grep -i kkkkkkkkkkkkkkkkkkkkl k

So my patch works for only case-sensitive matching.  By the way, for below
it's still slow with my patch.  Especially, on x86 machines, it's slower
than grep 2.18.

  $ env LANG=C src/grep -i jk k

I think that the master should be faster than original version (grep 2.18),
which uses not kwset but DFA for it.

Accordingly, I added 0001-grep-speed-up-by-replacing-incr-to-add-in-x86-and-x8.patch
I noticed that DFA uses increments to proceed text pointer, and found
that the repeat of `++tp;' is faster than tp += d on x86 machines, when
`d' is small.  `++tp; ++tp; ++tp' is faster than `tp += d;' on it.
This patch inprements it.  I looked at the performance with below.

  $ env LANG=C src/grep -i jk k
  $ env LANG=C src/grep -i jkk k
  $ env LANG=C src/grep -i jkkk k
  $ env LANG=C src/grep -i jkkkk k
  $ env LANG=C src/grep -i jkkkkk k
  $ env LANG=C src/grep -i jkkkkkk k
  $ env LANG=C src/grep -i jkkkkkkk k
  $ env LANG=C src/grep -i jkkkkkkkk k
  $ env LANG=C src/grep -i jkkkkkkkkk k

When `d' is greater than 8, It uses tp += d so that it's as fast as
repeat of `++tp;'.  OTOH, when `d' is less than or equal to 8, uses the
repeat of `++tp', so that it's as fast as or faster than `tp += d'.

I used some macros in my patches, because I don't know how to replace it
to other expression without slowdown.

Norihiro

Information forwarded to bug-grep <at> gnu.org:
bug#17229; Package grep. (Fri, 25 Apr 2014 16:23:02 GMT) Full text and rfc822 format available.

Message #43 received at 17229 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17229 <at> debbugs.gnu.org
Subject: Re: bug#17229: [PATCH 2/2] grep: speed-up by using memchr() in
 Boyer-Moore searching
Date: Fri, 25 Apr 2014 10:22:42 -0600

[Message part 1 (text/plain, inline)]

On 04/25/2014 10:14 AM, Norihiro Tanaka wrote:
> grep 2.18 is slow for below, because always d == 1.
> 
>   $ env LANG=C src/grep jk k
> 
> Therefore, I wrote 0001-grep-speed-up-by-using-memchr-in-Boyer-Moore-searchi.patch.
> 
> When `d' is small, speeds up.  memchr() is faster than or as fast as
> delta1 search even when `d' is sufficiently large.
> 
> However, this patch can't apply to case-insensitive matching.
> In fast, when `d' is large as following case, memchr_trans() imitated
> memchr() will occur slowdown.

Gnulib includes a memchr2() interface, which efficiently searches for
one of two byte values across a known memory length.  It is not quite as
fast as optimized assembly memchr for a single byte search, but when
searching for two bytes in parallel, it is hands down faster than two
sequential memchr() operations or any naive byte-by-byte comparisons.  I
suspect that using memchr2() for case-insensitive searches may allow you
a speedup when searching for (the first byte of) two potential matches
in the search string to the first character of a case-insensitive pattern.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17229; Package grep. (Sat, 26 Apr 2014 00:27:01 GMT) Full text and rfc822 format available.

Message #46 received at 17229 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Eric Blake <eblake <at> redhat.com>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 17229 <at> debbugs.gnu.org
Subject: bug#17229: [PATCH 2/2] grep: speed-up by using memchr() in
 Boyer-Moore searching
Date: Sat, 26 Apr 2014 09:26:34 +0900

Eric Blake wrote:
> It is not quite as fast as optimized assembly memchr for a single byte
> search, but when searching for two bytes in parallel, it is hands down
> faster than two sequential memchr() operations or any naive byte-by-byte
> comparisons.

Thanks, I look at the performance and compare them.  memchr2() is the
fastest for case-insensitive except for large d Solaris or HP-UX.

However, kwset doesn't know that `trans' doesn't have counterparts more
than one for any character.  So we need to check it before run memchr2().

<< Linux 86-64 >>

  memchr2         : real 0.55   user 0.07   sys 0.48
  memchr_trans    : real 0.71   user 0.14   sys 0.57
  my patch small d: real 1.06   user 0.57   sys 0.48
  my patch large d: real 0.54   user 0.01   sys 0.52
  DFA             : real 1.45   user 0.91   sys 0.53

<< Solaris 10 >>

  memchr2         : real 3.17   user 2.45   sys 0.71
  memchr_trans    : real 4.16   user 3.45   sys 0.70
  my patch small d: real 6.43   user 5.71   sys 0.71
  my patch large d: real 1.29   user 0.57   sys 0.71
  DFA             : real 11.08  user 10.35  sys 0.71

<< HP-UX ia64 >>

  memchr2         : real 0.9    user 0.6    sys 0.2
  memchr_trans    : real 2.5    user 2.3    sys 0.2
  my patch small d: real 2.1    user 1.9    sys 0.2
  my patch large d: real 0.3    user 0.1    sys 0.2
  DFA             : real 3.2    user 3.0    sys 0.2

For small d:

$ env LANG=C time -p src/grep -i jk ../k

For large d:

$ env LANG=C time -p src/grep -i kkkkkkkkkkkkkkkkkkkk ../k

For DFA:

$ env LANG=C time -p src/grep -i 'k\|l' ../k


By the way, 0001-grep-speed-up-by-replacing-incr-to-add-in-x86-and-x8.patch
is effective still, even when we use memchr2().  It's useful for below,
which doesn't reach at memchr2() to run alternately with delta1 searching
and delta2 searching.

  $ yes kjkjkjkjkjkjkjkjkjkjkjkjkjkjkjkjkjkjkjkj | head -10000000 >k
  $ env LANG=C src/grep -i jj k

before: real 0.85     user 0.64       sys 0.21
after : real 0.54     user 0.29       sys 0.25

Norihiro

Information forwarded to bug-grep <at> gnu.org:
bug#17229; Package grep. (Sat, 26 Apr 2014 08:45:02 GMT) Full text and rfc822 format available.

Message #49 received at 17229 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: 17229 <at> debbugs.gnu.org
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, Eric Blake <eblake <at> redhat.com>
Subject: bug#17229: [PATCH 2/2] grep: speed-up by using memchr() in
 Boyer-Moore searching
Date: Sat, 26 Apr 2014 17:43:57 +0900

[Message part 1 (text/plain, inline)]

I submit the patch to use memchr2() for case-insensitive matching in
bmexec().  It counts last characters of the patterns in kwsprep(), if
it's just two, use memchr2().  The new version uses memchr() still, if
last character of the pattern is non-alphabetical.

Norihiro

[patch.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17229; Package grep. (Sun, 27 Apr 2014 04:05:02 GMT) Full text and rfc822 format available.

Message #52 received at 17229 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 17229 <at> debbugs.gnu.org
Subject: Re: bug#17229: [PATCH 2/2] grep: speed-up by using memchr() in
 Boyer-Moore searching
Date: Sat, 26 Apr 2014 21:04:13 -0700

[Message part 1 (text/plain, inline)]

Thanks for the test cases and patch.  In my tests, switching to macros 
does not help performance, and that SWITCH macro's implementation 
actually slows things down a bit, which is what I'd expect.  If there is 
a reason to use macros I'd like to see a patch that simply changes 
functions to macros without changing the algorithm, so that we can 
measure this effect separately from the algorithm change.

I attempted to suss out the performance improvements in that patch 
without using macros, and installed the attached changes.  With these 
changes grep performs about as well as it does with that patch, on the 
benchmarks you mentioned that I tried (as before, I'm using the default 
optimization with GCC 4.9.0 x86-64 on an AMD Phenom II X4 910e).  Quite 
possibly I've missed something, of course.  The two "advance_*" 
constants used in the heuristics are guesses: I haven't measured 
rigorously to try to come up with good values.

[0001-kwset-improve-performance-when-large-Boyer-Moore-key.patch (text/plain, attachment)]

[0002-kwset-speed-up-by-using-memchr2.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17229; Package grep. (Sun, 27 Apr 2014 07:58:01 GMT) Full text and rfc822 format available.

Message #55 received at 17229 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17229 <at> debbugs.gnu.org
Subject: bug#17229: [PATCH 2/2] grep: speed-up by using memchr() in
 Boyer-Moore searching
Date: Sun, 27 Apr 2014 16:57:21 +0900

[Message part 1 (text/plain, inline)]

Thanks for the improvement with memchr2().  It's very great!

Paul Eggert wrote:
> If there is a reason to use macros I'd like to see a patch that simply
> changes functions to macros without changing the algorithm, so that we
> can measure this effect separately from the algorithm change.

I tested with below.

  yes jjjjjjjjjjjjjjjjjjjj | head -10000000 >k
  env LC_ALL=C time -p src/grep kjjjjjjjjjjjjjjjjjjj k

The performance with the current master is as following.

    real 2.95       user 2.44       sys 0.45

After changes function to macros.

    real 1.09       user 0.59       sys 0.43

Could you try above cases?

Norihiro

[patch.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17229; Package grep. (Sun, 27 Apr 2014 20:34:02 GMT) Full text and rfc822 format available.

Message #58 received at 17229 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 17229 <at> debbugs.gnu.org
Subject: Re: bug#17229: [PATCH 2/2] grep: speed-up by using memchr() in
 Boyer-Moore searching
Date: Sun, 27 Apr 2014 13:32:57 -0700

[Message part 1 (text/plain, inline)]

Norihiro Tanaka wrote:
> Could you try above cases?

Thanks, you're observing a 2.7x performance speedup with macros on your 
platform and your benchmark.  With the same patch, I observed only a 
1.18x speedup on the same benchmark.  As usual, I'm testing with AMD 
Phenom II X4 910e + GCC 4.9.0 + Fedora 20 + default (-O2) optimization. 
 I'm curious about why you're observing a much bigger performance 
difference with macros.  What platform are you using?

Anyway, an 18% speedup is still a speedup, so I looked into it.  GCC 
4.9.0 misses a non-obvious opportunity for function inlining.  I 
installed a tweak (attached) that should make the inlining opportunity 
obvious to compilers nowadays.  On my platform this gave a 28% speedup, 
i.e., a bit better than the macro-using patch would have.

[0001-kwset-improve-performance-by-inlining-more.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17229; Package grep. (Mon, 28 Apr 2014 04:07:02 GMT) Full text and rfc822 format available.

Message #61 received at 17229 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 17229 <at> debbugs.gnu.org
Subject: Re: bug#17229: [PATCH 2/2] grep: speed-up by using memchr() in
 Boyer-Moore searching
Date: Sun, 27 Apr 2014 21:05:32 -0700

On Sun, Apr 27, 2014 at 1:32 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Anyway, an 18% speedup is still a speedup, so I looked into it.  GCC 4.9.0
> misses a non-obvious opportunity for function inlining.  I installed a tweak
> (attached) that should make the inlining opportunity obvious to compilers
> nowadays.  On my platform this gave a 28% speedup, i.e., a bit better than
> the macro-using patch would have.

Very nice.  I measured a 34% improvement on an i7-3615QM with -O2 and
gcc built from git (4.10.0 20140424)

Information forwarded to bug-grep <at> gnu.org:
bug#17229; Package grep. (Mon, 28 Apr 2014 12:47:01 GMT) Full text and rfc822 format available.

Message #64 received at 17229 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17229 <at> debbugs.gnu.org
Subject: bug#17229: [PATCH 2/2] grep: speed-up by using memchr() in
 Boyer-Moore searching
Date: Mon, 28 Apr 2014 21:45:55 +0900

[Message part 1 (text/plain, inline)]

Paul Eggert wrote:
> Anyway, an 18% speedup is still a speedup, so I looked into it.
> GCC 4.9.0 misses a non-obvious opportunity for function inlining.  I
> installed a tweak (attached) that should make the inlining opportunity
> obvious to compilers nowadays.  On my platform this gave a 28% speedup,
> i.e., a bit better than the macro-using patch would have.

You are right.  My compiler was too old.  It was GCC 4.1.2 on CentOS 5.10.
I retried it with GCC 4.4.7, and got the good performance.

  # Although I tried to build GCC 4.9.0, it hasn't carried out well yet.

By the way, I examined the reason why it was slow on GCC 4.1.2, and I
found that tr() isn't inlining without `-finline-loops' option, because
`-finline-small-functions' option can be used from GCC 4.3.  Although I
submit the patch, it mayn't be so important.

Thanks,
Norihiro

[patch.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17229; Package grep. (Wed, 30 Apr 2014 01:39:01 GMT) Full text and rfc822 format available.

Message #67 received at 17229 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 17229 <at> debbugs.gnu.org
Subject: Re: bug#17229: [PATCH 2/2] grep: speed-up by using memchr() in
 Boyer-Moore searching
Date: Tue, 29 Apr 2014 18:38:05 -0700

> Subject: [PATCH] kwset: improve the performance by inlining tr

Thanks.  Normally we don't add 'inline' to functions, under the theory 
that compilers inline well-enough nowadays that we'd rather trust their 
judgment, but I suppose CentOS 5 is still relevant-enough and the 
performance effects great enough that we can make an exception here. 
Plus, we're already using 'inline' elsewhere in this module for 
performance reasons, and it's unlikely that a non-inlined 'tr' would 
perform better on any current platform.  So, I installed that patch.

Reply sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
You have taken responsibility. (Wed, 30 Apr 2014 23:12:02 GMT) Full text and rfc822 format available.

Notification sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
bug acknowledged by developer. (Wed, 30 Apr 2014 23:12:03 GMT) Full text and rfc822 format available.

Message #72 received at 17229-done <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17229-done <at> debbugs.gnu.org
Subject: bug#17229: [PATCH 2/2] grep: speed-up by using memchr() in
 Boyer-Moore searching
Date: Thu, 01 May 2014 08:10:53 +0900

Thanks.  Closing, all requests suggested in this bug was considered.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 29 May 2014 11:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 9 years and 359 days ago.

Previous Next

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #17229 [PATCH 2/2] grep: speed-up by using memchr() in Boyer-Moore searching

GNU bug report logs - #17229
[PATCH 2/2] grep: speed-up by using memchr() in Boyer-Moore searching