GNU bug report logs -
#17229
[PATCH 2/2] grep: speed-up by using memchr() in Boyer-Moore searching
Previous Next
Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Date: Wed, 9 Apr 2014 13:56:02 UTC
Severity: normal
Tags: moreinfo, patch
Done: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 17229 in the body.
You can then email your comments to 17229 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-grep <at> gnu.org
:
bug#17229
; Package
grep
.
(Wed, 09 Apr 2014 13:56:03 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Norihiro Tanaka <noritnk <at> kcn.ne.jp>
:
New bug report received and forwarded. Copy sent to
bug-grep <at> gnu.org
.
(Wed, 09 Apr 2014 13:56:04 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
memchr() of glibc is faster than seeking by delta1 on some platforms.
So, when there is no chance to match for a while, use it on them.
Speed-up about 3x in best case, 10% in normal case by this patch.
It's still available only for x86 and x86-64 platform.
Norihiro
[patch.txt (text/plain, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17229
; Package
grep
.
(Thu, 10 Apr 2014 00:36:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 17229 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 04/09/2014 06:54 AM, Norihiro Tanaka wrote:
> memchr() of glibc is faster than seeking by delta1 on some platforms.
> So, when there is no chance to match for a while, use it on them.
>
> Speed-up about 3x in best case, 10% in normal case by this patch.
>
> It's still available only for x86 and x86-64 platform.
>
Thanks, I have some questions about this one.
What benchmark did you use to time this?
THe patch refers to a variable called 'trans' that is not in the current
savannah git master. Is there some other patch that establishes this
variable, a patch that is a prerequisite for this one? If trans is a
cached copy of kwset->trans, isn't that guaranteed to be NULL here?
What is delta1? It's mentioned in a comment but not in the code.
It'd be simpler to use memchr on all platforms; is there a major
performance downside to that? For example, what about the attached
patch instead? It also simplifies things a bit to avoid a label and its
associated gotos, under the assumption that trans is NULL here.
[memchr.diff (text/x-patch, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17229
; Package
grep
.
(Thu, 10 Apr 2014 11:38:02 GMT)
Full text and
rfc822 format available.
Message #11 received at 17229 <at> debbugs.gnu.org (full text, mbox):
Hi Paul,
Hi Paul,
Thanks for the review for the patch.
> What benchmark did you use to time this?
I measured below.
$ yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 >k
Before:
$ time -p env LANG=C src/grep jk k
real 1.53
user 1.21
sys 0.31
After:
$ time -p env LANG=C src/grep jk k
real 0.33
user 0.03
sys 0.29
> Is there some other patch that establishes this variable, a patch that
> is a prerequisite for this one?
The patch requires below.
bug#17230
[PATCH 1/2] grep: may also use Boyer-Moore algorithm for case-insensitive matching
> What is delta1? It's mentioned in a comment but not in the code.
It means `d1' in kwset.c:bmexec.
> It'd be simpler to use memchr on all platforms;
> is there a major performance downside to that?
Yes. As far as I was confirmed, it's slow in HP-UX on Itanium and
Solaris on SPARC. I think that that it depends on the implementation of
memchr() and the rate of the increment instruction for the `add'
instruction on the platform.
Norihiro
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17229
; Package
grep
.
(Thu, 10 Apr 2014 21:12:01 GMT)
Full text and
rfc822 format available.
Message #14 received at 17229 <at> debbugs.gnu.org (full text, mbox):
On 04/10/2014 04:37 AM, Norihiro Tanaka wrote:
> The patch requires below. bug#17230 [PATCH 1/2] grep: may also use
> Boyer-Moore algorithm for case-insensitive matching
OK, thanks, I'll work on Bug #17230 first, before worrying about 17229.
> It'd be simpler to use memchr on all platforms;
> is there a major performance downside to that?
> Yes. As far as I was confirmed, it's slow in HP-UX on Itanium and
> Solaris on SPARC. I think that that it depends on the implementation of
> memchr() and the rate of the increment instruction for the `add'
> instruction on the platform.
If it's *way* slower with memchr and HP-UX/Itanium or Solaris/SPARC we
may need to complicate the code, but if it's just a bit slower then it's
probably better to use the simpler implementation. There's a tradeoff
between simplified maintenance and performance on non-GNU hosts;
sometimes the former is more important than the latter, and this may be
one of those times.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17229
; Package
grep
.
(Thu, 10 Apr 2014 21:34:01 GMT)
Full text and
rfc822 format available.
Message #17 received at 17229 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 04/10/2014 03:11 PM, Paul Eggert wrote:
> On 04/10/2014 04:37 AM, Norihiro Tanaka wrote:
>> The patch requires below. bug#17230 [PATCH 1/2] grep: may also use
>> Boyer-Moore algorithm for case-insensitive matching
>
> OK, thanks, I'll work on Bug #17230 first, before worrying about 17229.
>
>> It'd be simpler to use memchr on all platforms;
>> is there a major performance downside to that?
>> Yes. As far as I was confirmed, it's slow in HP-UX on Itanium and
>> Solaris on SPARC. I think that that it depends on the implementation of
>> memchr() and the rate of the increment instruction for the `add'
>> instruction on the platform.
>
> If it's *way* slower with memchr and HP-UX/Itanium or Solaris/SPARC we
> may need to complicate the code, but if it's just a bit slower then it's
> probably better to use the simpler implementation. There's a tradeoff
> between simplified maintenance and performance on non-GNU hosts;
> sometimes the former is more important than the latter, and this may be
> one of those times.
In the past, we've used gnulib to work around painfully slow
implementations. Remember that many platforms have a quadratic
strstr(); gnulib has the ability to replace it with a guaranteed-linear
version written in straight C code (side note - glibc has since taken
gnulib's implementation and further optimized it; maybe we should
consider feeding the glibc improvements back to gnulib). If the native
memchr can be demonstrated to be slower on certain platforms in
comparison to an implementation that we can write in straight C, then
having gnulib supply the faster replacement is so much easier to
maintain than to special-case non-GNU hosts in the main body of grep code.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[signature.asc (application/pgp-signature, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17229
; Package
grep
.
(Thu, 10 Apr 2014 23:26:01 GMT)
Full text and
rfc822 format available.
Message #20 received at 17229 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
I believe that memchr() isn't slower on HP-UX and Solaris but more
faster on x86 processors.
Now I wrote the additional patch.
It replaces `tp += d' to the other code in bmexec() and cwexec() for x86
processors.
I see that the code is more complex and slower theoretically, but it's
faster actually when `d' is small. (We can test it by `grep -i jk k'
after patch#17230 and this patch.)
Therefore, I think that memchr() is faster than `tp += d' because the
increment instruction is more faster than the add instruction on x86
processors. KWSet may be slower than DFA on them in the worst case,
even if doesn't use delta2. Try to run and compare below and compare
without this patches. The former uses only KWSet and the later uses only
DFA. Later is faster on Linux x86, because DFA uses increment instrument
for shift of text pointer. It may generate a wrong usage among users.
I think that it should be avoided.
$ yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 >k
$ env LANG=C time -p src/grep jk k
$ env LANG=C time -p src/grep -i jk k
> sometimes the former is more important than the latter, and this may be
> one of those times.
I see that grep attaches a high value to performance. It uses not only
regex but also DFA. Further more, DFA has the specific codes for utf-8
locale, which is used most frequently. I think that grep may also have
specific codes for x86 processors, which are also used most frequently.
Norihiro
[patch.txt (text/plain, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17229
; Package
grep
.
(Fri, 11 Apr 2014 00:00:02 GMT)
Full text and
rfc822 format available.
Message #23 received at 17229 <at> debbugs.gnu.org (full text, mbox):
Paul Eggert wrote:
> sometimes the former is more important than the latter, and this may be one of those times.
I also like simple, and I don't like so much platform specific optimization.
However, I confirmed 10% speed-up with wikipedia database and the simple
word `Wikipedia'. I think that 10% speed-up cannot ignorable on most
frequently used platform and in most simple and frequently used usage.
$ env LANG=C time -p src/grep Wikipedia pages-articles.xml
http://dumps.wikimedia.org/jawiki/latest/
Norihiro
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17229
; Package
grep
.
(Wed, 23 Apr 2014 07:10:03 GMT)
Full text and
rfc822 format available.
Message #26 received at 17229 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Norihiro Tanaka wrote:
> Now I wrote the additional patch.
Sorry, I don't understand that patch. I applied your earlier patch
(attachment 1) and then merged that patch (attachment 2), but the
resulting 'grep' failed 'make check'; I got a "FAIL: fedora" and the
test after "file" never finished. This was on Fedora 20 x86-64 with my
own installation of GCC 4.9.0.
Instead of trying to understand that patch, I wrote a different patch
(attachment 3) that simplifies the code and removes the part that is
x86-specific. This improves the performance quite a bit for the test
case given in the ChangeLog entry. I also tested performance on Sparc
Solaris 10, and memchr was a big win there. Perhaps there are platforms
where memchr is a big loss, but we can cross that bridge if we come to it.
Perhaps I merged that patch incorrectly, so I'll leave this bug open for
now. To be honest I hope that patch doesn't improve performance
significantly if we get it to work correctly, as it looks tricky.
[0001-grep-speed-up-by-using-memchr-in-Boyer-Moore-searchi.patch (text/plain, attachment)]
[0001-grep-speed-up-by-replacing-incr-to-add-in-x86-and-x8.patch (text/plain, attachment)]
[0002-kwset-simplify-and-speed-up-Boyer-Moore-unibyte-i-in.patch (text/plain, attachment)]
Added tag(s) moreinfo.
Request was from
Paul Eggert <eggert <at> cs.ucla.edu>
to
control <at> debbugs.gnu.org
.
(Wed, 23 Apr 2014 07:13:01 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17229
; Package
grep
.
(Wed, 23 Apr 2014 17:52:01 GMT)
Full text and
rfc822 format available.
Message #31 received at 17229 <at> debbugs.gnu.org (full text, mbox):
Paul Eggert wrote:
> This improves the performance quite a bit for the test case given in
> the ChangeLog entry. I also tested performance on Sparc Solaris 10,
> and memchr was a big win there.
You are right. I also ran the tests on Solaris 10 and HP-UX 11iv2.
memchr() was win on both machines. Especially on HP-UX 11iv2, memchr was
10x faster than delta1 searching.
By the way, could you also test below for master and original grep-2.18?
$ yes abcdabc | head -50000000 >../k
$ env LANG=C time -p src/grep abcd.bd ../k
Perhaps, later will be faster.
0001-grep-speed-up-by-replacing-incr-to-add-in-x86-and-x8.patch will fix
it. delta2 searching is higher cost than mind2 searching in original
grep-2.18. We need to reduce it for delta2 searching.
Norihiro
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17229
; Package
grep
.
(Thu, 24 Apr 2014 06:34:02 GMT)
Full text and
rfc822 format available.
Message #34 received at 17229 <at> debbugs.gnu.org (full text, mbox):
Norihiro Tanaka wrote:
> could you also test below for master and original grep-2.18?
>
> $ yes abcdabc | head -50000000 >../k
> $ env LANG=C time -p src/grep abcd.bd ../k
>
> Perhaps, later will be faster.
Yes, grep 2.18 is a bit faster on that benchmark for me; it's about 3.6s
real-time, whereas the master is about 4.2s.
> 0001-grep-speed-up-by-replacing-incr-to-add-in-x86-and-x8.patch will fix
> it. delta2 searching is higher cost than mind2 searching in original
> grep-2.18. We need to reduce it for delta2 searching.
Unfortunately it doesn't help for me; it causes the same benchmark to
take about 4.3s real-time. Here, I am talking about the version
resulting from applying the patch in
<http://debbugs.gnu.org/cgi/bugreport.cgi?msg=22;filename=17230.diff;att=1;bug=17230>
to the master.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17229
; Package
grep
.
(Thu, 24 Apr 2014 12:31:02 GMT)
Full text and
rfc822 format available.
Message #37 received at 17229 <at> debbugs.gnu.org (full text, mbox):
Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Unfortunately it doesn't help for me; it causes the same benchmark to
> take about 4.3s real-time. Here, I am talking about the version
> resulting from applying the patch in
> <http://debbugs.gnu.org/cgi/bugreport.cgi?msg=22;filename=17230.diff;att=1;bug=17230> to the master.
Sorry, the improvement for `abcd.bd' is
not 0001-grep-speed-up-by-replacing-incr-to-add-in-x86-and-x8.patch
but 0001-kwset-simplify-Boyer-Moore-with-unibyte-i.patch.
Therefore, Ignore this case.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17229
; Package
grep
.
(Fri, 25 Apr 2014 16:15:02 GMT)
Full text and
rfc822 format available.
Message #40 received at 17229 <at> debbugs.gnu.org (full text, mbox):
grep 2.18 is slow for below, because always d == 1.
$ env LANG=C src/grep jk k
Therefore, I wrote 0001-grep-speed-up-by-using-memchr-in-Boyer-Moore-searchi.patch.
When `d' is small, speeds up. memchr() is faster than or as fast as
delta1 search even when `d' is sufficiently large.
However, this patch can't apply to case-insensitive matching.
In fast, when `d' is large as following case, memchr_trans() imitated
memchr() will occur slowdown.
$ env LANG=C src/grep -i kkkkkkkkkkkkkkkkkkkkl k
So my patch works for only case-sensitive matching. By the way, for below
it's still slow with my patch. Especially, on x86 machines, it's slower
than grep 2.18.
$ env LANG=C src/grep -i jk k
I think that the master should be faster than original version (grep 2.18),
which uses not kwset but DFA for it.
Accordingly, I added 0001-grep-speed-up-by-replacing-incr-to-add-in-x86-and-x8.patch
I noticed that DFA uses increments to proceed text pointer, and found
that the repeat of `++tp;' is faster than tp += d on x86 machines, when
`d' is small. `++tp; ++tp; ++tp' is faster than `tp += d;' on it.
This patch inprements it. I looked at the performance with below.
$ env LANG=C src/grep -i jk k
$ env LANG=C src/grep -i jkk k
$ env LANG=C src/grep -i jkkk k
$ env LANG=C src/grep -i jkkkk k
$ env LANG=C src/grep -i jkkkkk k
$ env LANG=C src/grep -i jkkkkkk k
$ env LANG=C src/grep -i jkkkkkkk k
$ env LANG=C src/grep -i jkkkkkkkk k
$ env LANG=C src/grep -i jkkkkkkkkk k
When `d' is greater than 8, It uses tp += d so that it's as fast as
repeat of `++tp;'. OTOH, when `d' is less than or equal to 8, uses the
repeat of `++tp', so that it's as fast as or faster than `tp += d'.
I used some macros in my patches, because I don't know how to replace it
to other expression without slowdown.
Norihiro
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17229
; Package
grep
.
(Fri, 25 Apr 2014 16:23:02 GMT)
Full text and
rfc822 format available.
Message #43 received at 17229 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 04/25/2014 10:14 AM, Norihiro Tanaka wrote:
> grep 2.18 is slow for below, because always d == 1.
>
> $ env LANG=C src/grep jk k
>
> Therefore, I wrote 0001-grep-speed-up-by-using-memchr-in-Boyer-Moore-searchi.patch.
>
> When `d' is small, speeds up. memchr() is faster than or as fast as
> delta1 search even when `d' is sufficiently large.
>
> However, this patch can't apply to case-insensitive matching.
> In fast, when `d' is large as following case, memchr_trans() imitated
> memchr() will occur slowdown.
Gnulib includes a memchr2() interface, which efficiently searches for
one of two byte values across a known memory length. It is not quite as
fast as optimized assembly memchr for a single byte search, but when
searching for two bytes in parallel, it is hands down faster than two
sequential memchr() operations or any naive byte-by-byte comparisons. I
suspect that using memchr2() for case-insensitive searches may allow you
a speedup when searching for (the first byte of) two potential matches
in the search string to the first character of a case-insensitive pattern.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[signature.asc (application/pgp-signature, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17229
; Package
grep
.
(Sat, 26 Apr 2014 00:27:01 GMT)
Full text and
rfc822 format available.
Message #46 received at 17229 <at> debbugs.gnu.org (full text, mbox):
Eric Blake wrote:
> It is not quite as fast as optimized assembly memchr for a single byte
> search, but when searching for two bytes in parallel, it is hands down
> faster than two sequential memchr() operations or any naive byte-by-byte
> comparisons.
Thanks, I look at the performance and compare them. memchr2() is the
fastest for case-insensitive except for large d Solaris or HP-UX.
However, kwset doesn't know that `trans' doesn't have counterparts more
than one for any character. So we need to check it before run memchr2().
<< Linux 86-64 >>
memchr2 : real 0.55 user 0.07 sys 0.48
memchr_trans : real 0.71 user 0.14 sys 0.57
my patch small d: real 1.06 user 0.57 sys 0.48
my patch large d: real 0.54 user 0.01 sys 0.52
DFA : real 1.45 user 0.91 sys 0.53
<< Solaris 10 >>
memchr2 : real 3.17 user 2.45 sys 0.71
memchr_trans : real 4.16 user 3.45 sys 0.70
my patch small d: real 6.43 user 5.71 sys 0.71
my patch large d: real 1.29 user 0.57 sys 0.71
DFA : real 11.08 user 10.35 sys 0.71
<< HP-UX ia64 >>
memchr2 : real 0.9 user 0.6 sys 0.2
memchr_trans : real 2.5 user 2.3 sys 0.2
my patch small d: real 2.1 user 1.9 sys 0.2
my patch large d: real 0.3 user 0.1 sys 0.2
DFA : real 3.2 user 3.0 sys 0.2
For small d:
$ env LANG=C time -p src/grep -i jk ../k
For large d:
$ env LANG=C time -p src/grep -i kkkkkkkkkkkkkkkkkkkk ../k
For DFA:
$ env LANG=C time -p src/grep -i 'k\|l' ../k
By the way, 0001-grep-speed-up-by-replacing-incr-to-add-in-x86-and-x8.patch
is effective still, even when we use memchr2(). It's useful for below,
which doesn't reach at memchr2() to run alternately with delta1 searching
and delta2 searching.
$ yes kjkjkjkjkjkjkjkjkjkjkjkjkjkjkjkjkjkjkjkj | head -10000000 >k
$ env LANG=C src/grep -i jj k
before: real 0.85 user 0.64 sys 0.21
after : real 0.54 user 0.29 sys 0.25
Norihiro
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17229
; Package
grep
.
(Sat, 26 Apr 2014 08:45:02 GMT)
Full text and
rfc822 format available.
Message #49 received at 17229 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
I submit the patch to use memchr2() for case-insensitive matching in
bmexec(). It counts last characters of the patterns in kwsprep(), if
it's just two, use memchr2(). The new version uses memchr() still, if
last character of the pattern is non-alphabetical.
Norihiro
[patch.txt (text/plain, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17229
; Package
grep
.
(Sun, 27 Apr 2014 04:05:02 GMT)
Full text and
rfc822 format available.
Message #52 received at 17229 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Thanks for the test cases and patch. In my tests, switching to macros
does not help performance, and that SWITCH macro's implementation
actually slows things down a bit, which is what I'd expect. If there is
a reason to use macros I'd like to see a patch that simply changes
functions to macros without changing the algorithm, so that we can
measure this effect separately from the algorithm change.
I attempted to suss out the performance improvements in that patch
without using macros, and installed the attached changes. With these
changes grep performs about as well as it does with that patch, on the
benchmarks you mentioned that I tried (as before, I'm using the default
optimization with GCC 4.9.0 x86-64 on an AMD Phenom II X4 910e). Quite
possibly I've missed something, of course. The two "advance_*"
constants used in the heuristics are guesses: I haven't measured
rigorously to try to come up with good values.
[0001-kwset-improve-performance-when-large-Boyer-Moore-key.patch (text/plain, attachment)]
[0002-kwset-speed-up-by-using-memchr2.patch (text/plain, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17229
; Package
grep
.
(Sun, 27 Apr 2014 07:58:01 GMT)
Full text and
rfc822 format available.
Message #55 received at 17229 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Thanks for the improvement with memchr2(). It's very great!
Paul Eggert wrote:
> If there is a reason to use macros I'd like to see a patch that simply
> changes functions to macros without changing the algorithm, so that we
> can measure this effect separately from the algorithm change.
I tested with below.
yes jjjjjjjjjjjjjjjjjjjj | head -10000000 >k
env LC_ALL=C time -p src/grep kjjjjjjjjjjjjjjjjjjj k
The performance with the current master is as following.
real 2.95 user 2.44 sys 0.45
After changes function to macros.
real 1.09 user 0.59 sys 0.43
Could you try above cases?
Norihiro
[patch.txt (text/plain, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17229
; Package
grep
.
(Sun, 27 Apr 2014 20:34:02 GMT)
Full text and
rfc822 format available.
Message #58 received at 17229 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Norihiro Tanaka wrote:
> Could you try above cases?
Thanks, you're observing a 2.7x performance speedup with macros on your
platform and your benchmark. With the same patch, I observed only a
1.18x speedup on the same benchmark. As usual, I'm testing with AMD
Phenom II X4 910e + GCC 4.9.0 + Fedora 20 + default (-O2) optimization.
I'm curious about why you're observing a much bigger performance
difference with macros. What platform are you using?
Anyway, an 18% speedup is still a speedup, so I looked into it. GCC
4.9.0 misses a non-obvious opportunity for function inlining. I
installed a tweak (attached) that should make the inlining opportunity
obvious to compilers nowadays. On my platform this gave a 28% speedup,
i.e., a bit better than the macro-using patch would have.
[0001-kwset-improve-performance-by-inlining-more.patch (text/plain, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17229
; Package
grep
.
(Mon, 28 Apr 2014 04:07:02 GMT)
Full text and
rfc822 format available.
Message #61 received at 17229 <at> debbugs.gnu.org (full text, mbox):
On Sun, Apr 27, 2014 at 1:32 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Anyway, an 18% speedup is still a speedup, so I looked into it. GCC 4.9.0
> misses a non-obvious opportunity for function inlining. I installed a tweak
> (attached) that should make the inlining opportunity obvious to compilers
> nowadays. On my platform this gave a 28% speedup, i.e., a bit better than
> the macro-using patch would have.
Very nice. I measured a 34% improvement on an i7-3615QM with -O2 and
gcc built from git (4.10.0 20140424)
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17229
; Package
grep
.
(Mon, 28 Apr 2014 12:47:01 GMT)
Full text and
rfc822 format available.
Message #64 received at 17229 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Paul Eggert wrote:
> Anyway, an 18% speedup is still a speedup, so I looked into it.
> GCC 4.9.0 misses a non-obvious opportunity for function inlining. I
> installed a tweak (attached) that should make the inlining opportunity
> obvious to compilers nowadays. On my platform this gave a 28% speedup,
> i.e., a bit better than the macro-using patch would have.
You are right. My compiler was too old. It was GCC 4.1.2 on CentOS 5.10.
I retried it with GCC 4.4.7, and got the good performance.
# Although I tried to build GCC 4.9.0, it hasn't carried out well yet.
By the way, I examined the reason why it was slow on GCC 4.1.2, and I
found that tr() isn't inlining without `-finline-loops' option, because
`-finline-small-functions' option can be used from GCC 4.3. Although I
submit the patch, it mayn't be so important.
Thanks,
Norihiro
[patch.txt (text/plain, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17229
; Package
grep
.
(Wed, 30 Apr 2014 01:39:01 GMT)
Full text and
rfc822 format available.
Message #67 received at 17229 <at> debbugs.gnu.org (full text, mbox):
> Subject: [PATCH] kwset: improve the performance by inlining tr
Thanks. Normally we don't add 'inline' to functions, under the theory
that compilers inline well-enough nowadays that we'd rather trust their
judgment, but I suppose CentOS 5 is still relevant-enough and the
performance effects great enough that we can make an exception here.
Plus, we're already using 'inline' elsewhere in this module for
performance reasons, and it's unlikely that a non-inlined 'tr' would
perform better on any current platform. So, I installed that patch.
Reply sent
to
Norihiro Tanaka <noritnk <at> kcn.ne.jp>
:
You have taken responsibility.
(Wed, 30 Apr 2014 23:12:02 GMT)
Full text and
rfc822 format available.
Notification sent
to
Norihiro Tanaka <noritnk <at> kcn.ne.jp>
:
bug acknowledged by developer.
(Wed, 30 Apr 2014 23:12:03 GMT)
Full text and
rfc822 format available.
Message #72 received at 17229-done <at> debbugs.gnu.org (full text, mbox):
Thanks. Closing, all requests suggested in this bug was considered.
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Thu, 29 May 2014 11:24:03 GMT)
Full text and
rfc822 format available.
This bug report was last modified 9 years and 359 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.