GNU bug report logs -
#17700
[PATCH] dfa: speed-up for a pattern that many atoms are catenated
Previous Next
Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Date: Thu, 5 Jun 2014 11:41:01 UTC
Severity: normal
Tags: patch
Done: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 17700 in the body.
You can then email your comments to 17700 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-grep <at> gnu.org
:
bug#17700
; Package
grep
.
(Thu, 05 Jun 2014 11:41:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Norihiro Tanaka <noritnk <at> kcn.ne.jp>
:
New bug report received and forwarded. Copy sent to
bug-grep <at> gnu.org
.
(Thu, 05 Jun 2014 11:41:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
When many atoms are catenated, dfamust() is very slow in order that
pushing a string into `in' list is slow. This change fixes it.
I tested below to confirm the effect.
$ printf '%08192d\n' 0 | time -p src/grep -f - /dev/null
[0001-dfa-speed-up-for-a-pattern-that-many-atoms-are-caten.patch (text/plain, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17700
; Package
grep
.
(Thu, 05 Jun 2014 16:50:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 17700 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 06/05/2014 04:32 AM, Norihiro Tanaka wrote:
> + memchr (cp, *lookfor, lenin - (cp - lookin));
> + if (!cp)
Thanks, but this part can't be right, as memchr's result is discarded.
It seems to me that much of the performance benefit comes from using a
faster implementation of strstr, and that the DFA code will be better
off if it simply uses the system strstr rather than rolling its own.
(The DFA code dates back before strstr was standardized, which is why it
has its own implementation.)
I installed the attached patch to do that and got a big speedup:
$ printf '%08192d\n' 0 | time -p src/old/grep -f - /dev/null
real 16.14
user 16.13
sys 0.00
$ printf '%08192d\n' 0 | time -p src/grep -f - /dev/null
real 0.79
user 0.79
sys 0.00
Could you please look at the remaining part of your patch and see
whether it's a win if it's merged to what's now installed? Thanks.
PS. Aharon, I assume this'll affect Gawk, in that you'll need to
provide a strstr if you want to be portable to ancient systems that lack
it. strstr was standardized in C89 so it'd have to be a pretty ancient
system, and it may be better just to let this slide.
[diff.patch (text/x-patch, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17700
; Package
grep
.
(Thu, 05 Jun 2014 22:50:01 GMT)
Full text and
rfc822 format available.
Message #11 received at 17700 <at> debbugs.gnu.org (full text, mbox):
Thanks.
Paul Eggert wrote:
> It seems to me that much of the performance benefit comes from using a
> faster implementation of strstr, and that the DFA code will be better
> off if it simply uses the system strstr rather than rolling its own.
`lookfor' isn't terminated by `\-'\0' in istrstr(), but strstr() requires
it. So I wasn't replace it to strstr();
after `grep: undo part of previous change', no longer faster.
$ printf '%02048d\n' 0 | time -p src/grep -f - /dev/null
master : real 5.74 user 5.40 sys 0.17
my patch: real 0.08 user 0.04 sys 0.04
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17700
; Package
grep
.
(Fri, 06 Jun 2014 00:50:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 17700 <at> debbugs.gnu.org (full text, mbox):
Norihiro Tanaka wrote:
> after `grep: undo part of previous change', no longer faster.
>
> $ printf '%02048d\n' 0 | time -p src/grep -f - /dev/null
>
> master : real 5.74 user 5.40 sys 0.17
> my patch: real 0.08 user 0.04 sys 0.04
This isn't matching the results I get on my platform. If
src/d5dfa69/grep is the old version, src/709e7e5/grep the current master
(i.e., just use system strstr), and src/grep is the old version with
your patch, I get:
$ printf '%02048d\n' 0 | time -p src/d5dfa69/grep -f - /dev/null
real 0.33
user 0.33
sys 0.00
$ printf '%02048d\n' 0 | time -p src/709e7e5/grep -f - /dev/null
real 0.04
user 0.04
sys 0.00
$ printf '%02048d\n' 0 | time -p src/grep -f - /dev/null
real 0.03
user 0.03
sys 0.00
So it looks like your patch confers some advantage, but on my platform
almost all the speedup is achieved simply by switching to the system strstr.
> `lookfor' isn't terminated by `\-'\0' in istrstr(), but strstr() requires it.
Yes, that's why I needed to make the most recent change in the current
master; it arranges for strstr's 2nd arg to be null-terminated.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17700
; Package
grep
.
(Fri, 06 Jun 2014 06:44:02 GMT)
Full text and
rfc822 format available.
Message #17 received at 17700 <at> debbugs.gnu.org (full text, mbox):
Is strstr() even a good idea? dfa needs to be able to match NUL
bytes in the data. If this prevents that, then it's a problem.
I'm merely asking - I didn't look hard at where the change was made.
If it matching NUL bytes isn't affected then, no problem.
Thanks,
Arnold
Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Norihiro Tanaka wrote:
> > after `grep: undo part of previous change', no longer faster.
> >
> > $ printf '%02048d\n' 0 | time -p src/grep -f - /dev/null
> >
> > master : real 5.74 user 5.40 sys 0.17
> > my patch: real 0.08 user 0.04 sys 0.04
>
> This isn't matching the results I get on my platform. If
> src/d5dfa69/grep is the old version, src/709e7e5/grep the current master
> (i.e., just use system strstr), and src/grep is the old version with
> your patch, I get:
>
> $ printf '%02048d\n' 0 | time -p src/d5dfa69/grep -f - /dev/null
> real 0.33
> user 0.33
> sys 0.00
> $ printf '%02048d\n' 0 | time -p src/709e7e5/grep -f - /dev/null
> real 0.04
> user 0.04
> sys 0.00
> $ printf '%02048d\n' 0 | time -p src/grep -f - /dev/null
> real 0.03
> user 0.03
> sys 0.00
>
> So it looks like your patch confers some advantage, but on my platform
> almost all the speedup is achieved simply by switching to the system strstr.
>
> > `lookfor' isn't terminated by `\-'\0' in istrstr(), but strstr() requires it.
>
> Yes, that's why I needed to make the most recent change in the current
> master; it arranges for strstr's 2nd arg to be null-terminated.
Reply sent
to
Norihiro Tanaka <noritnk <at> kcn.ne.jp>
:
You have taken responsibility.
(Fri, 06 Jun 2014 13:20:02 GMT)
Full text and
rfc822 format available.
Notification sent
to
Norihiro Tanaka <noritnk <at> kcn.ne.jp>
:
bug acknowledged by developer.
(Fri, 06 Jun 2014 13:20:03 GMT)
Full text and
rfc822 format available.
Message #22 received at 17700-done <at> debbugs.gnu.org (full text, mbox):
Paul Eggert wrote:
> So it looks like your patch confers some advantage, but on my platform
> almost all the speedup is achieved simply by switching to the system strstr.
First, I tested on CentOS 5.10. Next, I tested on RHEL 6.5, and get
result as same as you. strstr() on CentOS 5.10 may be too old. I'm
sure that this bug has already been fixed. So closing.
Thanks,
Norihiro
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17700
; Package
grep
.
(Fri, 06 Jun 2014 13:44:01 GMT)
Full text and
rfc822 format available.
Message #25 received at 17700 <at> debbugs.gnu.org (full text, mbox):
Arnold Robbins wrote:
> Is strstr() even a good idea? dfa needs to be able to match NUL
> bytes in the data. If this prevents that, then it's a problem.
dfamust() doesn't change a result, so no problem. BTW, even before make
this change, dfamusts is terminated by NUL byte.
Thanks,
Norihiro
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17700
; Package
grep
.
(Fri, 06 Jun 2014 14:21:01 GMT)
Full text and
rfc822 format available.
Message #28 received at 17700 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Norihiro Tanaka wrote:
> strstr() on CentOS 5.10 may be too old.
Yes it is. But 'configure' is supposed to detect this. config.log
should say something like this:
configure:26283: checking whether strstr works in linear time
configure:26357: gcc -std=gnu99 -o conftest -g -O2 conftest.c >&5
configure:26357: $? = 0
configure:26357: ./conftest
configure:26357: $? = 142
configure: program exited with status 142
This is how it behaves for me, on RHEL 6.5 and on Solaris 11.1. Could
you please investigate why it is not occurring on CentOS 5.10? For
example, why does the attached program work? It should fail.
[strstr-test.c (text/x-csrc, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17700
; Package
grep
.
(Fri, 06 Jun 2014 14:27:01 GMT)
Full text and
rfc822 format available.
Message #31 received at 17700 <at> debbugs.gnu.org (full text, mbox):
Norihiro Tanaka wrote:
> BTW, even before make
> this change, dfamusts is terminated by NUL byte.
Yes, that's right, this patch doesn't affect whether the DFA code works
correctly on NUL bytes. There is a performance issue: if the pattern
contains NUL bytes the DFA code doesn't operate as efficiently as it
will does the pattern contains only non-NUL bytes. But this performance
issue is unaffected by the recent change.
It's low priority to fix the performance issue, but if we wanted to fix
it among many other things we should use memmem instead of strstr.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17700
; Package
grep
.
(Fri, 06 Jun 2014 22:47:02 GMT)
Full text and
rfc822 format available.
Message #34 received at 17700 <at> debbugs.gnu.org (full text, mbox):
Paul Eggert wrote:
> Could you please investigate why it is not occurring on CentOS 5.10?
> For example, why does the attached program work? It should fail.
Sorry, I don't update gnulib. After update, it was returned immediately.
I have also confirmed speed-up in grep.
Thanks,
Norihiro
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Sat, 05 Jul 2014 11:24:03 GMT)
Full text and
rfc822 format available.
This bug report was last modified 9 years and 291 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.