GNU bug report logs - #20006
Get rid of excessive sed forks

Previous Next

Package: libtool;

Reported by: Harald Hoyer <harald <at> redhat.com>

Date: Thu, 5 Mar 2015 08:59:03 UTC

Severity: normal

Merged with 20005

Fixed in versions 2.4.6.19-f187, 2.4.6.19-aabc

Done: Pavel Raiskup <praiskup <at> redhat.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 20006 in the body.
You can then email your comments to 20006 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-libtool <at> gnu.org:
bug#20006; Package libtool. (Thu, 05 Mar 2015 08:59:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Harald Hoyer <harald <at> redhat.com>:
New bug report received and forwarded. Copy sent to bug-libtool <at> gnu.org. (Thu, 05 Mar 2015 08:59:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Harald Hoyer <harald <at> redhat.com>
To: bug-libtool <at> gnu.org
Subject: Re: Get rid of excessive sed forks
Date: Thu, 05 Mar 2015 09:52:05 +0100
[Message part 1 (text/plain, inline)]
On 05.03.2015 09:29, Harald Hoyer wrote:
> See
> https://harald.hoyer.xyz/2015/03/05/libtool-getting-rid-of-180000-sed-forks/
> 
> Patch attached.
> 

Corrected patch attached
[libtool-sed.patch (text/x-diff, attachment)]

Information forwarded to bug-libtool <at> gnu.org:
bug#20006; Package libtool. (Sun, 04 Oct 2015 22:46:01 GMT) Full text and rfc822 format available.

Message #8 received at 20006 <at> debbugs.gnu.org (full text, mbox):

From: Pavel Raiskup <praiskup <at> redhat.com>
To: libtool <at> gnu.org
Cc: Eric Blake <eblake <at> redhat.com>, 20006 <at> debbugs.gnu.org,
 Richard Purdie <richard.purdie <at> linuxfoundation.org>,
 Mike Frysinger <vapier <at> gentoo.org>
Subject: Re: Bash-specific performance by avoiding sed
Date: Mon, 05 Oct 2015 00:45:50 +0200
[Message part 1 (text/plain, inline)]
forcemerge 20006 20005
thanks

On Monday 09 of March 2015 18:04:34 Mike Frysinger wrote:
> On 09 Mar 2015 14:48, Eric Blake wrote:
> > On 03/09/2015 01:50 PM, Bob Friesenhahn wrote:
> > > On Mon, 9 Mar 2015, Mike Gran wrote:
> > >> I don't know if y'all saw this blogpost where a guy pushed
> > >> the sed regular expression handling into bash-specific
> > >> regular expressions when bash was available.  He claims
> > >> there's a significant performance improvement because of
> > >> reduced forking.
> > >>
> > >> http://harald.hoyer.xyz/2015/03/05/libtool-getting-rid-of-180000-sed-forks/
> > > 
> > > There is an issue in the libtool bug tracker regarding this.
> > > 
> > > This solution only works with GNU bash.  It would be good if volunteers
> > > could research to see if there are similar solutions which can work with
> > > other common shells (e.g. dash, ksh, zsh).
> > 
> > For context, we're trying to speed up:
> > 
> > sed_quote_subst='s|\([`"$\\]\)|\\\1|g'
> > _G_unquoted_arg=`printf '%s\n' "$1" |$SED "$sed_quote_subst"`
> > 
> > How about this, which should be completely portable to XSI shells (alas,
> > it still uses ${a#b} and ${a%b} at the end, so it is not portable to
> > ancient Solaris /bin/sh):
> > 
> > # func_quote STRING
> > # Escapes all \`"$ in STRING with another \, and stores that in $quoted
> > func_quote () {
> >   case $1 in
> >     *[\\\`\"\$]*)
> >       save_IFS=$IFS pre=.$1.
> >       for char in '\' '`' '"' '$'; do
> >         post= IFS=$char
> >         for part in $pre; do
> >           post=${post:+$post\\$char}$part
> >         done
> >         pre=$post
> >       done
> 
> should we test the size of the string first ?  i've written such raw shell 
> string parsing functions before, and once you hit a certain size (like 1k+ 
> iirc), forking out to sed is way faster, especially when running in multibyte 
> locales (like UTF8) which most people are doing nowadays.
> -mike

Well, that optimization would require (fast) strlen()-like construct.
Anyway, the vast majority of calls to func_quote () function will have
short ARG, and its complexity is still "just" linear.  We could optimize
later if that was a real issue.

I would like to propose solution based on Eric's one, without using of
'${VAR%.}' and '${VAR#.}' constructs -- sounds like this could be even
more portable while it keeps almost the same speed (if we can use += its
even faster).

I have yet a another patch trying to minimize option-parser overhead
(that is focused on the POV of Richard, but that needs to be cleaned up a
bit, I'll post hopefully tomorrow).

Any comment is welcome!
Pave
[0001-libtool-mitigate-the-sed_quote_subst-slowdown.patch (text/x-patch, attachment)]

Forcibly Merged 20005 20006. Request was from Pavel Raiskup <praiskup <at> redhat.com> to control <at> debbugs.gnu.org. (Sun, 04 Oct 2015 22:47:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-libtool <at> gnu.org:
bug#20006; Package libtool. (Sun, 04 Oct 2015 23:26:02 GMT) Full text and rfc822 format available.

Message #13 received at 20006 <at> debbugs.gnu.org (full text, mbox):

From: Pavel Raiskup <praiskup <at> redhat.com>
To: libtool <at> gnu.org
Cc: Eric Blake <eblake <at> redhat.com>, 20006 <at> debbugs.gnu.org,
 Richard Purdie <richard.purdie <at> linuxfoundation.org>,
 Mike Frysinger <vapier <at> gentoo.org>
Subject: Re: Bash-specific performance by avoiding sed
Date: Mon, 05 Oct 2015 01:25:24 +0200
[Message part 1 (text/plain, inline)]
On Monday 05 of October 2015 00:45:50 Pavel Raiskup wrote:
> > should we test the size of the string first ?  i've written such raw shell 
> > string parsing functions before, and once you hit a certain size (like 1k+ 
> > iirc), forking out to sed is way faster, especially when running in multibyte 
> > locales (like UTF8) which most people are doing nowadays.
> > -mike
> 
> Well, that optimization would require (fast) strlen()-like construct.
> Anyway, the vast majority of calls to func_quote () function will have
> short ARG, and its complexity is still "just" linear.  We could optimize
> later if that was a real issue.
> 
> I would like to propose solution based on Eric's one, without using of
> '${VAR%.}' and '${VAR#.}' constructs -- sounds like this could be even
> more portable while it keeps almost the same speed (if we can use += its
> even faster).
> 
> I have yet a another patch trying to minimize option-parser overhead
> (that is focused on the POV of Richard, but that needs to be cleaned up a
> bit, I'll post hopefully tomorrow).
> 
> Any comment is welcome!

Re-attached (fixes for 'make syntax-check' and fixed one comment).

Pavel
[0001-libtool-mitigate-the-sed_quote_subst-slowdown.patch (text/x-patch, attachment)]

Information forwarded to bug-libtool <at> gnu.org:
bug#20006; Package libtool. (Mon, 05 Oct 2015 07:48:01 GMT) Full text and rfc822 format available.

Message #16 received at 20006 <at> debbugs.gnu.org (full text, mbox):

From: Pavel Raiskup <praiskup <at> redhat.com>
To: libtool <at> gnu.org
Cc: Eric Blake <eblake <at> redhat.com>, 20006 <at> debbugs.gnu.org,
 Richard Purdie <richard.purdie <at> linuxfoundation.org>,
 Mike Frysinger <vapier <at> gentoo.org>
Subject: Re: Bash-specific performance by avoiding sed
Date: Mon, 05 Oct 2015 09:47:05 +0200
On Monday 05 of October 2015 01:25:24 Pavel Raiskup wrote:
> On Monday 05 of October 2015 00:45:50 Pavel Raiskup wrote:
> > > should we test the size of the string first ?  i've written such raw shell 
> > > string parsing functions before, and once you hit a certain size (like 1k+ 
> > > iirc), forking out to sed is way faster, especially when running in multibyte 
> > > locales (like UTF8) which most people are doing nowadays.
> > > -mike
> > 
> > Well, that optimization would require (fast) strlen()-like construct.
> > Anyway, the vast majority of calls to func_quote () function will have
> > short ARG, and its complexity is still "just" linear.  We could optimize
> > later if that was a real issue.
> > 
> > I would like to propose solution based on Eric's one, without using of
> > '${VAR%.}' and '${VAR#.}' constructs -- sounds like this could be even
> > more portable while it keeps almost the same speed (if we can use += its
> > even faster).
> > 
> > I have yet a another patch trying to minimize option-parser overhead
> > (that is focused on the POV of Richard, but that needs to be cleaned up a
> > bit, I'll post hopefully tomorrow).
> > 
> > Any comment is welcome!
> 
> Re-attached (fixes for 'make syntax-check' and fixed one comment).

Hmm, one might-be-a-problem with this (catched by testsuite), when you
have:

  $ cat build-aux/test-quoting
  . `echo "$0" |${SED-sed} 's|[^/]*$||'`/funclib.sh
  # source this for "GNU m4" detection methods
  . `echo "$0" |${SED-sed} 's|[^/]*$||'`/extract-trace

  func_quote_for_eval "$@"
  echo "$func_quote_for_eval_result"

Then:

  $ ./build-aux/test-quoting '"a b"' # fine
  "\"a b\""

  $ ./build-aux/test-quoting '"*tool"' # broken
  ./build-aux/test-quoting '"*tool"'
  \"libtool\"

We would like to have an output \"*\".  I'm not aware of portable way
how to disable wildcard expansion in shell, and autoconf 'Shellology'
section haven't helped me.  In particular, the problem is here:

  x='a"[a-z]*"c'
  IFS='"'
  for i in $x; do # Here we wan't to disable wildcard expansion
    echo $i
  done

Any idea other than fallback to $sed_quote_subst in case of '*' or '['
exists in ARG?

Pavel





Information forwarded to bug-libtool <at> gnu.org:
bug#20006; Package libtool. (Mon, 05 Oct 2015 13:30:08 GMT) Full text and rfc822 format available.

Message #19 received at 20006 <at> debbugs.gnu.org (full text, mbox):

From: Pavel Raiskup <praiskup <at> redhat.com>
To: libtool <at> gnu.org
Cc: Eric Blake <eblake <at> redhat.com>, 20006 <at> debbugs.gnu.org,
 Richard Purdie <richard.purdie <at> linuxfoundation.org>,
 Mike Frysinger <vapier <at> gentoo.org>
Subject: Re: Bash-specific performance by avoiding sed
Date: Mon, 05 Oct 2015 15:28:56 +0200
[Message part 1 (text/plain, inline)]
On Monday 05 of October 2015 09:47:05 Pavel Raiskup wrote:
> On Monday 05 of October 2015 01:25:24 Pavel Raiskup wrote:
> > On Monday 05 of October 2015 00:45:50 Pavel Raiskup wrote:
> > > > should we test the size of the string first ?  i've written such raw shell 
> > > > string parsing functions before, and once you hit a certain size (like 1k+ 
> > > > iirc), forking out to sed is way faster, especially when running in multibyte 
> > > > locales (like UTF8) which most people are doing nowadays.
> > > > -mike
> > > 
> > > Well, that optimization would require (fast) strlen()-like construct.
> > > Anyway, the vast majority of calls to func_quote () function will have
> > > short ARG, and its complexity is still "just" linear.  We could optimize
> > > later if that was a real issue.
> > > 
> > > I would like to propose solution based on Eric's one, without using of
> > > '${VAR%.}' and '${VAR#.}' constructs -- sounds like this could be even
> > > more portable while it keeps almost the same speed (if we can use += its
> > > even faster).
> > > 
> > > I have yet a another patch trying to minimize option-parser overhead
> > > (that is focused on the POV of Richard, but that needs to be cleaned up a
> > > bit, I'll post hopefully tomorrow).
> > > 
> > > Any comment is welcome!
> > 
> > Re-attached (fixes for 'make syntax-check' and fixed one comment).
> 
> Hmm, one might-be-a-problem with this (catched by testsuite), when you
> have:
> 
>   $ cat build-aux/test-quoting
>   . `echo "$0" |${SED-sed} 's|[^/]*$||'`/funclib.sh
>   # source this for "GNU m4" detection methods
>   . `echo "$0" |${SED-sed} 's|[^/]*$||'`/extract-trace
> 
>   func_quote_for_eval "$@"
>   echo "$func_quote_for_eval_result"
> 
> Then:
> 
>   $ ./build-aux/test-quoting '"a b"' # fine
>   "\"a b\""
> 
>   $ ./build-aux/test-quoting '"*tool"' # broken
>   ./build-aux/test-quoting '"*tool"'
>   \"libtool\"
> 
> We would like to have an output \"*\".  I'm not aware of portable way
> how to disable wildcard expansion in shell, and autoconf 'Shellology'
> section haven't helped me.  In particular, the problem is here:
> 
>   x='a"[a-z]*"c'
>   IFS='"'
>   for i in $x; do # Here we wan't to disable wildcard expansion
>     echo $i
>   done
> 
> Any idea other than fallback to $sed_quote_subst in case of '*' or '['
> exists in ARG?

Attaching two (yet to be cleaned) patches doing the optimization.  Is
anybody able to test/comment on this particular solution?  That would be
really appreciated.

Pavel
[0001-libtool-mitigate-the-sed_quote_subst-slowdown.patch (text/x-patch, attachment)]
[0002-libbtool-optimize-parse-options-calls.patch (text/x-patch, attachment)]

Information forwarded to bug-libtool <at> gnu.org:
bug#20006; Package libtool. (Mon, 05 Oct 2015 15:17:01 GMT) Full text and rfc822 format available.

Message #22 received at 20006 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Pavel Raiskup <praiskup <at> redhat.com>, libtool <at> gnu.org
Cc: 20006 <at> debbugs.gnu.org, Richard Purdie <richard.purdie <at> linuxfoundation.org>,
 Mike Frysinger <vapier <at> gentoo.org>
Subject: Re: Bash-specific performance by avoiding sed
Date: Mon, 5 Oct 2015 09:16:30 -0600
[Message part 1 (text/plain, inline)]
On 10/05/2015 01:47 AM, Pavel Raiskup wrote:

> 
> Hmm, one might-be-a-problem with this (catched by testsuite), when you

s/catched/caught/


> We would like to have an output \"*\".  I'm not aware of portable way
> how to disable wildcard expansion in shell, and autoconf 'Shellology'
> section haven't helped me.  In particular, the problem is here:

It is portable to use 'set +f' to disable wildcard globbing, then 'set
-f' to turn them back on.  (POSIX requires it).

> 
>   x='a"[a-z]*"c'
>   IFS='"'
>   for i in $x; do # Here we wan't to disable wildcard expansion

s/wan't/want/

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to bug-libtool <at> gnu.org:
bug#20006; Package libtool. (Wed, 07 Oct 2015 12:29:01 GMT) Full text and rfc822 format available.

Message #25 received at 20006 <at> debbugs.gnu.org (full text, mbox):

From: Pavel Raiskup <praiskup <at> redhat.com>
To: libtool <at> gnu.org
Cc: Eric Blake <eblake <at> redhat.com>, 20006 <at> debbugs.gnu.org,
 Richard Purdie <richard.purdie <at> linuxfoundation.org>,
 Mike Frysinger <vapier <at> gentoo.org>
Subject: Re: Bash-specific performance by avoiding sed
Date: Wed, 07 Oct 2015 14:28:30 +0200
[Message part 1 (text/plain, inline)]
On Monday 05 of October 2015 15:28:56 Pavel Raiskup wrote:
> On Monday 05 of October 2015 09:47:05 Pavel Raiskup wrote:
> > On Monday 05 of October 2015 01:25:24 Pavel Raiskup wrote:
> > > On Monday 05 of October 2015 00:45:50 Pavel Raiskup wrote:
> > > > > should we test the size of the string first ?  i've written such raw shell 
> > > > > string parsing functions before, and once you hit a certain size (like 1k+ 
> > > > > iirc), forking out to sed is way faster, especially when running in multibyte 
> > > > > locales (like UTF8) which most people are doing nowadays.
> > > > > -mike
> > > > 
> > > > Well, that optimization would require (fast) strlen()-like construct.
> > > > Anyway, the vast majority of calls to func_quote () function will have
> > > > short ARG, and its complexity is still "just" linear.  We could optimize
> > > > later if that was a real issue.
> > > > 
> > > > I would like to propose solution based on Eric's one, without using of
> > > > '${VAR%.}' and '${VAR#.}' constructs -- sounds like this could be even
> > > > more portable while it keeps almost the same speed (if we can use += its
> > > > even faster).
> > > > 
> > > > I have yet a another patch trying to minimize option-parser overhead
> > > > (that is focused on the POV of Richard, but that needs to be cleaned up a
> > > > bit, I'll post hopefully tomorrow).
> > > > 
> > > > Any comment is welcome!
> > > 
> > > Re-attached (fixes for 'make syntax-check' and fixed one comment).
> > 
> > Hmm, one might-be-a-problem with this (catched by testsuite), when you
> > have:
> > 
> >   $ cat build-aux/test-quoting
> >   . `echo "$0" |${SED-sed} 's|[^/]*$||'`/funclib.sh
> >   # source this for "GNU m4" detection methods
> >   . `echo "$0" |${SED-sed} 's|[^/]*$||'`/extract-trace
> > 
> >   func_quote_for_eval "$@"
> >   echo "$func_quote_for_eval_result"
> > 
> > Then:
> > 
> >   $ ./build-aux/test-quoting '"a b"' # fine
> >   "\"a b\""
> > 
> >   $ ./build-aux/test-quoting '"*tool"' # broken
> >   ./build-aux/test-quoting '"*tool"'
> >   \"libtool\"
> > 
> > We would like to have an output \"*\".  I'm not aware of portable way
> > how to disable wildcard expansion in shell, and autoconf 'Shellology'
> > section haven't helped me.  In particular, the problem is here:
> > 
> >   x='a"[a-z]*"c'
> >   IFS='"'
> >   for i in $x; do # Here we wan't to disable wildcard expansion
> >     echo $i
> >   done
> > 
> > Any idea other than fallback to $sed_quote_subst in case of '*' or '['
> > exists in ARG?
> 
> Attaching two (yet to be cleaned) patches doing the optimization.  Is
> anybody able to test/comment on this particular solution?  That would be
> really appreciated.

The cleaned patches are attached.  I would like to push those very soon,
probably before weekend.  If you see any issues worth holding this change,
please let me know soon, thanks!

FWIW, some numbers (systemd.git build time, right after 'make clean'):

The old libtool v2.4.2:
  $ time make -j5
  real    2m3.163s
  user    3m54.849s
  sys     3m28.684s

The latest released libtool v2.4.6:
  $ time make -j5 LIBTOOL=/usr/bin/libtool
  real    8m24.604s
  user    9m56.977s
  sys     19m45.620s

The patched git libtool:
  $ time make -j5 LIBTOOL=~/rh/projects/libtool/libtool
  real    2m34.682s
  user    6m37.158s
  sys     2m21.123s

.. so it is (2.4.6 vs. 2.4.7~dev, user+sys) 7m23.5s vs 8m58.3s.  It's not
completely back yet but it's much better than v2.4.6.

Pavel
[0001-libtool-mitigate-the-sed_quote_subst-slowdown.patch (text/x-patch, attachment)]
[0002-libtool-optimizing-options-parser-hooks.patch (text/x-patch, attachment)]

Information forwarded to bug-libtool <at> gnu.org:
bug#20006; Package libtool. (Wed, 07 Oct 2015 14:04:01 GMT) Full text and rfc822 format available.

Message #28 received at 20006 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Pavel Raiskup <praiskup <at> redhat.com>, libtool <at> gnu.org
Cc: 20006 <at> debbugs.gnu.org, Richard Purdie <richard.purdie <at> linuxfoundation.org>,
 Mike Frysinger <vapier <at> gentoo.org>
Subject: Re: Bash-specific performance by avoiding sed
Date: Wed, 7 Oct 2015 08:03:01 -0600
[Message part 1 (text/plain, inline)]
On 10/07/2015 06:28 AM, Pavel Raiskup wrote:

> .. so it is (2.4.6 vs. 2.4.7~dev, user+sys) 7m23.5s vs 8m58.3s.  It's not
> completely back yet but it's much better than v2.4.6.

Thanks again for working on this.

>  
> +# func_quote ARG
> +# --------------
> +# Aesthetically quote one ARG, store the result into $func_quote_result.  Note
> +# that we keep attention to performance here (so far O(N) complexity as long as
> +# func_append is O(1)).
> +func_quote ()
> +{
> +    $debug_cmd
> +
> +    func_quote_result=$1

Common case - nothing needs escaping.  Converts 'abc' to 'abc', as well
as 'a  b' to 'a  b'.  The caller can still blindly add outer "" for a
printed form '"abc"' (unnecessary but harmless ""), or for '"a  b"'
(necessary in that case).


> +
> +    case $func_quote_result in
> +      *[\\\`\"\$]*)

Something needs escaping before being placed in double quotes.

> +        case $func_quote_result in
> +          *'*'*|*'['*)

Problem: shouldn't this also include *'?'* to avoid globbing a single
character?

Could probably be written *[\[\*\?]*)

The string itself contains globs, so...

> +            func_quote_result=`$ECHO "$func_quote_result" | $SED "$sed_quote_subst"`
> +            return 0

...rather than worry about set +f, we just use sed in this rare case.
Converts 'a*b' into 'a*b', converts 'a*"b' into 'a*\"b'.  The caller
then supplies outer "", for '"a*b"' or '"a*\"b"' (outer quotes necessary
in both cases).

> +            ;;
> +        esac
> +
> +        func_quote_old_IFS=$IFS
> +        for _G_char in '\' '`' '"' '$'
> +        do
> +          # STATE($1) PREV($2) SEPARATOR($3)
> +          set start "" ""
> +          func_quote_result=dummy"$_G_char$func_quote_result$_G_char"dummy
> +          IFS=$_G_char
> +          for _G_part in $func_quote_result
> +          do
> +            case $1 in
> +            quote)
> +              func_append func_quote_result "$3$2"
> +              set quote "$_G_part" "\\$_G_char"
> +              ;;
> +            start)
> +              set first "" ""
> +              func_quote_result=
> +              ;;
> +            first)
> +              set quote "$_G_part" ""
> +              ;;
> +            esac
> +          done
> +          IFS=$func_quote_old_IFS
> +        done
> +        ;;

The string does not contain globs, so do four linear passes that escape
each problem character.

converts 'a"b' into 'a\"b', then caller adds outer quotes to become
'"a\"b"'.

> +      *) ;;
> +    esac
> +}

Looks correct.  However, is it also worth attempting a shell-specific
speedup?  After all, func_append uses shell-specific speedups (it is NOT
as fast on shells that don't support +=).

That is, the above function appears to be portable to all shells, but if
we detect that a shell supports printf -v %q, can we use that instead
for some additional speed?  Of course, printf -v %q in bash prefers
output that does NOT embed inside "" (it quotes ALL shell metacharacters
using backslash), so we'd have to rework the contract of the function.

That is, instead of the current function which returns a ready-escaped
string but no outer "", we would instead be returning a string that
already includes all necessary quoting.  Which would mean rewriting all
callers :(  For example, we'd have to figure out what to do for
func_quote_for_eval, which returns two values -
func_quote_for_eval_result being fully quoted is easy with printf -v %q,
and func_quote_for_eval_unquoted_result which is not.

Here's a (lightly-tested) idea of what it would look like, where we'd
have to audit every caller to deal with the result already including
full quoting:

if test yes = `(x=; printf -v x %q yes; echo $x) 2>/dev/null`; then
  func_quote()
  {
    printf -v func_quote_result %q "$1"
  }
else
  func_quote()
  {
    portable version, except add:
    func_quote_result="\"$func_quote_result\""
  }
fi

Note that with this variant, the portable version converts 'a  *"b' into
'"a  *\"b"', while the bash version converts it into 'a\ \ \*\"b'.

> @@ -8596,8 +8597,8 @@ EOF
>  	    relink_command="$var=$func_quote_for_eval_result; export $var; $relink_command"
>  	  fi
>  	done
> -	relink_command="(cd `pwd`; $relink_command)"
> -	relink_command=`$ECHO "$relink_command" | $SED "$sed_quote_subst"`
> +	func_quote "(cd `pwd`; $relink_command)"
> +	relink_command=$func_quote_result

Unrelated to your patch, but doesn't this fail if pwd is called in an
absolute directory containing spaces?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to bug-libtool <at> gnu.org:
bug#20006; Package libtool. (Fri, 09 Oct 2015 15:17:02 GMT) Full text and rfc822 format available.

Message #31 received at 20006 <at> debbugs.gnu.org (full text, mbox):

From: Pavel Raiskup <praiskup <at> redhat.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: Mike Frysinger <vapier <at> gentoo.org>, 20006 <at> debbugs.gnu.org,
 Richard Purdie <richard.purdie <at> linuxfoundation.org>, libtool <at> gnu.org
Subject: Re: Bash-specific performance by avoiding sed
Date: Fri, 09 Oct 2015 17:16:03 +0200
On Wednesday 07 of October 2015 08:03:01 Eric Blake wrote:
> On 10/07/2015 06:28 AM, Pavel Raiskup wrote:
> 
> > .. so it is (2.4.6 vs. 2.4.7~dev, user+sys) 7m23.5s vs 8m58.3s.  It's not
> > completely back yet but it's much better than v2.4.6.
> 
> Thanks again for working on this.

Thanks for perfect review, Eric.

> > +        case $func_quote_result in
> > +          *'*'*|*'['*)
> 
> Problem: shouldn't this also include *'?'* to avoid globbing a single
> character?

Oh, sorry for this - and thanks, I'll fix that!

> Could probably be written *[\[\*\?]*)
> 
> The string itself contains globs, so...

Ok.

> > +            ;;
> > +        esac
> > +
> > +        func_quote_old_IFS=$IFS
> > +        for _G_char in '\' '`' '"' '$'
> > +        do
> > +          # STATE($1) PREV($2) SEPARATOR($3)
> > +          set start "" ""
> > +          func_quote_result=dummy"$_G_char$func_quote_result$_G_char"dummy
> > +          IFS=$_G_char
> > +          for _G_part in $func_quote_result
> > +          do
> > +            case $1 in
> > +            quote)
> > +              func_append func_quote_result "$3$2"
> > +              set quote "$_G_part" "\\$_G_char"
> > +              ;;
> > +            start)
> > +              set first "" ""
> > +              func_quote_result=
> > +              ;;
> > +            first)
> > +              set quote "$_G_part" ""
> > +              ;;
> > +            esac
> > +          done
> > +          IFS=$func_quote_old_IFS
> > +        done
> > +        ;;
> 
> The string does not contain globs, so do four linear passes that escape
> each problem character.

To be honest, in shells without '+=' operator, these are four O(N^2)
passes..  But still very acceptable parabola IMO.

Firstly I tried to do this without this ugly state machine -- using $@
array to store strings.  Very roughly:

  for _G_char in '\' '`' ...
  do
    IFS=$_G_char
    set dummy
    for _G_part $func_quote_result
    do
      set "$@" "\\$_G_char$_G_part"
      ...
    done
    shift
    # _now join $@ into one string_
  done

... but that ended in quadratic complexity _everywhere_.  I was curious
why and it was because of the 'set "$@"' command being longer and longer
for strings like '"""""""""""""""""""...'.

> converts 'a"b' into 'a\"b', then caller adds outer quotes to become
> '"a\"b"'.
> 
> > +      *) ;;
> > +    esac
> > +}
> 
> Looks correct.  However, is it also worth attempting a shell-specific
> speedup?  After all, func_append uses shell-specific speedups (it is NOT
> as fast on shells that don't support +=).

Yes.  We could do that (at least in future if not in this thread).

> That is, the above function appears to be portable to all shells, but if
> we detect that a shell supports printf -v %q, can we use that instead
> for some additional speed?  Of course, printf -v %q in bash prefers
> output that does NOT embed inside "" (it quotes ALL shell metacharacters
> using backslash), so we'd have to rework the contract of the function.

Well, I was thinking about 'printf %q' firstly, but I rejected that idea
because I missed the point that bash's printf has '-v' option (thus we can
avoid forking!).  Thanks for this point.

> That is, instead of the current function which returns a ready-escaped
> string but no outer "", we would instead be returning a string that
> already includes all necessary quoting.  Which would mean rewriting all
> callers :(  For example, we'd have to figure out what to do for
> func_quote_for_eval, which returns two values -
> func_quote_for_eval_result being fully quoted is easy with printf -v %q,
> and func_quote_for_eval_unquoted_result which is not.

Yes, rewriting all callers would raise a chance of my mistake.  So I'm not
really in favour of this dangerous action :) ..  it would be just Bash
speedup anyway (even zsh does not have 'printf -v').

Also, we would have to be careful where this optimized function would be
used -- e.g. 'sed_quote_subst' has been historically used to generate
'*.la' files.  We should keep that format as is -- so we would have to
have two versions of func_quote (one possibly with printf %q, one
producing the old output).

> Here's a (lightly-tested) idea of what it would look like, where we'd
> have to audit every caller to deal with the result already including
> full quoting:
> 
> if test yes = `(x=; printf -v x %q yes; echo $x) 2>/dev/null`; then
>   func_quote()
>   {
>     printf -v func_quote_result %q "$1"
>   }
> else
>   func_quote()
>   {
>     portable version, except add:
>     func_quote_result="\"$func_quote_result\""
>   }
> fi
>
> Note that with this variant, the portable version converts 'a  *"b' into
> '"a  *\"b"', while the bash version converts it into 'a\ \ \*\"b'.

Thanks!  I've done a simple testing too .. and it seems to be equivalent
to '$SED $sed_quote_subst' (in usecases like: string -> quote -> eval).

Well -- I'm not 100% sure I want to hack this in and risk some issues.  On
the other hand, I'm able to review the patches if somebody wanted to give
it a try (the small bash slowdown from v2.4.2 to v2.4.6 is not good
motivation for me).

Much more interesting (than fighting this quotation issue) would be an
attempt to standardize some shell builtin which would have the same
effects as 'func_quote_for_eval' (because returning "$@" array unchanged
up to caller function is not an easy task).  If that was available, we
could easily (conditionally) reuse that...  (some shell hackers could
implement binary simulating this builtin..., etc.)...

... while I'm on it -- we could avoid a lot of calls to func_quote and
func_quote_for_eval if we were able to _return arrays_ (or set global
array variables).  Because some shells can do this, maybe we could detect
this feature and return $<HOOK_NAME>_result_array instead of
quoted-for-eval $<HOOK_NAME>_result?

> > @@ -8596,8 +8597,8 @@ EOF
> >  	    relink_command="$var=$func_quote_for_eval_result; export $var; $relink_command"
> >  	  fi
> >  	done
> > -	relink_command="(cd `pwd`; $relink_command)"
> > -	relink_command=`$ECHO "$relink_command" | $SED "$sed_quote_subst"`
> > +	func_quote "(cd `pwd`; $relink_command)"
> > +	relink_command=$func_quote_result
> 
> Unrelated to your patch, but doesn't this fail if pwd is called in an
> absolute directory containing spaces?

Yes, good catch.  This should be fixed.

Pavel





Information forwarded to bug-libtool <at> gnu.org:
bug#20006; Package libtool. (Mon, 12 Oct 2015 17:39:02 GMT) Full text and rfc822 format available.

Message #34 received at 20006 <at> debbugs.gnu.org (full text, mbox):

From: Pavel Raiskup <praiskup <at> redhat.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: Mike Frysinger <vapier <at> gentoo.org>, 20006 <at> debbugs.gnu.org,
 Richard Purdie <richard.purdie <at> linuxfoundation.org>, libtool <at> gnu.org
Subject: Re: Bash-specific performance by avoiding sed
Date: Mon, 12 Oct 2015 19:38:40 +0200
[Message part 1 (text/plain, inline)]
On Friday 09 of October 2015 17:16:03 Pavel Raiskup wrote:
> > Here's a (lightly-tested) idea of what it would look like, where we'd
> > have to audit every caller to deal with the result already including
> > full quoting:
> > 
> > if test yes = `(x=; printf -v x %q yes; echo $x) 2>/dev/null`; then
> >   func_quote()
> >   {
> >     printf -v func_quote_result %q "$1"
> >   }
> > else
> >   func_quote()
> >   {
> >     portable version, except add:
> >     func_quote_result="\"$func_quote_result\""
> >   }
> > fi
> >
> > Note that with this variant, the portable version converts 'a  *"b' into
> > '"a  *\"b"', while the bash version converts it into 'a\ \ \*\"b'.
> 
> Thanks!  I've done a simple testing too .. and it seems to be equivalent
> to '$SED $sed_quote_subst' (in usecases like: string -> quote -> eval).
> 
> Well -- I'm not 100% sure I want to hack this in and risk some issues.  On
> the other hand, I'm able to review the patches if somebody wanted to give
> it a try (the small bash slowdown from v2.4.2 to v2.4.6 is not good
> motivation for me).

Hm, I was thinking thinking about this over the weekend and I wanted to do
some testing before I would definitely reject that idea.  But it sounds
like the printf builtin helps lets say significantly to bash and does not
hurt others.  I changed the function names so the changes to libtool code
should be pretty transparent (even though it is rather large change).

I'm posting results of the performance test (on systemd package) and
attaching 3 "planned to be pushed" patches.  So testing '2.4.2' version
(fast), git-version after applying 0001 and 0002 (patch-2) and including
0003 (patch-3).

The dirname of test is in format 'PACKAGE-SHELL-LTVERSION'.

Pavel
-----

%E  %U  %S (/home/praiskup/rh/projects/systemd-perf/systemd-bash-2.4.2)
0:28.50 56.84 41.42
0:28.38 56.72 41.41
0:28.26 56.44 41.68
%E  %U  %S (/home/praiskup/rh/projects/systemd-perf/systemd-bash-2.4.6.13-f729)
1:06.13 232.12  23.33
1:06.52 232.28  23.35
1:06.54 232.16  23.52
%E  %U  %S (/home/praiskup/rh/projects/systemd-perf/systemd-bash-2.4.6.14-7a09)
0:46.52 151.16  24.48
0:46.64 150.80  24.60
0:46.69 151.12  23.85


%E  %U  %S (/home/praiskup/rh/projects/systemd-perf/systemd-dash-2.4.2)
0:16.34 22.22 28.62
0:17.13 22.57 29.09
0:17.11 22.37 29.24
%E  %U  %S (/home/praiskup/rh/projects/systemd-perf/systemd-dash-2.4.6.13-f729)
0:23.38 58.80 21.32
0:23.16 58.81 21.16
0:23.21 58.67 21.15
%E  %U  %S (/home/praiskup/rh/projects/systemd-perf/systemd-dash-2.4.6.14-7a09)
0:21.36 51.20 21.62
0:21.23 50.59 21.76
0:21.25 50.70 21.55


%E  %U  %S (/home/praiskup/rh/projects/systemd-perf/systemd-ksh-2.4.2)
0:19.65 30.76 34.12
0:21.09 30.68 34.63
0:19.75 30.83 34.29
%E  %U  %S (/home/praiskup/rh/projects/systemd-perf/systemd-ksh-2.4.6.13-f729)
0:30.75 85.60 25.41
0:30.70 85.51 25.52
0:30.80 85.65 25.41
%E  %U  %S (/home/praiskup/rh/projects/systemd-perf/systemd-ksh-2.4.6.14-7a09)
0:30.78 84.25 26.14
0:30.63 83.65 26.17
0:30.62 83.85 26.02


%E  %U  %S (/home/praiskup/rh/projects/systemd-perf/systemd-zsh-2.4.2)
0:27.94 44.47 48.97
0:28.03 44.29 49.31
0:28.60 44.48 48.93
%E  %U  %S (/home/praiskup/rh/projects/systemd-perf/systemd-zsh-2.4.6.13-f729)
0:47.60 140.30  41.70
0:47.53 139.84  41.89
0:47.56 139.71  42.12
%E  %U  %S (/home/praiskup/rh/projects/systemd-perf/systemd-zsh-2.4.6.14-7a09)
0:46.21 131.88  41.72
0:46.62 131.31  42.25
0:46.44 131.35  42.03
[0001-libtool-mitigate-the-sed_quote_subst-slowdown.patch (text/x-patch, attachment)]
[0002-libtool-optimizing-options-parser-hooks.patch (text/x-patch, attachment)]
[0003-funclib-refactor-quoting-methods-a-bit.patch (text/x-patch, attachment)]

Information forwarded to bug-libtool <at> gnu.org:
bug#20006; Package libtool. (Wed, 04 Nov 2015 07:00:04 GMT) Full text and rfc822 format available.

Message #37 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Pavel Raiskup <praiskup <at> redhat.com>
To: bug-libtool <at> gnu.org, 20006 <at> debbugs.gnu.org
Cc: control <at> debbugs.gnu.org
Subject: Re: bug#20006: Get rid of excessive sed forks
Date: Wed, 04 Nov 2015 07:58:21 +0100
close 20006 libtool-2.4.6.19-aabc
thanks

I installed two test-cases worth running a bit before the future releases;
so any pre-release reports would be really appreciated.

$ make check TESTSUITEFLAGS='-k none' TESTS='test-funclib-quote.sh test-option-parser.sh'

Pavel





Information forwarded to bug-libtool <at> gnu.org:
bug#20006; Package libtool. (Wed, 04 Nov 2015 07:00:06 GMT) Full text and rfc822 format available.

bug marked as fixed in version 2.4.6.19-f187, send any further explanations to 20005 <at> debbugs.gnu.org and Harald Hoyer <harald <at> redhat.com> Request was from Pavel Raiskup <praiskup <at> redhat.com> to control <at> debbugs.gnu.org. (Wed, 04 Nov 2015 07:16:03 GMT) Full text and rfc822 format available.

bug marked as fixed in version 2.4.6.19-aabc, send any further explanations to 20005 <at> debbugs.gnu.org and Harald Hoyer <harald <at> redhat.com> Request was from Pavel Raiskup <praiskup <at> redhat.com> to control <at> debbugs.gnu.org. (Wed, 04 Nov 2015 07:18:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 02 Dec 2015 12:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 8 years and 154 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.