GNU bug report logs - #20029
'yes' surprisingly slow

Previous Next

Package: coreutils;

Reported by: Ole Tange <tange <at> gnu.org>

Date: Sat, 7 Mar 2015 11:50:02 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 20029 in the body.
You can then email your comments to 20029 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-coreutils <at> gnu.org:
bug#20029; Package coreutils. (Sat, 07 Mar 2015 11:50:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Ole Tange <tange <at> gnu.org>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sat, 07 Mar 2015 11:50:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Ole Tange <tange <at> gnu.org>
To: bug-coreutils <at> gnu.org
Subject: 'yes' surprisingly slow
Date: Sat, 7 Mar 2015 12:49:10 +0100
These two commands give the same output:

$ yes `echo {1..1000}` | head -c 2300M | md5sum
a0241f2247e9a37db60e7def3e4f7038  -

$ yes "`echo {1..1000}`" | head -c 2300M | md5sum
a0241f2247e9a37db60e7def3e4f7038  -

But the time to run is quite different:

$ time yes "`echo {1..1000}`" | head -c 2300M >/dev/null

real    0m0.897s
user    0m0.384s
sys     0m1.343s

$ time yes `echo {1..1000}` | head -c 2300M >/dev/null

real    0m11.352s
user    0m10.571s
sys     0m2.590s

WTF?!

I imagine 'yes' spends a lot of time collecting the 1000 args. But why
does it do that more than once?


/Ole




Information forwarded to bug-coreutils <at> gnu.org:
bug#20029; Package coreutils. (Sat, 07 Mar 2015 12:11:01 GMT) Full text and rfc822 format available.

Message #8 received at 20029 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Ole Tange <tange <at> gnu.org>, 20029 <at> debbugs.gnu.org
Subject: Re: bug#20029: 'yes' surprisingly slow
Date: Sat, 07 Mar 2015 12:10:39 +0000
On 07/03/15 11:49, Ole Tange wrote:
> These two commands give the same output:
> 
> $ yes `echo {1..1000}` | head -c 2300M | md5sum
> a0241f2247e9a37db60e7def3e4f7038  -
> 
> $ yes "`echo {1..1000}`" | head -c 2300M | md5sum
> a0241f2247e9a37db60e7def3e4f7038  -
> 
> But the time to run is quite different:
> 
> $ time yes "`echo {1..1000}`" | head -c 2300M >/dev/null
> 
> real    0m0.897s
> user    0m0.384s
> sys     0m1.343s
> 
> $ time yes `echo {1..1000}` | head -c 2300M >/dev/null
> 
> real    0m11.352s
> user    0m10.571s
> sys     0m2.590s
> 
> WTF?!
> 
> I imagine 'yes' spends a lot of time collecting the 1000 args. But why
> does it do that more than once?

The stdio interactions dominate here.
The slow case has 1000 times more fputs_unlocked() calls.
Yes we could build the line up once and output that.
If doing that we could also build up a BUFSIZ of complete lines
to output at a time, in which case you'd probably avoid stdio altogether.

BTW I noticed tee uses stdio calls which is redundant overhead currently.
It wouldn't if we added a --buffered call to tee so that it might
honor stdbuf(1), though I'm not sure it's worth that flexibility in tee.

I'll look at improving these.

thanks,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#20029; Package coreutils. (Mon, 09 Mar 2015 19:49:02 GMT) Full text and rfc822 format available.

Message #11 received at 20029 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Ole Tange <tange <at> gnu.org>, 20029 <at> debbugs.gnu.org
Subject: Re: bug#20029: 'yes' surprisingly slow
Date: Mon, 09 Mar 2015 19:47:43 +0000
[Message part 1 (text/plain, inline)]
On 07/03/15 12:10, Pádraig Brady wrote:
> On 07/03/15 11:49, Ole Tange wrote:
>> These two commands give the same output:
>>
>> $ yes `echo {1..1000}` | head -c 2300M | md5sum
>> a0241f2247e9a37db60e7def3e4f7038  -
>>
>> $ yes "`echo {1..1000}`" | head -c 2300M | md5sum
>> a0241f2247e9a37db60e7def3e4f7038  -
>>
>> But the time to run is quite different:
>>
>> $ time yes "`echo {1..1000}`" | head -c 2300M >/dev/null
>>
>> real    0m0.897s
>> user    0m0.384s
>> sys     0m1.343s
>>
>> $ time yes `echo {1..1000}` | head -c 2300M >/dev/null
>>
>> real    0m11.352s
>> user    0m10.571s
>> sys     0m2.590s
>>
>> WTF?!
>>
>> I imagine 'yes' spends a lot of time collecting the 1000 args. But why
>> does it do that more than once?
> 
> The stdio interactions dominate here.
> The slow case has 1000 times more fputs_unlocked() calls.
> Yes we could build the line up once and output that.
> If doing that we could also build up a BUFSIZ of complete lines
> to output at a time, in which case you'd probably avoid stdio altogether.

The attached should make things more efficient here.

thanks,
Pádraig.

[yes-efficiency.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#20029; Package coreutils. (Mon, 09 Mar 2015 20:03:02 GMT) Full text and rfc822 format available.

Message #14 received at 20029 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Pádraig Brady <P <at> draigBrady.com>,
 Ole Tange <tange <at> gnu.org>, 20029 <at> debbugs.gnu.org
Subject: Re: bug#20029: 'yes' surprisingly slow
Date: Mon, 09 Mar 2015 14:02:16 -0600
[Message part 1 (text/plain, inline)]
On 03/09/2015 01:47 PM, Pádraig Brady wrote:

> 
> Note this change also ensures that yes will only write complete lines
> for lines softer than BUFSIZ.

s/softer/smaller/

> 
> * src/yes.c (main): Build up a BUFSIZ buffer of lines,
> and output that, rather than having stdio process each item.
> * tests/misc/yes.sh: Add a new test for various buffer sizes.
> * tests/local.mk: Reference the new test.
> Fixes http://bugs.gnu.org/20029
> ---
>  src/yes.c         | 43 +++++++++++++++++++++++++++++++++++++++++--
>  tests/local.mk    |  1 +
>  tests/misc/yes.sh | 28 ++++++++++++++++++++++++++++
>  3 files changed, 70 insertions(+), 2 deletions(-)
>  create mode 100755 tests/misc/yes.sh
> 
> diff --git a/src/yes.c b/src/yes.c
> index b35b13f..91dea11 100644
> --- a/src/yes.c
> +++ b/src/yes.c
> @@ -58,6 +58,10 @@ Repeatedly output a line with all specified STRING(s), or 'y'.\n\
>  int
>  main (int argc, char **argv)
>  {
> +  char buf[BUFSIZ];

Do you really want this stack-allocated?  BUFSIZ can be larger than a
page, which can then interfere with stack overflow detection.

> +. "${srcdir=.}/tests/init.sh"; path_prepend_ ./src
> +print_ver_ yes
> +
> +for size in 1 4095 4096 8191 8192 16383 16384; do

Should you also test 4097 8193 and 16385 (that is, a likely
1-more-than-BUFSIZ in the mix)?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#20029; Package coreutils. (Mon, 09 Mar 2015 22:16:02 GMT) Full text and rfc822 format available.

Message #17 received at 20029 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Eric Blake <eblake <at> redhat.com>, Ole Tange <tange <at> gnu.org>, 
 20029 <at> debbugs.gnu.org
Subject: Re: bug#20029: 'yes' surprisingly slow
Date: Mon, 09 Mar 2015 22:14:57 +0000
On 09/03/15 20:02, Eric Blake wrote:
> On 03/09/2015 01:47 PM, Pádraig Brady wrote:
> 
>>
>> Note this change also ensures that yes will only write complete lines
>> for lines softer than BUFSIZ.
> 
> s/softer/smaller/
> 
>>
>> * src/yes.c (main): Build up a BUFSIZ buffer of lines,
>> and output that, rather than having stdio process each item.
>> * tests/misc/yes.sh: Add a new test for various buffer sizes.
>> * tests/local.mk: Reference the new test.
>> Fixes http://bugs.gnu.org/20029
>> ---
>>  src/yes.c         | 43 +++++++++++++++++++++++++++++++++++++++++--
>>  tests/local.mk    |  1 +
>>  tests/misc/yes.sh | 28 ++++++++++++++++++++++++++++
>>  3 files changed, 70 insertions(+), 2 deletions(-)
>>  create mode 100755 tests/misc/yes.sh
>>
>> diff --git a/src/yes.c b/src/yes.c
>> index b35b13f..91dea11 100644
>> --- a/src/yes.c
>> +++ b/src/yes.c
>> @@ -58,6 +58,10 @@ Repeatedly output a line with all specified STRING(s), or 'y'.\n\
>>  int
>>  main (int argc, char **argv)
>>  {
>> +  char buf[BUFSIZ];
> 
> Do you really want this stack-allocated?  BUFSIZ can be larger than a
> page, which can then interfere with stack overflow detection.

Well we do such stack buffers elsewhere:

$ git grep char.*\\[BUFSIZ\\]
src/head.c:  char buf[BUFSIZ];
src/head.c:    char buffer[BUFSIZ];
src/head.c:  char buffer[BUFSIZ];
src/head.c:  char buffer[BUFSIZ];
src/head.c:  char buffer[BUFSIZ];
src/ls.c:  char smallbuf[BUFSIZ];
src/od.c:              char buf[BUFSIZ];
src/tail.c:      char buffer[BUFSIZ];
src/tail.c:  char buffer[BUFSIZ];
src/tail.c:    char buffer[BUFSIZ];
src/tail.c:    char buffer[BUFSIZ];
src/tail.c:  char buffer[BUFSIZ];
src/tail.c:      char buffer[BUFSIZ];
src/tee.c:  char buffer[BUFSIZ];
src/tr.c:static char io_buf[BUFSIZ];
src/uptime.c:      char buf[BUFSIZ];
src/yes.c:  char buf[BUFSIZ];

We would probably change them all if this was thought to be a problem.

>> +. "${srcdir=.}/tests/init.sh"; path_prepend_ ./src
>> +print_ver_ yes
>> +
>> +for size in 1 4095 4096 8191 8192 16383 16384; do
> 
> Should you also test 4097 8193 and 16385 (that is, a likely
> 1-more-than-BUFSIZ in the mix)?

The 1 more is implicit with the \n added by yes(1).

thanks for the review!
Pádraig





Information forwarded to bug-coreutils <at> gnu.org:
bug#20029; Package coreutils. (Mon, 09 Mar 2015 22:47:02 GMT) Full text and rfc822 format available.

Message #20 received at 20029 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>, 
 Eric Blake <eblake <at> redhat.com>,
 Ole Tange <tange <at> gnu.org>, 20029 <at> debbugs.gnu.org
Subject: Re: bug#20029: 'yes' surprisingly slow
Date: Mon, 09 Mar 2015 15:46:24 -0700
Pádraig Brady wrote:
> We would probably change them all if this was thought to be a problem.

I don't think it's a real problem.  I've never run into a system where a 
nonrecursive function can't declare a local variable of size BUFSIZ, and given 
the long tradition of software doing just that I would say that such a system 
has such a low quality of implementation that we needn't worry about it.




Information forwarded to bug-coreutils <at> gnu.org:
bug#20029; Package coreutils. (Mon, 09 Mar 2015 23:10:02 GMT) Full text and rfc822 format available.

Message #23 received at 20029 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Pádraig Brady <P <at> draigbrady.com>,
 Ole Tange <tange <at> gnu.org>, 20029 <at> debbugs.gnu.org
Subject: Re: bug#20029: 'yes' surprisingly slow
Date: Mon, 09 Mar 2015 17:09:29 -0600
[Message part 1 (text/plain, inline)]
On 03/09/2015 04:14 PM, Pádraig Brady wrote:

>>> +
>>> +for size in 1 4095 4096 8191 8192 16383 16384; do
>>
>> Should you also test 4097 8193 and 16385 (that is, a likely
>> 1-more-than-BUFSIZ in the mix)?
> 
> The 1 more is implicit with the \n added by yes(1).

Then you should also test 4094 8190 and 16382, to make sure that a
multiple of a non-aligned subset with a small unused tail also works
(although I guess the 4097 tests the case of an unused tail if BUFSIZ is
8192, although in that case the unused tail is much larger than a few
bytes).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#20029; Package coreutils. (Tue, 10 Mar 2015 00:32:02 GMT) Full text and rfc822 format available.

Message #26 received at 20029 <at> debbugs.gnu.org (full text, mbox):

From: Giuseppe Scrivano <gscrivano <at> gnu.org>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 20029 <at> debbugs.gnu.org, Ole Tange <tange <at> gnu.org>
Subject: Re: bug#20029: 'yes' surprisingly slow
Date: Tue, 10 Mar 2015 01:31:36 +0100
Pádraig Brady <P <at> draigBrady.com> writes:

> The attached should make things more efficient here.
>
> thanks,
> Pádraig.
>
>
> From 7959bbf19307705e98f08cfa32a9dcf67672590c Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?P=C3=A1draig=20Brady?= <P <at> draigBrady.com>
> Date: Mon, 9 Mar 2015 19:27:32 +0000
> Subject: [PATCH] yes: output data more efficiently
>
> yes(1) may be used to generate repeating patterns of text
> for test inputs etc., so adjust to be more efficient.
>
> Profiling the case where yes(1) is outputting small items
> through stdio (which was the default case), shows the overhead
> continuously processing small items in main() and in stdio:
>
>     $ yes >/dev/null & perf top -p $!
>     31.02%  yes           [.] main
>     27.36%  libc-2.20.so  [.] _IO_file_xsputn@@GLIBC_2.2.5
>     14.51%  libc-2.20.so  [.] fputs_unlocked
>     13.50%  libc-2.20.so  [.] strlen
>     10.66%  libc-2.20.so  [.] __GI___mempcpy
>      1.98%  yes           [.] fputs_unlocked <at> plta
>
> Sending more data per stdio call improves the situation,
> but still, there is significant stdio overhead due to memory copies,
> and the repeated string length checking:
>
>     $ yes "`echo {1..1000}`" >/dev/null & perf top -p $!
>     42.26%  libc-2.20.so  [.] __GI___mempcpy
>     17.38%  libc-2.20.so  [.] strlen
>      5.21%  [kernel]      [k] __srcu_read_lock
>      4.58%  [kernel]      [k] __srcu_read_unlock
>      4.27%  libc-2.20.so  [.] _IO_file_xsputn@@GLIBC_2.2.5
>      2.50%  libc-2.20.so  [.] __GI___libc_write
>      2.45%  [kernel]      [k] system_call
>      2.40%  [kernel]      [k] system_call_after_swapgs
>      2.27%  [kernel]      [k] vfs_write
>      2.09%  libc-2.20.so  [.] _IO_do_write@@GLIBC_2.2.5
>      2.01%  [kernel]      [k] fsnotify
>      1.95%  libc-2.20.so  [.] _IO_file_write@@GLIBC_2.2.5
>      1.44%  yes           [.] main
>
> We can avoid all stdio overhead by building up the buffer
> _once_ and outputting that, and the profile below shows
> the bottleneck moved to the kernel:
>
>     $ src/yes >/dev/null & perf top -p $!
>     15.42%  [kernel]      [k] __srcu_read_lock
>     12.98%  [kernel]      [k] __srcu_read_unlock
>      9.41%  libc-2.20.so  [.] __GI___libc_write
>      9.11%  [kernel]      [k] vfs_write
>      8.35%  [kernel]      [k] fsnotify
>      8.02%  [kernel]      [k] system_call
>      5.84%  [kernel]      [k] system_call_after_swapgs
>      4.54%  [kernel]      [k] __fget_light
>      3.98%  [kernel]      [k] sys_write
>      3.65%  [kernel]      [k] selinux_file_permission
>      3.44%  [kernel]      [k] rw_verify_area
>      2.94%  [kernel]      [k] __fsnotify_parent
>      2.76%  [kernel]      [k] security_file_permission
>      2.39%  yes           [.] main
>      2.17%  [kernel]      [k] __fdget_pos
>      2.13%  [kernel]      [k] sysret_check
>      0.81%  [kernel]      [k] write_null
>      0.36%  yes           [.] write <at> plt
>
> Note this change also ensures that yes will only write complete lines
> for lines softer than BUFSIZ.
>
> * src/yes.c (main): Build up a BUFSIZ buffer of lines,
> and output that, rather than having stdio process each item.
> * tests/misc/yes.sh: Add a new test for various buffer sizes.
> * tests/local.mk: Reference the new test.
> Fixes http://bugs.gnu.org/20029
> ---
>  src/yes.c         | 43 +++++++++++++++++++++++++++++++++++++++++--
>  tests/local.mk    |  1 +
>  tests/misc/yes.sh | 28 ++++++++++++++++++++++++++++
>  3 files changed, 70 insertions(+), 2 deletions(-)
>  create mode 100755 tests/misc/yes.sh
>
> diff --git a/src/yes.c b/src/yes.c
> index b35b13f..91dea11 100644
> --- a/src/yes.c
> +++ b/src/yes.c
> @@ -58,6 +58,10 @@ Repeatedly output a line with all specified STRING(s), or 'y'.\n\
>  int
>  main (int argc, char **argv)
>  {
> +  char buf[BUFSIZ];
> +  char *pbuf = buf;
> +  int i;
> +
>    initialize_main (&argc, &argv);
>    set_program_name (argv[0]);
>    setlocale (LC_ALL, "");
> @@ -77,9 +81,44 @@ main (int argc, char **argv)
>        argv[argc++] = bad_cast ("y");
>      }
>  
> -  while (true)
> +  /* Buffer data locally once, rather than having the
> +     large overhead of stdio buffering each item.   */
> +  for (i = optind; i < argc; i++)
> +    {
> +      size_t len = strlen (argv[i]);
> +      if (BUFSIZ < len || BUFSIZ - len <= pbuf - buf)
> +        break;
> +      memcpy (pbuf, argv[i], len);
> +      pbuf += len;
> +      *pbuf++ = i == argc - 1 ? '\n' : ' ';
> +    }
> +  if (i < argc)
> +    pbuf = NULL;

since the buffer is partly filled, wouldn't be better to reuse it?

Something like this (barely tested):

diff --git a/src/yes.c b/src/yes.c
index 91dea11..ac690ce 100644
--- a/src/yes.c
+++ b/src/yes.c
@@ -92,9 +92,7 @@ main (int argc, char **argv)
       pbuf += len;
       *pbuf++ = i == argc - 1 ? '\n' : ' ';
     }
-  if (i < argc)
-    pbuf = NULL;
-  else
+  if (i == argc)
     {
       size_t line_len = pbuf - buf;
       size_t lines = BUFSIZ / line_len;
@@ -106,7 +104,7 @@ main (int argc, char **argv)
     }
 
   /* The normal case is to continuously output the local buffer.  */
-  while (pbuf)
+  while (i == argc)
     {
       if (write (STDOUT_FILENO, buf, pbuf - buf) == -1)
         {
@@ -117,11 +115,17 @@ main (int argc, char **argv)
 
   /* If the data doesn't fit in BUFSIZ then it's large
      and not too onerous to output using stdio in any case.  */
-  while (! pbuf)
+  while (i != argc)
     {
-      for (i = optind; i < argc; i++)
-        if (fputs (argv[i], stdout) == EOF
-            || putchar (i == argc - 1 ? '\n' : ' ') == EOF)
+      int j;
+      if ((pbuf - buf) && fwrite (buf, pbuf - buf, 1, stdout) == 0)
+        {
+          error (0, errno, _("standard output"));
+          return EXIT_FAILURE;
+        }
+      for (j = i; j < argc; j++)
+        if (fputs (argv[j], stdout) == EOF
+            || putchar (j == argc - 1 ? '\n' : ' ') == EOF)
           {
             error (0, errno, _("standard output"));
             return EXIT_FAILURE;


The difference seems to be (without the patch):

$ time src/yes `echo {1..2000}` | head -c 2000M | md5sum
55c293324aa6ecce14f0bf30da5a4686  -

real	0m7.994s
user	0m11.093s
sys	0m2.953s

versus (with the patch):

$ time src/yes `echo {1..2000}` | head -c 2000M | md5sum
55c293324aa6ecce14f0bf30da5a4686  -

real	0m3.534s
user	0m4.164s
sys	0m1.803s

Regards,
Giuseppe





Reply sent to Pádraig Brady <P <at> draigBrady.com>:
You have taken responsibility. (Tue, 10 Mar 2015 00:51:02 GMT) Full text and rfc822 format available.

Notification sent to Ole Tange <tange <at> gnu.org>:
bug acknowledged by developer. (Tue, 10 Mar 2015 00:51:02 GMT) Full text and rfc822 format available.

Message #31 received at 20029-done <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Giuseppe Scrivano <gscrivano <at> gnu.org>
Cc: 20029-done <at> debbugs.gnu.org
Subject: Re: bug#20029: 'yes' surprisingly slow
Date: Tue, 10 Mar 2015 00:50:41 +0000
On 10/03/15 00:31, Giuseppe Scrivano wrote:
> Pádraig Brady <P <at> draigBrady.com> writes:
> 
>> The attached should make things more efficient here.
>>
>> thanks,
>> Pádraig.
>>
>>
>> From 7959bbf19307705e98f08cfa32a9dcf67672590c Mon Sep 17 00:00:00 2001
>> From: =?UTF-8?q?P=C3=A1draig=20Brady?= <P <at> draigBrady.com>
>> Date: Mon, 9 Mar 2015 19:27:32 +0000
>> Subject: [PATCH] yes: output data more efficiently
>>
>> yes(1) may be used to generate repeating patterns of text
>> for test inputs etc., so adjust to be more efficient.
>>
>> Profiling the case where yes(1) is outputting small items
>> through stdio (which was the default case), shows the overhead
>> continuously processing small items in main() and in stdio:
>>
>>     $ yes >/dev/null & perf top -p $!
>>     31.02%  yes           [.] main
>>     27.36%  libc-2.20.so  [.] _IO_file_xsputn@@GLIBC_2.2.5
>>     14.51%  libc-2.20.so  [.] fputs_unlocked
>>     13.50%  libc-2.20.so  [.] strlen
>>     10.66%  libc-2.20.so  [.] __GI___mempcpy
>>      1.98%  yes           [.] fputs_unlocked <at> plta
>>
>> Sending more data per stdio call improves the situation,
>> but still, there is significant stdio overhead due to memory copies,
>> and the repeated string length checking:
>>
>>     $ yes "`echo {1..1000}`" >/dev/null & perf top -p $!
>>     42.26%  libc-2.20.so  [.] __GI___mempcpy
>>     17.38%  libc-2.20.so  [.] strlen
>>      5.21%  [kernel]      [k] __srcu_read_lock
>>      4.58%  [kernel]      [k] __srcu_read_unlock
>>      4.27%  libc-2.20.so  [.] _IO_file_xsputn@@GLIBC_2.2.5
>>      2.50%  libc-2.20.so  [.] __GI___libc_write
>>      2.45%  [kernel]      [k] system_call
>>      2.40%  [kernel]      [k] system_call_after_swapgs
>>      2.27%  [kernel]      [k] vfs_write
>>      2.09%  libc-2.20.so  [.] _IO_do_write@@GLIBC_2.2.5
>>      2.01%  [kernel]      [k] fsnotify
>>      1.95%  libc-2.20.so  [.] _IO_file_write@@GLIBC_2.2.5
>>      1.44%  yes           [.] main
>>
>> We can avoid all stdio overhead by building up the buffer
>> _once_ and outputting that, and the profile below shows
>> the bottleneck moved to the kernel:
>>
>>     $ src/yes >/dev/null & perf top -p $!
>>     15.42%  [kernel]      [k] __srcu_read_lock
>>     12.98%  [kernel]      [k] __srcu_read_unlock
>>      9.41%  libc-2.20.so  [.] __GI___libc_write
>>      9.11%  [kernel]      [k] vfs_write
>>      8.35%  [kernel]      [k] fsnotify
>>      8.02%  [kernel]      [k] system_call
>>      5.84%  [kernel]      [k] system_call_after_swapgs
>>      4.54%  [kernel]      [k] __fget_light
>>      3.98%  [kernel]      [k] sys_write
>>      3.65%  [kernel]      [k] selinux_file_permission
>>      3.44%  [kernel]      [k] rw_verify_area
>>      2.94%  [kernel]      [k] __fsnotify_parent
>>      2.76%  [kernel]      [k] security_file_permission
>>      2.39%  yes           [.] main
>>      2.17%  [kernel]      [k] __fdget_pos
>>      2.13%  [kernel]      [k] sysret_check
>>      0.81%  [kernel]      [k] write_null
>>      0.36%  yes           [.] write <at> plt
>>
>> Note this change also ensures that yes will only write complete lines
>> for lines softer than BUFSIZ.
>>
>> * src/yes.c (main): Build up a BUFSIZ buffer of lines,
>> and output that, rather than having stdio process each item.
>> * tests/misc/yes.sh: Add a new test for various buffer sizes.
>> * tests/local.mk: Reference the new test.
>> Fixes http://bugs.gnu.org/20029
>> ---
>>  src/yes.c         | 43 +++++++++++++++++++++++++++++++++++++++++--
>>  tests/local.mk    |  1 +
>>  tests/misc/yes.sh | 28 ++++++++++++++++++++++++++++
>>  3 files changed, 70 insertions(+), 2 deletions(-)
>>  create mode 100755 tests/misc/yes.sh
>>
>> diff --git a/src/yes.c b/src/yes.c
>> index b35b13f..91dea11 100644
>> --- a/src/yes.c
>> +++ b/src/yes.c
>> @@ -58,6 +58,10 @@ Repeatedly output a line with all specified STRING(s), or 'y'.\n\
>>  int
>>  main (int argc, char **argv)
>>  {
>> +  char buf[BUFSIZ];
>> +  char *pbuf = buf;
>> +  int i;
>> +
>>    initialize_main (&argc, &argv);
>>    set_program_name (argv[0]);
>>    setlocale (LC_ALL, "");
>> @@ -77,9 +81,44 @@ main (int argc, char **argv)
>>        argv[argc++] = bad_cast ("y");
>>      }
>>  
>> -  while (true)
>> +  /* Buffer data locally once, rather than having the
>> +     large overhead of stdio buffering each item.   */
>> +  for (i = optind; i < argc; i++)
>> +    {
>> +      size_t len = strlen (argv[i]);
>> +      if (BUFSIZ < len || BUFSIZ - len <= pbuf - buf)
>> +        break;
>> +      memcpy (pbuf, argv[i], len);
>> +      pbuf += len;
>> +      *pbuf++ = i == argc - 1 ? '\n' : ' ';
>> +    }
>> +  if (i < argc)
>> +    pbuf = NULL;
> 
> since the buffer is partly filled, wouldn't be better to reuse it?
> 
> Something like this (barely tested):
> 
> diff --git a/src/yes.c b/src/yes.c
> index 91dea11..ac690ce 100644
> --- a/src/yes.c
> +++ b/src/yes.c
> @@ -92,9 +92,7 @@ main (int argc, char **argv)
>        pbuf += len;
>        *pbuf++ = i == argc - 1 ? '\n' : ' ';
>      }
> -  if (i < argc)
> -    pbuf = NULL;
> -  else
> +  if (i == argc)
>      {
>        size_t line_len = pbuf - buf;
>        size_t lines = BUFSIZ / line_len;
> @@ -106,7 +104,7 @@ main (int argc, char **argv)
>      }
>  
>    /* The normal case is to continuously output the local buffer.  */
> -  while (pbuf)
> +  while (i == argc)
>      {
>        if (write (STDOUT_FILENO, buf, pbuf - buf) == -1)
>          {
> @@ -117,11 +115,17 @@ main (int argc, char **argv)
>  
>    /* If the data doesn't fit in BUFSIZ then it's large
>       and not too onerous to output using stdio in any case.  */
> -  while (! pbuf)
> +  while (i != argc)
>      {
> -      for (i = optind; i < argc; i++)
> -        if (fputs (argv[i], stdout) == EOF
> -            || putchar (i == argc - 1 ? '\n' : ' ') == EOF)
> +      int j;
> +      if ((pbuf - buf) && fwrite (buf, pbuf - buf, 1, stdout) == 0)
> +        {
> +          error (0, errno, _("standard output"));
> +          return EXIT_FAILURE;
> +        }
> +      for (j = i; j < argc; j++)
> +        if (fputs (argv[j], stdout) == EOF
> +            || putchar (j == argc - 1 ? '\n' : ' ') == EOF)
>            {
>              error (0, errno, _("standard output"));
>              return EXIT_FAILURE;
> 
> 
> The difference seems to be (without the patch):
> 
> $ time src/yes `echo {1..2000}` | head -c 2000M | md5sum
> 55c293324aa6ecce14f0bf30da5a4686  -
> 
> real	0m7.994s
> user	0m11.093s
> sys	0m2.953s
> 
> versus (with the patch):
> 
> $ time src/yes `echo {1..2000}` | head -c 2000M | md5sum
> 55c293324aa6ecce14f0bf30da5a4686  -
> 
> real	0m3.534s
> user	0m4.164s
> sys	0m1.803s

A fair point for the many small arguments case,
though doesn't help the few large arguments case.
Given this is only slightly more complicated,
we might as well do this also.

thanks,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#20029; Package coreutils. (Tue, 10 Mar 2015 21:19:01 GMT) Full text and rfc822 format available.

Message #34 received at 20029 <at> debbugs.gnu.org (full text, mbox):

From: Ole Tange <tange <at> gnu.org>
To: Giuseppe Scrivano <gscrivano <at> gnu.org>
Cc: 20029 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigbrady.com>
Subject: Re: bug#20029: 'yes' surprisingly slow
Date: Tue, 10 Mar 2015 22:18:23 +0100
On Tue, Mar 10, 2015 at 1:31 AM, Giuseppe Scrivano <gscrivano <at> gnu.org> wrote:

> $ time src/yes `echo {1..2000}` | head -c 2000M | md5sum
> 55c293324aa6ecce14f0bf30da5a4686  -
>
> real    0m7.994s
> user    0m11.093s
> sys     0m2.953s
>
> versus (with the patch):
>
> $ time src/yes `echo {1..2000}` | head -c 2000M | md5sum
> 55c293324aa6ecce14f0bf30da5a4686  -
>
> real    0m3.534s
> user    0m4.164s
> sys     0m1.803s

Are you sure your are not limited by md5sum?

$ time yes "`echo {1..2000}`" | head -c 2000M >/dev/null

real    0m0.660s
user    0m0.180s
sys     0m1.115s

The solution should perform no worse than that.


/Ole




Information forwarded to bug-coreutils <at> gnu.org:
bug#20029; Package coreutils. (Wed, 11 Mar 2015 00:00:06 GMT) Full text and rfc822 format available.

Message #37 received at 20029 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Ole Tange <tange <at> gnu.org>, Giuseppe Scrivano <gscrivano <at> gnu.org>
Cc: 20029 <at> debbugs.gnu.org
Subject: Re: bug#20029: 'yes' surprisingly slow
Date: Tue, 10 Mar 2015 23:59:16 +0000
On 10/03/15 21:18, Ole Tange wrote:
> On Tue, Mar 10, 2015 at 1:31 AM, Giuseppe Scrivano <gscrivano <at> gnu.org> wrote:
> 
>> $ time src/yes `echo {1..2000}` | head -c 2000M | md5sum
>> 55c293324aa6ecce14f0bf30da5a4686  -
>>
>> real    0m7.994s
>> user    0m11.093s
>> sys     0m2.953s
>>
>> versus (with the patch):
>>
>> $ time src/yes `echo {1..2000}` | head -c 2000M | md5sum
>> 55c293324aa6ecce14f0bf30da5a4686  -
>>
>> real    0m3.534s
>> user    0m4.164s
>> sys     0m1.803s
> 
> Are you sure your are not limited by md5sum?
> 
> $ time yes "`echo {1..2000}`" | head -c 2000M >/dev/null
> 
> real    0m0.660s
> user    0m0.180s
> sys     0m1.115s
> 
> The solution should perform no worse than that.

Two separate cases. Yours above is a single large argument.
Giuseppe's example above, and your original problematic example
were without quotes, and so with many arguments passed to yes.
I've applied Giuseppe's patch with an augmented test.

cheers,
Pádraig.





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 08 Apr 2015 11:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 9 years and 23 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.