GNU bug report logs - #6906
[PATCH] cp: copy entirely-sparse files oodles faster


Package: coreutils;

Reported by: Paul Eggert <eggert <at> cs.ucla.edu>

Date: Wed, 25 Aug 2010 05:37:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6906; Package coreutils. (Wed, 25 Aug 2010 05:37:02 GMT)

Acknowledgement sent to Paul Eggert <eggert <at> cs.ucla.edu>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 25 Aug 2010 05:37:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: bug-coreutils <at> gnu.org
Subject: [PATCH] cp: copy entirely-sparse files oodles faster
Date: Tue, 24 Aug 2010 22:37:02 -0700
(By "oodles faster" I mean "as much faster as you like".
The benchmark below shows a 2800x speedup.)

In response to an idea by Kit Westneat for GNU tar reported in
<http://lists.gnu.org/archive/html/bug-tar/2010-08/msg00038.html>,
Eric Blake wrote:

> Meanwhile, if you are indeed correct that there are easy ways to detect
> completely sparse files, even when the ioctl or SEEK_HOLE directives are
> not present, then the coreutils cp(1) hole iteration routine should
> probably be taught that corner case to recognize an entirely sparse file
> as a single hole.

Here's a patch to coreutils to implement this idea.  It's based on a patch
<http://lists.gnu.org/archive/html/bug-tar/2010-08/msg00043.html> that
I just now installed into GNU tar.  I think of it as a quick first cut
at full fiemap / SEEK_HOLE implementation, but unlike the full
implementation this optimization does not depend on any special ioctls
or lseek extensions, so it should work on any POSIX or POSIX-like host.
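
The check at the heart of this optimization can be sketched in isolation.
What follows is a minimal illustration rather than the coreutils code: the
helper name is_entirely_sparse is hypothetical, and it reads st_blocks
directly where the patch goes through the ST_NBLOCKS macro.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper, for illustration only.
   Return true if SB describes a nonempty file with no allocated
   blocks, i.e. a file that appears to be a single hole.  */
bool
is_entirely_sparse (struct stat const *sb)
{
  return sb->st_size != 0 && sb->st_blocks == 0;
}
```

A caller would fill SB with stat or fstat and fall back to an ordinary
copy whenever the helper returns false.  Whether a zero block count is a
reliable sign of "no data" on every file system is not something this
sketch establishes.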

On a simple benchmark this sped up GNU cp by a factor of 2800
(measuring by real-time seconds) on my host:

   $ truncate -s 10GB bigfile
   $ time old/cp bigfile bigfile-slow

   real    2m3.231s
   user    0m1.497s
   sys     0m5.738s
   $ time new/cp bigfile bigfile-fast

   real    0m0.044s
   user    0m0.000s
   sys     0m0.002s
   $ ls -ls bigfile*
   0 -rw-r--r-- 1 eggert csfac 10000000000 Aug 24 22:11 bigfile
   0 -rw-r--r-- 1 eggert csfac 10000000000 Aug 24 22:14 bigfile-fast
   0 -rw-r--r-- 1 eggert csfac 10000000000 Aug 24 22:14 bigfile-slow
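
For readers who want to reproduce the setup from C rather than from the
shell, here is a rough sketch of what "truncate -s" and the first column
of "ls -ls" correspond to.  The function names make_sparse_file and
allocated_blocks are made up for this example; the unit of st_blocks is
commonly 512 bytes, but POSIX leaves it to the implementation.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create PATH as a file of SIZE bytes without
   writing any data, much as `truncate -s SIZE PATH` does.
   Return 0 on success, -1 on error.  */
int
make_sparse_file (char const *path, off_t size)
{
  int fd = open (path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return -1;
  int r = ftruncate (fd, size);
  if (close (fd) != 0)
    r = -1;
  return r;
}

/* Hypothetical helper: return the number of blocks allocated to PATH
   as reported by stat, or -1 on error.  */
long long
allocated_blocks (char const *path)
{
  struct stat sb;
  if (stat (path, &sb) != 0)
    return -1;
  return (long long) sb.st_blocks;
}
```

On a file system that supports holes, allocated_blocks would be expected
to report 0 for a file made this way, matching the leading 0 in the
listing above.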

From 2e535b590d675e6d96f954c1f840d678fb133f6a Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert <at> cs.ucla.edu>
Date: Tue, 24 Aug 2010 22:20:55 -0700
Subject: [PATCH] cp: copy entirely-sparse files oodles faster

* src/copy.c (copy_reg): Bypass reads if the file is entirely
sparse.  Idea suggested by Kit Westneat via Bernd Shubert in
<http://lists.gnu.org/archive/html/bug-tar/2010-08/msg00038.html>
for the Lustre file system.  Implementation stolen from my patch
<http://lists.gnu.org/archive/html/bug-tar/2010-08/msg00043.html>
to GNU tar.  On my machine this sped up a cp benchmark, which
copied a 10 GB entirely-sparse file on an NFS file system, by a
factor of 2800 in real seconds.
---
 src/copy.c |   18 +++++++++++++++---
 1 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index 6d11ed8..1e79523 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -669,10 +669,21 @@ copy_reg (char const *src_name, char const *dst_name,
 #endif
         }
 
-      /* If not making a sparse file, try to use a more-efficient
-         buffer size.  */
-      if (! make_holes)
+      if (make_holes)
         {
+          /* For speed, bypass reads if the file is entirely sparse.  */
+
+          if (src_open_sb.st_size != 0 && ST_NBLOCKS (src_open_sb) == 0)
+            {
+              n_read_total = src_open_sb.st_size;
+              goto set_dest_size;
+            }
+        }
+      else
+        {
+          /* Not making a sparse file, so try to use a more-efficient
+             buffer size.  */
+
           /* Compute the least common multiple of the input and output
              buffer sizes, adjusting for outlandish values.  */
           size_t blcm_max = MIN (SIZE_MAX, SSIZE_MAX) - buf_alignment_slop;
@@ -788,6 +799,7 @@ copy_reg (char const *src_name, char const *dst_name,
 
       if (last_write_made_hole)
         {
+        set_dest_size:
           if (ftruncate (dest_desc, n_read_total) < 0)
             {
               error (0, errno, _("truncating %s"), quote (dst_name));
-- 
1.7.2
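
Outside the context of copy_reg, the control flow of the patch (skip the
read loop, then set the destination size) might look like the sketch
below.  It is a simplified stand-in, not the patched function: the name
copy_if_entirely_sparse and its return convention are assumptions, and it
presumes DST_FD refers to a freshly created, empty, writable file.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only.
   If SRC_FD is nonempty and has no allocated blocks, give DST_FD the
   same size with ftruncate instead of reading and writing data.
   Return 1 if the shortcut was taken, 0 if the caller must copy the
   file normally, and -1 on error (errno is set by the failing call).  */
int
copy_if_entirely_sparse (int src_fd, int dst_fd)
{
  struct stat sb;
  if (fstat (src_fd, &sb) != 0)
    return -1;

  /* Same test as in the patch, using st_blocks directly.  */
  if (! (sb.st_size != 0 && sb.st_blocks == 0))
    return 0;

  /* Extending an empty file with ftruncate leaves it as one hole
     on file systems that support sparse files.  */
  if (ftruncate (dst_fd, sb.st_size) != 0)
    return -1;
  return 1;
}
```

Returning 0 rather than failing keeps the shortcut optional: a caller
can always fall through to its normal read/write loop.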






Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6906; Package coreutils. (Sun, 17 Apr 2011 08:56:02 GMT)

Message #8 received at 6906 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 6906 <at> debbugs.gnu.org
Subject: Re: bug#6906: [PATCH] cp: copy entirely-sparse files oodles faster
Date: Sun, 17 Apr 2011 10:55:52 +0200
Paul Eggert wrote:
> (By "oodles faster" I mean "as much faster as you like".
> The benchmark below shows a 2800x speedup.)
>
> In response to an idea by Kit Westneat for GNU tar reported in
> <http://lists.gnu.org/archive/html/bug-tar/2010-08/msg00038.html>,
> Eric Blake wrote:
>
>> Meanwhile, if you are indeed correct that there are easy ways to detect
>> completely sparse files, even when the ioctl or SEEK_HOLE directives are
>> not present, then the coreutils cp(1) hole iteration routine should
>> probably be taught that corner case to recognize an entirely sparse file
>> as a single hole.
>
> Here's a patch to coreutils to implement this idea.  It's based on a patch
> <http://lists.gnu.org/archive/html/bug-tar/2010-08/msg00043.html> that
> I just now installed into GNU tar.  I think of it as a quick first cut
> at full fiemap / SEEK_HOLE implementation, but unlike the full
> implementation this optimization does not depend on any special ioctls
> or lseek extensions, so it should work on any POSIX or POSIX-like host.
>
> On a simple benchmark this sped up GNU cp by a factor of 2800
> (measuring by real-time seconds) on my host:
>
>    $ truncate -s 10GB bigfile
>    $ time old/cp bigfile bigfile-slow
>
>    real    2m3.231s
>    user    0m1.497s
>    sys     0m5.738s
>    $ time new/cp bigfile bigfile-fast
>
>    real    0m0.044s
>    user    0m0.000s
>    sys     0m0.002s
>    $ ls -ls bigfile*
>    0 -rw-r--r-- 1 eggert csfac 10000000000 Aug 24 22:11 bigfile
>    0 -rw-r--r-- 1 eggert csfac 10000000000 Aug 24 22:14 bigfile-fast
>    0 -rw-r--r-- 1 eggert csfac 10000000000 Aug 24 22:14 bigfile-slow
>
> From 2e535b590d675e6d96f954c1f840d678fb133f6a Mon Sep 17 00:00:00 2001
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Tue, 24 Aug 2010 22:20:55 -0700
> Subject: [PATCH] cp: copy entirely-sparse files oodles faster
>
> * src/copy.c (copy_reg): Bypass reads if the file is entirely
> sparse.  Idea suggested by Kit Westneat via Bernd Shubert in
> <http://lists.gnu.org/archive/html/bug-tar/2010-08/msg00038.html>
> for the Lustre file system.  Implementation stolen from my patch
> <http://lists.gnu.org/archive/html/bug-tar/2010-08/msg00043.html>
> to GNU tar.  On my machine this sped up a cp benchmark, which
> copied a 10 GB entirely-sparse file on an NFS file system, by a
> factor of 2800 in real seconds.

Hi Paul,

Somehow I didn't see this patch from you until now, while looking
through the hundreds of outstanding (but mostly resolved) bugs at
http://debbugs.gnu.org/coreutils.  Sorry about that.

Now that we have FIEMAP support, (by the looks of things
we will soon have SEEK_HOLE support in cp and in the linux kernel)
do you think adding support for this special case is worthwhile?
I could go either way.

If so, would you care to rebase it for 8.13?
coreutils-8.12 will probably be coming soon to adjust FIEMAP
support not to collide with the combination of XFS, 2.6.39
release-candidate kernels and so-called "unwritten extents".
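
For comparison with the st_blocks test, the SEEK_HOLE family mentioned
above allows a similar question to be asked through lseek.  The sketch
below is only an illustration under stated assumptions: the function name
file_has_no_data is hypothetical, SEEK_DATA must be provided by the
platform headers (glibc exposes it under _GNU_SOURCE), and it is not a
description of how cp uses these interfaces.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE   /* glibc declares SEEK_DATA under this macro */
#endif
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only.
   Return 1 if FD has no data at or after offset 0 (this includes a
   zero-length file), 0 if some data was found, and -1 on any other
   error or if SEEK_DATA is unavailable at compile time.
   Note that a successful lseek moves FD's file offset.  */
int
file_has_no_data (int fd)
{
#ifdef SEEK_DATA
  if (lseek (fd, 0, SEEK_DATA) >= 0)
    return 0;
  /* ENXIO means there is no data from the given offset to EOF.  */
  return errno == ENXIO ? 1 : -1;
#else
  errno = ENOTSUP;
  return -1;
#endif
}
```

A file system without real SEEK_DATA support may simply report the whole
file as data, in which case this returns 0 and nothing is gained; that
is one reason a check needing no special lseek values can still be
useful.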




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6906; Package coreutils. (Sun, 17 Apr 2011 16:30:04 GMT)

Message #11 received at 6906 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>
Cc: 6906 <at> debbugs.gnu.org
Subject: Re: bug#6906: [PATCH] cp: copy entirely-sparse files oodles faster
Date: Sun, 17 Apr 2011 09:28:56 -0700
On 04/17/11 01:55, Jim Meyering wrote:
> Now that we have FIEMAP support, (by the looks of things
> we will soon have SEEK_HOLE support in cp and in the linux kernel)
> do you think adding support for this special case is worthwhile?
> I could go either way.
> 
> If so, would you care to rebase it for 8.13?

Yes, I expect it's worthwhile, as the FIEMAP stuff isn't universal.
I'll add it to my list of things to do.  It's not high priority,
to be sure.




Information forwarded to bug-coreutils <at> gnu.org:
bug#6906; Package coreutils. (Wed, 10 Oct 2018 16:37:02 GMT)

Message #14 received at 6906 <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Jim Meyering <jim <at> meyering.net>
Cc: 6906 <at> debbugs.gnu.org
Subject: Re: bug#6906: [PATCH] cp: copy entirely-sparse files oodles faster
Date: Wed, 10 Oct 2018 10:36:25 -0600
(triaging old bugs)

Hello,

On 17/04/11 10:28 AM, Paul Eggert wrote:
> On 04/17/11 01:55, Jim Meyering wrote:
>> Now that we have FIEMAP support, (by the looks of things
>> we will soon have SEEK_HOLE support in cp and in the linux kernel)
>> do you think adding support for this special case is worthwhile?
>> I could go either way.
>>
>> If so, would you care to rebase it for 8.13?
> 
> Yes, I expect it's worthwhile, as the FIEMAP stuff isn't universal.
> I'll add it to my list of things to do.  It's not high priority,
> to be sure.

In the 8 years since the original thread,
cp(1) can now copy sparse files very fast (though I suspect it's still 
with FIEMAP and not SEEK_DATA/HOLE).

https://bugs.gnu.org/6906

Can this be closed as out-dated?

regards,
 - assaf







Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Thu, 11 Oct 2018 02:15:02 GMT)

Notification sent to Paul Eggert <eggert <at> cs.ucla.edu>:
bug acknowledged by developer. (Thu, 11 Oct 2018 02:15:04 GMT)

Message #19 received at 6906-done <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Assaf Gordon <assafgordon <at> gmail.com>, Jim Meyering <jim <at> meyering.net>
Cc: 6906-done <at> debbugs.gnu.org
Subject: Re: bug#6906: [PATCH] cp: copy entirely-sparse files oodles faster
Date: Wed, 10 Oct 2018 19:14:19 -0700
Assaf Gordon wrote:
> Can this be closed as out-dated?

Yes, that's fine. Closing.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 08 Nov 2018 12:24:05 GMT)
