GNU bug report logs - #6906
[PATCH] cp: copy entirely-sparse files oodles faster
Reported by: Paul Eggert <eggert <at> cs.ucla.edu>
Date: Wed, 25 Aug 2010 05:37:02 UTC
Severity: normal
Tags: patch
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#6906; Package coreutils. (Wed, 25 Aug 2010 05:37:02 GMT)
Acknowledgement sent to Paul Eggert <eggert <at> cs.ucla.edu>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 25 Aug 2010 05:37:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
(By "oodles faster" I mean "as much faster as you like".
The benchmark below shows a 2800x speedup.)
In response to an idea by Kit Westneat for GNU tar reported in
<http://lists.gnu.org/archive/html/bug-tar/2010-08/msg00038.html>,
Eric Blake wrote:
> Meanwhile, if you are indeed correct that there are easy ways to detect
> completely sparse files, even when the ioctl or SEEK_HOLE directives are
> not present, then the coreutils cp(1) hole iteration routine should
> probably be taught that corner case to recognize an entirely sparse file
> as a single hole.
Here's a patch to coreutils to implement this idea. It's based on a patch
<http://lists.gnu.org/archive/html/bug-tar/2010-08/msg00043.html> that
I just now installed into GNU tar. I think of it as a quick first cut
at full fiemap / SEEK_HOLE implementation, but unlike the full
implementation this optimization does not depend on any special ioctls
or lseek extensions, so it should work on any POSIX or POSIX-like host.
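The check described above can be sketched outside of cp as follows. This is only an illustration, not the code from the patch: the helper names looks_entirely_sparse and copy_if_entirely_sparse are hypothetical, st_blocks is read directly where the patch uses the ST_NBLOCKS macro, and the destination is assumed to be a freshly created, empty regular file.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: true if a file of SIZE bytes that occupies
   BLOCKS allocated blocks looks entirely sparse.  This mirrors the
   condition in the patch: nonzero size, zero allocated blocks.  */
static bool
looks_entirely_sparse (off_t size, blkcnt_t blocks)
{
  return size != 0 && blocks == 0;
}

/* Hypothetical helper: if SRC_FD looks entirely sparse, give DST_FD
   the same length without reading any data.  Assumes DST_FD refers
   to an empty regular file.  Returns 1 if the shortcut was taken,
   0 if the caller must copy the data the usual way, -1 on error.  */
static int
copy_if_entirely_sparse (int src_fd, int dst_fd)
{
  struct stat sb;

  assert (src_fd >= 0 && dst_fd >= 0);

  if (fstat (src_fd, &sb) != 0)
    return -1;
  if (!looks_entirely_sparse (sb.st_size, sb.st_blocks))
    return 0;

  /* Extending an empty file with ftruncate leaves one big hole.  */
  return ftruncate (dst_fd, sb.st_size) == 0 ? 1 : -1;
}
```

A return value of 0 means "no shortcut", so a caller would fall through to its normal read/write loop. The sketch trusts the block count reported by the file system, just as the heuristic itself does.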
On a simple benchmark this sped up GNU cp by a factor of 2800
(measuring by real-time seconds) on my host:
$ truncate -s 10GB bigfile
$ time old/cp bigfile bigfile-slow
real 2m3.231s
user 0m1.497s
sys 0m5.738s
$ time new/cp bigfile bigfile-fast
real 0m0.044s
user 0m0.000s
sys 0m0.002s
$ ls -ls bigfile*
0 -rw-r--r-- 1 eggert csfac 10000000000 Aug 24 22:11 bigfile
0 -rw-r--r-- 1 eggert csfac 10000000000 Aug 24 22:14 bigfile-fast
0 -rw-r--r-- 1 eggert csfac 10000000000 Aug 24 22:14 bigfile-slow
From 2e535b590d675e6d96f954c1f840d678fb133f6a Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert <at> cs.ucla.edu>
Date: Tue, 24 Aug 2010 22:20:55 -0700
Subject: [PATCH] cp: copy entirely-sparse files oodles faster
* src/copy.c (copy_reg): Bypass reads if the file is entirely
sparse. Idea suggested by Kit Westneat via Bernd Shubert in
<http://lists.gnu.org/archive/html/bug-tar/2010-08/msg00038.html>
for the Lustre file system. Implementation stolen from my patch
<http://lists.gnu.org/archive/html/bug-tar/2010-08/msg00043.html>
to GNU tar. On my machine this sped up a cp benchmark, which
copied a 10 GB entirely-sparse file on an NFS file system, by a
factor of 2800 in real seconds.
---
src/copy.c | 18 +++++++++++++++---
1 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/src/copy.c b/src/copy.c
index 6d11ed8..1e79523 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -669,10 +669,21 @@ copy_reg (char const *src_name, char const *dst_name,
 #endif
         }

-      /* If not making a sparse file, try to use a more-efficient
-         buffer size. */
-      if (! make_holes)
+      if (make_holes)
         {
+          /* For speed, bypass reads if the file is entirely sparse. */
+
+          if (src_open_sb.st_size != 0 && ST_NBLOCKS (src_open_sb) == 0)
+            {
+              n_read_total = src_open_sb.st_size;
+              goto set_dest_size;
+            }
+        }
+      else
+        {
+          /* Not making a sparse file, so try to use a more-efficient
+             buffer size. */
+
           /* Compute the least common multiple of the input and output
              buffer sizes, adjusting for outlandish values. */
           size_t blcm_max = MIN (SIZE_MAX, SSIZE_MAX) - buf_alignment_slop;
@@ -788,6 +799,7 @@ copy_reg (char const *src_name, char const *dst_name,

       if (last_write_made_hole)
         {
+        set_dest_size:
           if (ftruncate (dest_desc, n_read_total) < 0)
             {
               error (0, errno, _("truncating %s"), quote (dst_name));
--
1.7.2
Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#6906; Package coreutils. (Sun, 17 Apr 2011 08:56:02 GMT)
Message #8 received at 6906 <at> debbugs.gnu.org (full text, mbox):
Paul Eggert wrote:
> (By "oodles faster" I mean "as much faster as you like".
> The benchmark below shows a 2800x speedup.)
>
> In response to an idea by Kit Westneat for GNU tar reported in
> <http://lists.gnu.org/archive/html/bug-tar/2010-08/msg00038.html>,
> Eric Blake wrote:
>
>> Meanwhile, if you are indeed correct that there are easy ways to detect
>> completely sparse files, even when the ioctl or SEEK_HOLE directives are
>> not present, then the coreutils cp(1) hole iteration routine should
>> probably be taught that corner case to recognize an entirely sparse file
>> as a single hole.
>
> Here's a patch to coreutils to implement this idea. It's based on a patch
> <http://lists.gnu.org/archive/html/bug-tar/2010-08/msg00043.html> that
> I just now installed into GNU tar. I think of it as a quick first cut
> at full fiemap / SEEK_HOLE implementation, but unlike the full
> implementation this optimization does not depend on any special ioctls
> or lseek extensions, so it should work on any POSIX or POSIX-like host.
>
> On a simple benchmark this sped up GNU cp by a factor of 2800
> (measuring by real-time seconds) on my host:
>
> $ truncate -s 10GB bigfile
> $ time old/cp bigfile bigfile-slow
>
> real 2m3.231s
> user 0m1.497s
> sys 0m5.738s
> $ time new/cp bigfile bigfile-fast
>
> real 0m0.044s
> user 0m0.000s
> sys 0m0.002s
> $ ls -ls bigfile*
> 0 -rw-r--r-- 1 eggert csfac 10000000000 Aug 24 22:11 bigfile
> 0 -rw-r--r-- 1 eggert csfac 10000000000 Aug 24 22:14 bigfile-fast
> 0 -rw-r--r-- 1 eggert csfac 10000000000 Aug 24 22:14 bigfile-slow
>
> From 2e535b590d675e6d96f954c1f840d678fb133f6a Mon Sep 17 00:00:00 2001
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Tue, 24 Aug 2010 22:20:55 -0700
> Subject: [PATCH] cp: copy entirely-sparse files oodles faster
>
> * src/copy.c (copy_reg): Bypass reads if the file is entirely
> sparse. Idea suggested by Kit Westneat via Bernd Shubert in
> <http://lists.gnu.org/archive/html/bug-tar/2010-08/msg00038.html>
> for the Lustre file system. Implementation stolen from my patch
> <http://lists.gnu.org/archive/html/bug-tar/2010-08/msg00043.html>
> to GNU tar. On my machine this sped up a cp benchmark, which
> copied a 10 GB entirely-sparse file on an NFS file system, by a
> factor of 2800 in real seconds.
Hi Paul,
Somehow I didn't see this patch from you until now, while looking
through the hundreds of outstanding (but mostly resolved) bugs at
http://debbugs.gnu.org/coreutils. Sorry about that.
Now that we have FIEMAP support (and, by the looks of things,
we will soon have SEEK_HOLE support in cp and in the Linux kernel),
do you think adding support for this special case is worthwhile?
I could go either way.
If so, would you care to rebase it for 8.13?
coreutils-8.12 will probably be coming soon, to adjust FIEMAP
support so that it does not collide with the combination of XFS, 2.6.39
release-candidate kernels, and so-called "unwritten extents".
Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#6906; Package coreutils. (Sun, 17 Apr 2011 16:30:04 GMT)
Message #11 received at 6906 <at> debbugs.gnu.org (full text, mbox):
On 04/17/11 01:55, Jim Meyering wrote:
> Now that we have FIEMAP support, (by the looks of things
> we will soon have SEEK_HOLE support in cp and in the linux kernel)
> do you think adding support for this special case is worthwhile?
> I could go either way.
>
> If so, would you care to rebase it for 8.13?
Yes, I expect it's worthwhile, as the FIEMAP stuff isn't universal.
I'll add it to my list of things to do. It's not high priority,
to be sure.
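For contrast with the block-count test in the original patch, here is a rough sketch of how an lseek extension of the SEEK_HOLE family could answer the same question where it is available. It is an illustration under stated assumptions, not code from coreutils: the function name seek_data_says_all_hole is hypothetical, SEEK_DATA is the companion constant to SEEK_HOLE on systems that define it, and the sketch gives up rather than guessing on hosts or file systems without support.

```c
#define _GNU_SOURCE             /* exposes SEEK_DATA on glibc */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel whether FD contains any data.
   Returns 1 if there is no data at or after offset 0 (note that an
   empty file also lands here), 0 if some data exists, and -1 if the
   question cannot be answered (SEEK_DATA missing, FD not seekable,
   or another error).  When data is found, the file offset of FD is
   left at the start of that data.  */
static int
seek_data_says_all_hole (int fd)
{
#ifdef SEEK_DATA
  off_t data = lseek (fd, 0, SEEK_DATA);
  if (data >= 0)
    return 0;
  /* ENXIO means there is no data at or beyond the given offset.  */
  return errno == ENXIO ? 1 : -1;
#else
  (void) fd;
  return -1;
#endif
}
```

The -1 result is where a portable fallback such as the block-count test would still be needed, which is the point made above about these extensions not being universal.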
Information forwarded to bug-coreutils <at> gnu.org: bug#6906; Package coreutils. (Wed, 10 Oct 2018 16:37:02 GMT)
Message #14 received at 6906 <at> debbugs.gnu.org (full text, mbox):
(triaging old bugs)
Hello,
On 17/04/11 10:28 AM, Paul Eggert wrote:
> On 04/17/11 01:55, Jim Meyering wrote:
>> Now that we have FIEMAP support, (by the looks of things
>> we will soon have SEEK_HOLE support in cp and in the linux kernel)
>> do you think adding support for this special case is worthwhile?
>> I could go either way.
>>
>> If so, would you care to rebase it for 8.13?
>
> Yes, I expect it's worthwhile, as the FIEMAP stuff isn't universal.
> I'll add it to my list of things to do. It's not high priority,
> to be sure.
In the 8 years since the original thread,
cp(1) has learned to copy sparse files very fast (though I suspect it
still uses FIEMAP and not SEEK_DATA/SEEK_HOLE).
https://bugs.gnu.org/6906
Can this be closed as out-dated?
regards,
- assaf
Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>: You have taken responsibility. (Thu, 11 Oct 2018 02:15:02 GMT)
Notification sent to Paul Eggert <eggert <at> cs.ucla.edu>: bug acknowledged by developer. (Thu, 11 Oct 2018 02:15:04 GMT)
Message #19 received at 6906-done <at> debbugs.gnu.org (full text, mbox):
Assaf Gordon wrote:
> Can this be closed as out-dated?
Yes, that's fine. Closing.
Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 08 Nov 2018 12:24:05 GMT)