GNU bug report logs - #9500
cp: use posix_fallocate where supported


Package: coreutils;

Reported by: Kelly Anderson <kelly <at> silka.with-linux.com>

Date: Wed, 14 Sep 2011 06:47:02 UTC

Severity: wishlist

Tags: patch

To reply to this bug, email your comments to 9500 AT debbugs.gnu.org.



Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#9500; Package coreutils. (Wed, 14 Sep 2011 06:47:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Kelly Anderson <kelly <at> silka.with-linux.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 14 Sep 2011 06:47:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Kelly Anderson <kelly <at> silka.with-linux.com>
To: bug-coreutils <at> gnu.org
Subject: [PATCH]: use posix_fallocate where supported
Date: Tue, 13 Sep 2011 23:55:26 -0600
Hi,

I put together a patch 2 or 3 years ago (back when posix_fallocate was
first introduced in glibc).  I've been using coreutils with that patch
applied ever since, with no problems.  The only error I ever encountered
(at that time my patch reported an error when posix_fallocate failed) was
when I tried to copy a 25 GB file to a vfat partition, which is what
should happen with a file over 4 GB on a FAT32 partition.  Anyway, I
changed my patch to silently ignore posix_fallocate errors, so coreutils
now fails the same way it currently does.

I copy a lot of large media files around on my servers and I want their
space to be allocated as contiguously and efficiently as possible.

This patch has been tested for 2 to 3 years by me, so it should be good
to go.  The patch applies to coreutils 8.13.

--- ./configure.ac.orig    2011-08-19 13:40:11.000000000 -0600
+++ ./configure.ac    2011-09-13 23:29:57.277354329 -0600
@@ -242,6 +242,18 @@ AC_DEFUN([coreutils_DUMMY_1],
 ])
 coreutils_DUMMY_1

+dnl * Old glibcs have broken posix_fallocate(). Make sure not to use it.
+AC_TRY_COMPILE([
+  #define _XOPEN_SOURCE 600
+  #include <stdlib.h>
+  #if defined(__GLIBC__) && (__GLIBC__ < 2 || __GLIBC_MINOR__ < 7)
+    possibly broken posix_fallocate
+  #endif
+], [
+  posix_fallocate(0, 0, 0);
+], [
+  AC_DEFINE([HAVE_POSIX_FALLOCATE], [1], [Define if you have a working posix_fallocate()]) ])
+
 AC_MSG_CHECKING([ut_host in struct utmp])
 AC_CACHE_VAL([su_cv_func_ut_host_in_utmp],
 [AC_LINK_IFELSE([AC_LANG_PROGRAM([[#include <sys/types.h>
--- ./src/copy.c.orig    2011-07-28 04:38:27.000000000 -0600
+++ ./src/copy.c    2011-09-13 23:29:57.280354149 -0600
@@ -1026,6 +1026,16 @@ copy_reg (char const *src_name, char con
           size_t blcm = buffer_lcm (io_blksize (src_open_sb), buf_size,
                                     blcm_max);

+#ifdef HAVE_POSIX_FALLOCATE
+          if (S_ISREG(src_open_sb.st_mode)
+              && ! S_ISFIFO(sb.st_mode)
+              && src_open_sb.st_size >= buf_size)
+          {
+            /* ignore errors, some filesystems may error if filesize exceeds the filesystem's limit */
+            posix_fallocate (dest_desc, 0, src_open_sb.st_size);
+          }
+#endif
+
           /* Do not bother with a buffer larger than the input file, plus one
              byte to make sure the file has not grown while reading it.  */
           if (S_ISREG (src_open_sb.st_mode) && src_open_sb.st_size < buf_size)





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#9500; Package coreutils. (Wed, 14 Sep 2011 14:37:01 GMT) Full text and rfc822 format available.

Message #8 received at 9500 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Kelly Anderson <kelly <at> silka.with-linux.com>
Cc: 9500 <at> debbugs.gnu.org
Subject: Re: bug#9500: [PATCH]: use posix_fallocate where supported
Date: Wed, 14 Sep 2011 08:06:49 -0600
On 09/13/2011 11:55 PM, Kelly Anderson wrote:
> Hi,
>
> I put together a patch 2 or 3 years ago (back when posix_fallocate was
> first introduced in glibc).

Thanks for the effort.  However, this has been discussed in the past, 
and the consensus was that we should first write a patch to gnulib that 
provides a posix_fallocate() stub for all platforms, so that coreutils 
can unconditionally call posix_fallocate, rather than making coreutils 
have to use #ifdef.  Among other things, a gnulib module would make it 
possible to emulate posix_fallocate() even on older glibc where it is 
missing or broken.

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#9500; Package coreutils. (Wed, 14 Sep 2011 14:52:01 GMT) Full text and rfc822 format available.

Message #11 received at 9500 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: 9500 <at> debbugs.gnu.org, Kelly Anderson <kelly <at> silka.with-linux.com>
Subject: Re: bug#9500: [PATCH]: use posix_fallocate where supported
Date: Wed, 14 Sep 2011 15:46:47 +0100
On 09/14/2011 03:06 PM, Eric Blake wrote:
> On 09/13/2011 11:55 PM, Kelly Anderson wrote:
>> Hi,
>>
>> I put together a patch 2 or 3 years ago (back when posix_fallocate was
>> first introduced in glibc).
> 
> Thanks for the effort.  However, this has been discussed in the past, and the consensus was that we should first write a patch to gnulib that provides a posix_fallocate() stub for all platforms, so that coreutils can unconditionally call posix_fallocate, rather than making coreutils have to use #ifdef.  Among other things, a gnulib module would make it possible to emulate posix_fallocate() even on older glibc where it is missing or broken.
> 

Also we probably want fallocate() for this use case
rather than posix_fallocate() in any case,
as we don't want to fall back to writing zeros.
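
To illustrate the difference: glibc's posix_fallocate() may emulate the
allocation by writing into each block when the file system has no native
support, whereas Linux fallocate() simply fails with EOPNOTSUPP. The
following is only a minimal sketch, assuming Linux with glibc; the helper
name try_preallocate is hypothetical and is not part of coreutils.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE   /* for fallocate() in <fcntl.h> */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical helper: ask the file system to reserve LEN bytes for FD.
   Returns 0 if the space was reserved (or LEN is not positive), 1 if the
   file system cannot preallocate, and -1 on any other error such as
   ENOSPC, with errno left as set by fallocate().  */
int
try_preallocate (int fd, off_t len)
{
  if (len <= 0)
    return 0;                   /* nothing to reserve */
  if (fallocate (fd, 0, 0, len) == 0)
    return 0;
  if (errno == EOPNOTSUPP || errno == ENOSYS)
    return 1;                   /* unsupported: no zero-writing fallback */
  return -1;
}
```

A caller that gets 1 back would just carry on with ordinary write() calls,
which is the behaviour wanted here.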

Also I had a whole lot of fallocate() things to try
once the fiemap() stuff landed, but unfortunately
that doesn't work reliably on all file systems
and is currently restricted to sparse files.
So I need to dig out my notes on how to apply
fallocate() to files with holes and "empty portions" again.

cheers,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#9500; Package coreutils. (Wed, 23 Nov 2011 00:51:01 GMT) Full text and rfc822 format available.

Message #14 received at 9500 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: 9500 <at> debbugs.gnu.org, Kelly Anderson <kelly <at> silka.with-linux.com>
Subject: Re: bug#9500: [PATCH]: use posix_fallocate where supported
Date: Wed, 23 Nov 2011 00:49:11 +0000
On 09/14/2011 03:46 PM, Pádraig Brady wrote:
> On 09/14/2011 03:06 PM, Eric Blake wrote:
>> On 09/13/2011 11:55 PM, Kelly Anderson wrote:
>>> Hi,
>>>
>>> I put together a patch 2 or 3 years ago (back when posix_fallocate was
>>> first introduced in glibc).
>>
>> Thanks for the effort.  However, this has been discussed in the past, and the consensus was that we should first write a patch to gnulib that provides a posix_fallocate() stub for all platforms, so that coreutils can unconditionally call posix_fallocate, rather than making coreutils have to use #ifdef.  Among other things, a gnulib module would make it possible to emulate posix_fallocate() even on older glibc where it is missing or broken.
>>
> 
> Also we probably want fallocate() for this use case
> rather than posix_fallocate() in any case,
> as we don't want to fall back to writing zeros.
> 
> Also I had a whole lot of fallocate() things to try
> once the fiemap() stuff landed, but unfortunately
> that doesn't work reliably on all file systems
> and is currently restricted to sparse files.
> So I need to dig out my notes on how to apply
> fallocate() to files with holes and "empty portions" again.

I thought a little about this today.

fallocate() is a feature to quickly allocate space in a file system.
It's useful for 3 things as far as I can see:

  1. Improved file layout for subsequent access
  2. Immediate indication of ENOSPC
  3. Efficient writing of NUL portions

Note 1. is somewhat moot with newer file systems that do "delayed allocation".
So what do we need to consider when using fallocate on the destination file?
Considering just cp for the moment, its inputs impacting this are the options:

  --sparse={auto,always,never}
  Note with no --sparse specified we behave with --sparse=auto,
  where we try to detect holes based on st_size vs st_blocks

The other significant input is the construction of the source file.
Now data in a file can generally be classed into 4 types:

  Data:  normal data
  Zero:  normal data containing only NULs
  Hole:  unallocated data containing only NULs
  Empty: allocated data containing only NULs

  One can have any of the above types at any point in the file.
  Also 'Empty' is special in that it can extend beyond the apparent size.
  In fact this tail allocation is common on XFS for performance reasons.

An important factor is how well we can distinguish the above data classes.
There are currently three possible identification options:

  Heuristics
    This is used by default to see if holes might be present.
    The test is simply whether st_size exceeds the space covered by
    the allocated st_blocks.
    Note, this can fail for example in the case where there is
    a tail allocation not accounted for in the size like:

      +-----------+---+
      | D | E | H | E |
      +-----------+---+

    Traditionally when a sparse source is detected we check input blocks
    for all zeros and create a 'Hole' in the destination instead.
    This is inefficient as it requires reading all the NUL data
    and verifying that it is in fact NUL.

  SEEK_HOLE
    Available on linux since 3.1

    'Empty' is treated like a 'Hole' which at least
    allows 'Empty' portions to be processed quickly by `cp`.

    We lose the ability to copy the allocation from src to dst.
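
As a rough illustration of the SEEK_HOLE approach, the sketch below walks
a file's data segments with lseek(). It assumes Linux 3.1 or later with
glibc; the function name count_data_bytes is hypothetical, and a real copy
loop would read each segment instead of merely summing its length.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE   /* for SEEK_DATA and SEEK_HOLE */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sum the bytes that lseek() reports as data in the
   file at PATH, skipping everything reported as a hole.
   Returns the total, or -1 on error.  */
off_t
count_data_bytes (const char *path)
{
  int fd = open (path, O_RDONLY);
  if (fd < 0)
    return -1;

  off_t total = 0;
  off_t pos = 0;
  for (;;)
    {
      off_t data = lseek (fd, pos, SEEK_DATA);
      if (data < 0)
        {
          if (errno != ENXIO)   /* ENXIO just means no data past POS */
            total = -1;
          break;
        }
      off_t hole = lseek (fd, data, SEEK_HOLE);
      if (hole <= data)
        {
          total = -1;           /* error, or the file changed under us */
          break;
        }
      total += hole - data;     /* a copy loop would read [data, hole) here */
      pos = hole;
    }
  close (fd);
  return total;
}
```

On a file system without hole support the whole file is typically reported
as a single data segment, so the result is then just the file size.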

  fiemap
    Available on linux since around 2.6.39

    Gives greater control by distinguishing Hole and Empty,
    thus allowing us to both efficiently copy and maintain allocation.

    Requires sync on ext4, xfs

    Code already done and used (with sync) for sparse files

    Note that not being able to use fiemap with non sparse files
    means that we need to read() the empty extents, which is
    inefficient, especially in --sparse=always mode.
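
For reference, a fiemap query that separates 'Empty' (allocated but
unwritten) extents from the rest might look like the sketch below. It is
not the coreutils extent-scan code: the name count_extents and the batch
size are assumptions made for illustration, and only the first batch of
extents is examined.

```c
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>       /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>   /* struct fiemap, FIEMAP_* flags */

enum { MAX_EXTENTS = 64 };  /* arbitrary batch size for this sketch */

/* Hypothetical helper: map up to MAX_EXTENTS extents of FD and count how
   many are allocated but unwritten ('Empty').  Stores that count in
   *EMPTY and returns the number of extents mapped, or -1 if the request
   fails (for example on a file system without fiemap support).  */
int
count_extents (int fd, int *empty)
{
  struct fiemap *fm = calloc (1, sizeof *fm
                                 + MAX_EXTENTS * sizeof (struct fiemap_extent));
  if (!fm)
    return -1;

  fm->fm_start = 0;
  fm->fm_length = FIEMAP_MAX_OFFSET;      /* whole file */
  fm->fm_flags = FIEMAP_FLAG_SYNC;        /* sync first, as noted above */
  fm->fm_extent_count = MAX_EXTENTS;

  int n = -1;
  if (ioctl (fd, FS_IOC_FIEMAP, fm) == 0)
    {
      n = (int) fm->fm_mapped_extents;
      *empty = 0;
      for (int i = 0; i < n; i++)
        if (fm->fm_extents[i].fe_flags & FIEMAP_EXTENT_UNWRITTEN)
          ++*empty;
    }
  free (fm);
  return n;
}
```

Gaps between the returned extents correspond to 'Hole' regions, which is
what lets a copy both skip them and preserve 'Empty' allocations.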


So given the above info, what functionality might the use
of fallocate() make available to cp?

Exact copy from source to dest:

  Copying the source layout would mean that one could for example,
  create a backup copy of a large db file, which could be then used
  without worrying about fragmentation or ENOSPC issues.

  There is the argument that this might be better as a higher level
  file operation anyway, and perhaps `cp --reflink` might cover
  this use case on some file systems at least.

  fiemap gives us most control, allowing us to copy even tail
  allocations from source to destination. But the sync issue
  makes it not usable in general at present, and is currently
  restricted to sparse files where it's used to avoid reading
  'Empty' and 'Hole' portions.

Copying sparse files

 It's worth noting again, the caveat mentioned above that we
 might not recognise some sparse files due to tail allocation.

 Given that we use fiemap (with sync) for sparse files at present,
 we can augment the fiemap copying code to use fallocate where appropriate.
  So dependent on the options the operations would be:
    --sparse=auto   => 'Empty' -> 'Empty'
    --sparse=always => 'Empty' -> 'Hole'  && discard tail allocation
    --sparse=never  => 'Hole'  -> 'Empty'
 Perhaps the first case could be simplified to initially doing:
    fallocate(dest, blocks*blocksize)

Copying normal files

 Note using SEEK_HOLE for this case, would only help
 to avoid reading 'Hole' and more likely 'Empty' portions,
 and should not impact on the use of fallocate(dest).

 So assuming we initially did:

   if ! --sparse=always
     fallocate(dest, st_size)

 That would throw away any tail allocation in the source,
 which is probably OK as noted above. In fact we might always
 discard tail allocation for consistency, unless we can use fiemap
 for all cases.

I'll cook something up on this soon.

cheers,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#9500; Package coreutils. (Wed, 23 Nov 2011 09:49:01 GMT) Full text and rfc822 format available.

Message #17 received at 9500 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 9500 <at> debbugs.gnu.org, Kelly Anderson <kelly <at> silka.with-linux.com>,
	Eric Blake <eblake <at> redhat.com>
Subject: Re: bug#9500: [PATCH]: use posix_fallocate where supported
Date: Wed, 23 Nov 2011 10:46:56 +0100
Pádraig Brady wrote:
...
> I thought a little about this today.

Nice description of the issues.

It's probably worth putting something like this somewhere in version
control, even if only as a long commit message on whatever change you make.

> fallocate() is a feature to quickly allocate space in a file system.
> It's useful for 3 things as far as I can see:
>
>   1. Improved file layout for subsequent access
>   2. Immediate indication of ENOSPC
>   3. Efficient writing of NUL portions
>
> Note 1. is somewhat moot with newer file systems that do "delayed allocation".
> So what do we need to consider when using fallocate on the destination file?
> Considering just cp for the moment, its inputs impacting this are the options:
>
...
> Copying sparse files
>
>  It's worth noting again, the caveat mentioned above that we
>  might not recognise some sparse files due to tail allocation.

Yes, this is worth repeating ;-)
It is surprising, at least in part because significant tail
allocation is not common.

>  Given that we use fiemap (with sync) for sparse files at present,
>  we can augment the fiemap copying code to use fallocate where appropriate.
>   So dependent on the options the operations would be:
>     --sparse=auto   => 'Empty' -> 'Empty'
>     --sparse=always => 'Empty' -> 'Hole'  && discard tail allocation
>     --sparse=never  => 'Hole'  -> 'Empty'
>  Perhaps the first case could be simplified to initially doing:
>     fallocate(dest, blocks*blocksize)
>
> Copying normal files
>
>  Note using SEEK_HOLE for this case, would only help
>  to avoid reading 'Hole' and more likely 'Empty' portions,
>  and should not impact on the use of fallocate(dest).
>
>  So assuming we initially did:
>
>    if ! --sparse=always
>      fallocate(dest, st_size)
>
>  That would throw away any tail allocation in the source,
>  which is probably OK as noted above. In fact we might always
>  discard tail allocation for consistency, unless we can use fiemap
>  for all cases.

All sounds reasonable.

> I'll cook something up on this soon.

Thanks.




Information forwarded to bug-coreutils <at> gnu.org:
bug#9500; Package coreutils. (Wed, 23 Nov 2011 13:59:01 GMT) Full text and rfc822 format available.

Message #20 received at 9500 <at> debbugs.gnu.org (full text, mbox):

From: "Voelker, Bernhard" <bernhard.voelker <at> siemens-enterprise.com>
To: Jim Meyering <jim <at> meyering.net>, Pádraig Brady
	<P <at> draigBrady.com>
Cc: "9500 <at> debbugs.gnu.org" <9500 <at> debbugs.gnu.org>,
	Kelly Anderson <kelly <at> silka.with-linux.com>, Eric Blake <eblake <at> redhat.com>
Subject: RE: bug#9500: [PATCH]: use posix_fallocate where supported
Date: Wed, 23 Nov 2011 14:56:41 +0100
Jim Meyering wrote:

> Pádraig Brady wrote:
> ...
> > I thought a little about this today.
> 
> Nice description of the issues.

BTW: there was a discussion recently about the
fallocate utility of util-linux, e.g.
http://thread.gmane.org/gmane.linux.utilities.util-linux-ng/5045
Maybe looking into fallocate.c can help.

Berny




Information forwarded to bug-coreutils <at> gnu.org:
bug#9500; Package coreutils. (Fri, 25 Nov 2011 10:00:02 GMT) Full text and rfc822 format available.

Message #23 received at 9500 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: 9500 <at> debbugs.gnu.org
Cc: Kelly Anderson <kelly <at> silka.with-linux.com>
Subject: Re: bug#9500: [PATCH]: use posix_fallocate where supported
Date: Fri, 25 Nov 2011 09:58:05 +0000
-------- Original Message --------
Subject: bug#9500: [PATCH]: use posix_fallocate where supported
Date: Fri, 25 Nov 2011 10:35:34 +0100
From: Goswin von Brederlow <goswin-v-b <at> web.de>
To: Padraig Brady <p <at> draigBrady.com>

Hi,

On 09/14/2011 03:46 PM, Pádraig Brady wrote:

> I thought a little about this today.
>
> fallocate() is a feature to quickly allocate space in a file system.
> It's useful for 3 things as far as I can see:
>
>   1. Improved file layout for subsequent access
>   2. Immediate indication of ENOSPC
>   3. Efficient writing of NUL portions
>
> Note 1. is somewhat moot with newer file systems that do "delayed allocation".
> So what do we need to consider when using fallocate on the destination file?
> Considering just cp for the moment, its inputs impacting this are the options:

Not every filesystem does delayed allocation and delayed allocation only
works for files considerably smaller than the amount of cache.


Also a note on sparse files:

If you have fiemap then you can fallocate the used chunks in the
destination. But there might be zero filled blocks that would become
unallocated with current cp. Those would remain allocated (but not zero
filled, which might be a difference) in the destination. Same if the
file is fully allocated but has empty blocks. So a bit more has to be
done.

So here is what I would do for sparse files:

- get fiemap of file, if it fails the fiemap is just one chunk for the
  whole file

- For each chunk in the fiemap:
  1 read block from source
  2 if (block == ZERO) goto 7
  3 fallocate(offset, end of chunk)
  4 write block
  5 read block from source
  6 if (block != ZERO) goto 4
  7 ftruncate(offset)
  8 read block from source
  9 if (block == ZERO) goto 8
 10 goto 3

Failure of fallocate should probably be ignored to preserve current
behaviour and because if empty blocks are found less space will be used
and might actually fit.

The above would result in the same allocation for the file as current cp
creates for sparse files.
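
The numbered steps above can be restructured as a loop without gotos, for
example as in the following sketch. It assumes a 4096-byte block size, a
freshly created destination, and chunks processed in increasing offset
order; the names copy_chunk and all_zero are hypothetical.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE   /* for fallocate() */
#endif
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

enum { BLOCK = 4096 };   /* assumed block size for this sketch */

static bool
all_zero (const char *buf, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (buf[i] != 0)
      return false;
  return true;
}

/* Copy the chunk [off, end) from SRC to DST, leaving runs of zero blocks
   unallocated.  Returns 0 on success, -1 on I/O error.  */
int
copy_chunk (int src, int dst, off_t off, off_t end)
{
  enum { START, IN_DATA, IN_ZEROS } state = START;
  char buf[BLOCK];

  while (off < end)
    {
      size_t want = end - off < BLOCK ? (size_t) (end - off) : BLOCK;
      ssize_t n = pread (src, buf, want, off);
      if (n < 0)
        return -1;
      if (n == 0)
        break;                  /* source is shorter than the chunk */

      if (all_zero (buf, (size_t) n))
        {
          if (state != IN_ZEROS)
            {
              /* Step 7: cut the file here, dropping any space that was
                 preallocated past this point.  */
              if (ftruncate (dst, off) != 0)
                return -1;
              state = IN_ZEROS;
            }
        }
      else
        {
          if (state != IN_DATA)
            {
              /* Step 3: reserve the rest of the chunk; failure is ignored.  */
              (void) fallocate (dst, 0, off, end - off);
              state = IN_DATA;
            }
          /* Step 4: write the block; a short write is treated as an error.  */
          if (pwrite (dst, buf, (size_t) n, off) != n)
            return -1;
        }
      off += n;
    }

  /* Set the final size so that a trailing run of zeros becomes a hole.  */
  return ftruncate (dst, off);
}
```

If the fiemap request fails, the same function can be called once with the
whole file as a single chunk, matching the fallback described above.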

If you want to copy files while preserving the allocation of the source
then I think cp needs a new --sparse=source mode. In that mode step 7
would change from ftruncate() to write(ZERO) to preserve the empty
blocks and you would only fallocate once per chunk for the full chunk.

Hope that helps.

MfG
        Goswin

PS: Feel free to forward to the mailinglist, gmane doesn't show its
address or I would cc.




Information forwarded to bug-coreutils <at> gnu.org:
bug#9500; Package coreutils. (Fri, 25 Nov 2011 10:16:01 GMT) Full text and rfc822 format available.

Message #26 received at 9500 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: 9500 <at> debbugs.gnu.org
Cc: Kelly Anderson <kelly <at> silka.with-linux.com>,
	Goswin von Brederlow <goswin-v-b <at> web.de>
Subject: Re: bug#9500: [PATCH]: use posix_fallocate where supported
Date: Fri, 25 Nov 2011 10:13:39 +0000
On 11/25/2011 09:58 AM, Pádraig Brady wrote:
> -------- Original Message --------
> Subject: bug#9500: [PATCH]: use posix_fallocate where supported
> Date: Fri, 25 Nov 2011 10:35:34 +0100
> From: Goswin von Brederlow <goswin-v-b <at> web.de>
> To: Padraig Brady <p <at> draigBrady.com>
> 
> Hi,
> 
> On 09/14/2011 03:46 PM, Pádraig Brady wrote:
> 
>> I thought a little about this today.
>>
>> fallocate() is a feature to quickly allocate space in a file system.
>> It's useful for 3 things as far as I can see:
>>
>>   1. Improved file layout for subsequent access
>>   2. Immediate indication of ENOSPC
>>   3. Efficient writing of NUL portions
>>
>> Note 1. is somewhat moot with newer file systems that do "delayed allocation".
>> So what do we need to consider when using fallocate on the destination file?
>> Considering just cp for the moment, its inputs impacting this are the options:
> 
> Not every filesystem does delayed allocation and delayed allocation only
> works for files considerably smaller than the amount of cache.

Right.

> Also a note on sparse files:
> 
> If you have fiemap then you can fallocate the used chunks in the
> destination. But there might be zero filled blocks that would become
> unallocated with current cp. Those would remain allocated (but not zero
> filled, which might be a difference) in the destination. Same if the
> file is fully allocated but has empty blocks. So a bit more has to be
> done.
> 
> So here is what I would do for sparse files:
> 
> - get fiemap of file, if it fails the fiemap is just one chunk for the
>   whole file
> 
> - For each chunk in the fiemap:
>   1 read block from source
>   2 if (block == ZERO) goto 7
>   3 fallocate(offset, end of chunk)
>   4 write block
>   5 read block from source
>   6 if (block != ZERO) goto 4
>   7 ftruncate(offset)
>   8 read block from source
>   9 if (block == ZERO) goto 8
>  10 goto 3

I didn't fully understand the above,
but I was going to do essentially that I think.
The adjustments to the fiemap copy code
should be fairly obvious I think.

> Failure of fallocate should probably be ignored to preserve current
> behaviour and because if empty blocks are found less space will be used
> and might actually fit.

> The above would result in the same allocation for the file as current cp
> creates for sparse files.
> 
> If you want to copy files while preserving the allocation of the source
> then I think cp needs a new --sparse=source mode.

I think --sparse=auto (the default) should cover that case if needed.
However as previously stated, this might be outside cp's remit.

> In that mode step 7
> would change from ftruncate() to write(ZERO) to preserve the empty
> blocks and you would only fallocate once per chunk for the full chunk.
> 
> Hope that helps.
> 
> MfG
>         Goswin
> 
> PS: Feel free to forward to the mailinglist, gmane doesn't show its
> address or I would cc.

Hmm, one of your points above got me thinking.
Might fallocate() fail to allocate an extent with ENOSPC,
but there could be fragmented space available to write()?
That would scupper benefit (2) above :(
I'll ask linux-fsdevel <at> vger.kernel.org

cheers,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#9500; Package coreutils. (Fri, 25 Nov 2011 11:01:02 GMT) Full text and rfc822 format available.

Message #29 received at 9500 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: 9500 <at> debbugs.gnu.org
Cc: Kelly Anderson <kelly <at> silka.with-linux.com>,
	Goswin von Brederlow <goswin-v-b <at> web.de>
Subject: Re: bug#9500: [PATCH]: use posix_fallocate where supported
Date: Fri, 25 Nov 2011 10:59:09 +0000
On 11/25/2011 10:13 AM, Pádraig Brady wrote:

> Hmm, one of your points above got me thinking.
> Might fallocate() fail to allocate an extent with ENOSPC,
> but there could be fragmented space available to write()?
> That would scupper benefit (2) above :(
> I'll ask linux-fsdevel <at> vger.kernel.org

And the response from there is that fallocate() will
check the free blocks first, and then try to allocate
as a contiguous extent, but that part is not guaranteed.
So we still get benefit (2).

cheers,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#9500; Package coreutils. (Sat, 26 Nov 2011 08:17:02 GMT) Full text and rfc822 format available.

Message #32 received at 9500 <at> debbugs.gnu.org (full text, mbox):

From: Goswin von Brederlow <goswin-v-b <at> web.de>
To: Padraig Brady <P <at> draigBrady.com>
Cc: 9500 <at> debbugs.gnu.org, Kelly Anderson <kelly <at> silka.with-linux.com>,
	Goswin von Brederlow <goswin-v-b <at> web.de>
Subject: Re: bug#9500: [PATCH]: use posix_fallocate where supported
Date: Sat, 26 Nov 2011 04:48:36 +0100
Pádraig Brady <P <at> draigBrady.com> writes:

> Hmm, one of your points above got me thinking.
> Might fallocate() fail to allocate an extent with ENOSPC,
> but there could be fragmented space available to write()?
> That would scupper benefit (2) above :(
> I'll ask linux-fsdevel <at> vger.kernel.org
>
> cheers,
> Pádraig.

fallocate() in no way guarantees that the space is contiguous. Any
filesystem that is halfway smart will use a contiguous chunk if
possible. But it should only fail if there is really no space left, no
matter how fragmented.

MfG
        Goswin




Information forwarded to bug-coreutils <at> gnu.org:
bug#9500; Package coreutils. (Sat, 26 Nov 2011 08:17:02 GMT) Full text and rfc822 format available.

Message #35 received at 9500 <at> debbugs.gnu.org (full text, mbox):

From: Goswin von Brederlow <goswin-v-b <at> web.de>
To: Padraig Brady <P <at> draigBrady.com>
Cc: 9500 <at> debbugs.gnu.org, Kelly Anderson <kelly <at> silka.with-linux.com>,
	Goswin von Brederlow <goswin-v-b <at> web.de>
Subject: Re: bug#9500: [PATCH]: use posix_fallocate where supported
Date: Sat, 26 Nov 2011 04:55:08 +0100
Pádraig Brady <P <at> draigBrady.com> writes:

> On 11/25/2011 10:13 AM, Pádraig Brady wrote:
>
>> Hmm, one of your points above got me thinking.
>> Might fallocate() fail to allocate an extent with ENOSPC,
>> but there could be fragmented space available to write()?
>> That would scupper benefit (2) above :(
>> I'll ask linux-fsdevel <at> vger.kernel.org
>
> And the response from there is that fallocate() will
> check the free blocks first, and then try to allocate
> as a contiguous extent, but that part is not guaranteed.
> So we still get benefit (2).
>
> cheers,
> Pádraig.

Don't forget that a sparse file uses less space than its size. So any cp
mode that checks for zero filled blocks and omits them on the
destination might use fewer blocks than the source uses or than the
fiemap indicates.

So cp can only fail on ENOSPC from fallocate() if it does not skip zero
filled blocks. I.e. when --sparse=never or --sparse=auto doesn't detect
a sparse file. But that is quite often the case.

MfG
        Goswin




Information forwarded to bug-coreutils <at> gnu.org:
bug#9500; Package coreutils. (Tue, 29 Nov 2011 13:02:01 GMT) Full text and rfc822 format available.

Message #38 received at 9500 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: 9500 <at> debbugs.gnu.org
Cc: Kelly Anderson <kelly <at> silka.with-linux.com>,
	Goswin von Brederlow <goswin-v-b <at> web.de>
Subject: Re: bug#9500: [PATCH]: use posix_fallocate where supported
Date: Tue, 29 Nov 2011 12:59:01 +0000
On 11/25/2011 10:59 AM, Pádraig Brady wrote:
> On 11/25/2011 10:13 AM, Pádraig Brady wrote:
> 
>> Hmm, one of your points above got me thinking.
>> Might fallocate() fail to allocate an extent with ENOSPC,
>> but there could be fragmented space available to write()?
>> That would scupper benefit (2) above :(
>> I'll ask linux-fsdevel <at> vger.kernel.org
> 
> And the response from there is that fallocate() will
> check the free blocks first, and then try to allocate
> as a contiguous extent, but that part is not guaranteed.
> So we still get benefit (2).

However that thread has continued, and it
was mentioned that benefit (1) might actually
be detrimental, at least for small files.

http://thread.gmane.org/gmane.linux.file-systems/59092

le sigh,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#9500; Package coreutils. (Tue, 29 Nov 2011 13:25:02 GMT) Full text and rfc822 format available.

Message #41 received at 9500 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 9500 <at> debbugs.gnu.org, Kelly Anderson <kelly <at> silka.with-linux.com>,
	Goswin von Brederlow <goswin-v-b <at> web.de>
Subject: Re: bug#9500: [PATCH]: use posix_fallocate where supported
Date: Tue, 29 Nov 2011 14:22:33 +0100
Pádraig Brady wrote:
> On 11/25/2011 10:59 AM, Pádraig Brady wrote:
>> On 11/25/2011 10:13 AM, Pádraig Brady wrote:
>>
>>> Hmm, one of your points above got me thinking.
>>> Might fallocate() fail to allocate an extent with ENOSPC,
>>> but there could be fragmented space available to write()?
>>> That would scupper benefit (2) above :(
>>> I'll ask linux-fsdevel <at> vger.kernel.org
>>
>> And the response from there is that fallocate() will
>> check the free blocks first, and then try to allocate
>> as a contiguous extent, but that part is not guaranteed.
>> So we still get benefit (2).
>
> However that thread has continued, and it
> was mentioned that benefit (1) might actually
> be detrimental, at least for small files.
>
> http://thread.gmane.org/gmane.linux.file-systems/59092

Sigh, indeed.
Do you think it's even worth an option, now?




Information forwarded to bug-coreutils <at> gnu.org:
bug#9500; Package coreutils. (Tue, 29 Nov 2011 14:16:02 GMT) Full text and rfc822 format available.

Message #44 received at 9500 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: 9500 <at> debbugs.gnu.org, Kelly Anderson <kelly <at> silka.with-linux.com>,
	Goswin von Brederlow <goswin-v-b <at> web.de>
Subject: Re: bug#9500: [PATCH]: use posix_fallocate where supported
Date: Tue, 29 Nov 2011 14:13:27 +0000
On 11/29/2011 01:22 PM, Jim Meyering wrote:
> Pádraig Brady wrote:
>> On 11/25/2011 10:59 AM, Pádraig Brady wrote:
>>> On 11/25/2011 10:13 AM, Pádraig Brady wrote:
>>>
>>>> Hmm, one of your points above got me thinking.
>>>> Might fallocate() fail to allocate an extent with ENOSPC,
>>>> but there could be fragmented space available to write()?
>>>> That would scupper benefit (2) above :(
>>>> I'll ask linux-fsdevel <at> vger.kernel.org
>>>
>>> And the response from there is that fallocate() will
>>> check the free blocks first, and then try to allocate
>>> as a contiguous extent, but that part is not guaranteed.
>>> So we still get benefit (2).
>>
>> However that thread has continued, and it
>> was mentioned that benefit (1) might actually
>> be detrimental, at least for small files.
>>
>> http://thread.gmane.org/gmane.linux.file-systems/59092
> 
> Sigh, indeed.
> Do you think it's even worth an option, now?

Having an option to enable is much less interesting.
One might auto-enable it for files > 16MB or something.
I'm reserving judgement until I understand the issues better.

cheers,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#9500; Package coreutils. (Tue, 29 Nov 2011 16:10:02 GMT) Full text and rfc822 format available.

Message #47 received at 9500 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 9500 <at> debbugs.gnu.org, Kelly Anderson <kelly <at> silka.with-linux.com>,
	Jim Meyering <jim <at> meyering.net>, Goswin von Brederlow <goswin-v-b <at> web.de>
Subject: Re: bug#9500: [PATCH]: use posix_fallocate where supported
Date: Tue, 29 Nov 2011 08:06:57 -0800
My read of the situation is that the filesystem guys have
spent a lot of time optimizing ordinary write but they
haven't gotten around to optimizing fallocate because it's
so rarely used -- which means that if one uses fallocate
one gets lousy performance.

It's a chicken and egg problem.

If coreutils started using fallocate now, one can be pretty
sure they'd tune their filesystems over the next few years,
to make fallocate compatible with delayed-write optimizations.
On the other hand if nobody uses fallocate, there will be little
incentive on their part to make it go fast.

It's a question of whether we want to inflict temporary pain
on users for a long-term benefit (early warning of file system
full, which is something I'd dearly love to have).




Information forwarded to bug-coreutils <at> gnu.org:
bug#9500; Package coreutils. (Wed, 30 Nov 2011 17:55:01 GMT) Full text and rfc822 format available.

Message #50 received at 9500 <at> debbugs.gnu.org (full text, mbox):

From: Goswin von Brederlow <goswin-v-b <at> web.de>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 9500 <at> debbugs.gnu.org, Kelly Anderson <kelly <at> silka.with-linux.com>,
	Padraig Brady <P <at> draigBrady.com>, Jim Meyering <jim <at> meyering.net>,
	Goswin von Brederlow <goswin-v-b <at> web.de>
Subject: Re: bug#9500: [PATCH]: use posix_fallocate where supported
Date: Wed, 30 Nov 2011 18:54:15 +0100
Paul Eggert <eggert <at> cs.ucla.edu> writes:

> My read of the situation is that the filesystem guys have
> spent a lot of time optimizing ordinary write but they
> haven't gotten around to optimizing fallocate because it's
> so rarely used -- which means that if one uses fallocate
> one gets lousy performance.
>
> It's a chicken and egg problem.
>
> If coreutils started using fallocate now, one can be pretty
> sure they'd tune their filesystems over the next few years,
> to make fallocate compatible with delayed-write optimizations.
> On the other hand if nobody uses fallocate, there will be little
> incentive on their part to make it go fast.
>
> It's a question of whether we want to inflict temporary pain
> on users for a long-term benefit (early warning of file system
> full, which is something I'd dearly love to have).

I totally agree.

I also don't buy Dave's analysis that fallocate() will hurt the
filesystem.

Sure, it will place blocks on the disk disjunct from the write
pattern. So if you have a lot of files being written in parallel the
fallocate() will make the disk seek more. But the data for each file
will end up sequentially on disk. Without fallocate() it will be laid
out in order of the write pattern, i.e. 4MB of this file, 4MB of that
file, 4MB of the next file, 4MB of the first file and so on. Lots of
fragments of a size the systems cache and delayed allocation allowed.

So fallocate() might hurt write speed with many parallel writes but it
will keep the fragmentation down and speed up future reads. A one time
penalty for many times advantages in the future.


As for filesystems aligning all fallocate() chunks and creating
fragmentation in their free space: Too bad for them. FIX THE FILESYSTEM.
If I tell the FS that I'm only going to write 32k then it should not
force alignment to a 1MB chunk of free space. Instead it should find
some nice little 32k fragment of free space left over somewhere else.
Put all the 32k files together to fill up the 1MB stripe of a RAID.

As for fallocate() causing more IOPS I don't buy that either. Either the
data is too big for the cache so that it is forced out with fallocate()
or delayed allocation or it is so small that it remains in cache in both
cases and the elevator code should write it out sequentially. I mean we
are not talking about opening 1000 files, fallocate()ing them to 100GB
each and then writing 4k chunks to each in a round-robin way. Cp is
writing ONE sequential stream as fast as possible.

As for his assertion that three major Linux filesystems (XFS, BTRFS and
ext4) don't need fallocate() because they use delayed allocation: that is
plainly not true for large files.



As a compromise cp could start with using fallocate() only for largish
files, say 16MB and above. Anything smaller can probably be handled by
delayed allocation or goes out so fast it stays in one chunk anyway.
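
A minimal sketch of that size cutoff, assuming the 16MB figure floated in
this thread. The names PREALLOC_MIN_SIZE and should_preallocate() are
hypothetical and do not come from any patch posted here.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical cutoff: the 16MB value suggested in this thread.
   It is a tunable guess, not a measured optimum. */
#define PREALLOC_MIN_SIZE ((off_t) 16 * 1024 * 1024)

/* Return true if a source file of SRC_SIZE bytes is large enough that
   preallocating the destination is likely to be worthwhile.  Smaller
   files are left to the filesystem's delayed allocation. */
bool
should_preallocate (off_t src_size)
{
  return src_size >= PREALLOC_MIN_SIZE;
}
```

A caller would consult should_preallocate() with the st_size of the source
before deciding whether to call fallocate() at all.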

MfG
        Goswin




Information forwarded to bug-coreutils <at> gnu.org:
bug#9500; Package coreutils. (Fri, 11 May 2012 15:56:02 GMT) Full text and rfc822 format available.

Message #53 received at 9500 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: 9500 <at> debbugs.gnu.org, Mark <markk <at> clara.co.uk>
Subject: Re: [RFC/PATCH] cp: Add option to pre-allocate space for files
Date: Fri, 11 May 2012 16:55:35 +0100
On 05/11/2012 04:03 PM, Mark wrote:
> Hi,
>
> Here's a patch for cp which adds a new --preallocate option. When
> specified, cp allocates disk space for the destination file before writing
> data. It uses fallocate() with FALLOC_FL_KEEP_SIZE on Linux, falling back
> to posix_fallocate() if that fails.

Thanks for taking the time to do this.
This feature is already under consideration.
See the comments at: http://bugs.gnu.org/9500

> Benefits of preallocation:
>  - Disk fragmentation can be greatly reduced. That means faster file
> access and less filesystem overhead (fewer extents).
>  - Recovering data after filesystem corruption should be more successful,
> since files are more likely to be contiguous.
>  - If you're e.g. copying a virtual machine disk image file, the
> destination should be (almost) contiguous, meaning that running a disk
> optimiser/defragmenter in the guest OS would work as it should (i.e.
> improve performance).
>
> This is a very preliminary patch for testing. Hopefully someone will find
> it useful. And hopefully someone who (a) has a clue when it comes to C
> programming, and (b) is familiar with the coreutils source (I'm neither)
> can work from this to produce something which could be included in a
> future release.
>
> Note that posix_fallocate() sets the destination file size. If your system
> doesn't support fallocate() with FALLOC_FL_KEEP_SIZE, you can't e.g. do
> "ls -l destfilename" to monitor the progress of a large file copy; the
> length shown will always be the final length.
>
> Pre-allocating space can defeat the object of --sparse=always (or the
> default sparse-checking heuristic). If copying files with large holes you
> probably won't want to use --preallocate. If you do, regions in the
> destination corresponding to holes in the source will be allocated but
> unwritten. You'll lose the disk-space-saving benefit, but keep the
> fast-reading-of-holes benefit. On the other hand, that feature could be
> useful sometimes.
>
> In the general case of copying non-sparse files, it should be beneficial
> to use --preallocate. However on some systems, when the destination
> filesystem does not support pre-allocation (e.g. FAT32), the
> implementation of posix_fallocate() might try to fill the region to be
> pre-allocated with zeros. That would double copy time for no benefit.
>
> To-do list:
>  - Add --preallocate option to mv as well
>  - Should the option name be changed to --pre-allocate?
>  - Maybe have an option to tell cp to pre-allocate space for all
> destination files in one go, rather than pre-allocating space for each
> individual file before copying?

I don't think there should be an option at all.
cp should have enough info to do the right thing.
Why would you even not want to preallocate?
In saying that, using fallocate with XFS triggers
alignment behavior that causes fragmentation.
But this might change, and the user can't be expected to know this.
BTW I'm thinking of adding a new FALLOC_FL_ALIGN flag
to the kernel, that XFS can use in its tools to enable that
separate functionality.

>  - Check the error code that fallocate() returns. If it says the
> filesystem does not support fallocate(), don't call it again for every
> other file being copied.
>  - Better handling of sparse files, e.g. don't call fallocate() if source
> file is sparse and --sparse=always is given.

That's an important consideration.

>  - If pre-allocation fails due to insufficient disk space, cp prints a
> message and continues. So typically it will fill up the disk then abort
> with an out-of-disk-space error. It would be nice to be able to tell cp
> to abort when a pre-allocation fails, so it can exit without wasting
> time.

Yes it should exit immediately on ENOSPC
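
A minimal sketch of the error handling discussed above, assuming
posix_fallocate() is the call in use (it returns the error number directly
rather than setting errno). The enum and function names are hypothetical
and are not taken from either patch in this report.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical outcome codes for a preallocation attempt. */
enum prealloc_result
{
  PREALLOC_OK,           /* space reserved, go ahead and copy */
  PREALLOC_UNSUPPORTED,  /* filesystem can't do it: stop trying for later files */
  PREALLOC_NOSPACE,      /* ENOSPC: abort the copy right away */
  PREALLOC_ERROR         /* anything else: ignore and copy as usual */
};

/* Map an error number returned by posix_fallocate() to an outcome. */
enum prealloc_result
classify_prealloc_error (int err)
{
  switch (err)
    {
    case 0:
      return PREALLOC_OK;
    case ENOSPC:
      return PREALLOC_NOSPACE;
    case EOPNOTSUPP:
    case ENOSYS:
      return PREALLOC_UNSUPPORTED;
    default:
      return PREALLOC_ERROR;
    }
}

/* Try to reserve LEN bytes from offset 0 for the file open on FD.
   Note that posix_fallocate() also extends the file size to LEN. */
enum prealloc_result
try_preallocate (int fd, off_t len)
{
  if (len <= 0)
    return PREALLOC_OK;  /* nothing to reserve */
  return classify_prealloc_error (posix_fallocate (fd, 0, len));
}
```

On PREALLOC_UNSUPPORTED the caller could set a flag so that fallocate is
not attempted again for the remaining files on the same destination.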

cheers,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#9500; Package coreutils. (Fri, 11 May 2012 19:20:02 GMT) Full text and rfc822 format available.

Message #56 received at 9500 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Mark <markk <at> clara.co.uk>
Cc: 9500 <at> debbugs.gnu.org
Subject: Re: [RFC/PATCH] cp: Add option to pre-allocate space for files
Date: Fri, 11 May 2012 20:19:09 +0100
On 05/11/2012 06:40 PM, Mark wrote:
> Hi,
> 
> On Fri, May 11, 2012 16:45, Pádraig Brady wrote:
>> On 05/11/2012 04:03 PM, Mark wrote:
>>> Here's a patch for cp which adds a new --preallocate option. When
>>> specified, cp allocates disk space for the destination file before
>>> ...
>>>
>>> To-do list:
>>>  - Add --preallocate option to mv as well
>>>  - Should the option name be changed to --pre-allocate?
>>>  - Maybe have an option to tell cp to pre-allocate space for all
>>> destination files in one go, rather than pre-allocating space for each
>>> individual file before copying?
>>
>> I don't think there should be an option at all.
>> cp should have enough info to do the right thing.
>> Why would you even not want to preallocate?
> 
> Apart from the case I mentioned where posix_fallocate() might write out
> zeros on some systems, I can't think of many cases where pre-allocating
> wouldn't be a good idea. So perhaps at some point in the future it would
> be the default, with a --no-preallocate option to disable.
> 
> 
>>>  - Better handling of sparse files, e.g. don't call fallocate() if
>>> source file is sparse and --sparse=always is given.
>>
>> That's an important consideration.
> 
> Even for sparse files, if cp can determine where the holes are beforehand
> (using SEEK_HOLE/SEEK_DATA or fiemap), it could pre-allocate the non-hole
> regions. So with more work, pre-allocation could be used with sparse
> files. Implementing that is beyond my abilities though...
> 
> It would still be helpful to allow pre-allocation to be disabled. (E.g.
> source file has many written/allocated all-zero regions, which the user
> wants to turn into holes in the destination file.)
> 
> Still another option: the ability to use FALLOC_FL_PUNCH_HOLE to punch
> holes in (pre-allocated) destination files. Then any all-zero non-hole
> regions in the source could be turned into holes in the destination in
> conjunction with --sparse=always. By pre-allocating, cp would at least
> guarantee the copy won't fail due to insufficient disk space, which may be
> useful in some cases.
> 
> 
>>>  - If pre-allocation fails due to insufficient disk space, cp prints a
>>> message and continues. So typically it will fill up the disk then abort
>>> with an out-of-disk-space error. It would be nice to be able to tell cp
>>> to abort when a pre-allocation fails, so it can exit without wasting
>>> time.
>>
>> Yes it should exit immediately on ENOSPC
> 
> Aborting by default would make sense. Ideally that would be configurable.
> In some cases the user might prefer to get a warning, so they can Ctrl-Z
> suspend cp and delete/move other files to make space, instead of having cp
> abort.
> 
> 
> An interesting aside: I tried using cp to pre-allocate space for a very
> large file on an ext4 partition, much larger than the amount of free
> space. IMHO it would be best for the filesystem to fail immediately in
> that case. ext4 does a lot of work (there was a lot of disk activity and
> it took a long time to fail). ext4 pre-allocates as much of the requested
> region as possible, rather than succeeding or failing all-or-nothing. So
> you get a disk-full condition. (Of course that's no worse than what
> happens when you run cp normally. But it would happen much more quickly
> with pre-allocation.)

Well that's bad as you get a delay in addition to the normal copy.
However, I don't see that behavior with 2.6.40.4-5.fc15.x86_64 at least?

$ df -Th .
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/sdb1     ext4     97G   85G  7.1G  93% /home

$ for s in 1 10 100; do time fallocate -l ${s}G testfile; rm testfile; done
sys	0m0.121s
fallocate: testfile: fallocate failed: No space left on device
sys	0m0.345s
fallocate: testfile: fallocate failed: No space left on device
sys	0m0.001s

> You can try that by doing something like this on an ext4 partition:
> truncate -s 9999999999999 testfile   # (create a sparse file larger than
> free space)
> cp --preallocate testfile testfile.copy

Note I'm CC'ing the bug address rather than the coreutils <at> gnu.org address
to keep all of this thread associated with the bug.

cheers,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#9500; Package coreutils. (Fri, 11 May 2012 20:22:02 GMT) Full text and rfc822 format available.

Message #59 received at 9500 <at> debbugs.gnu.org (full text, mbox):

From: "Mark" <markk <at> clara.co.uk>
To: "Pádraig Brady" <P <at> draigBrady.com>
Cc: 9500 <at> debbugs.gnu.org
Subject: Re: [RFC/PATCH] cp: Add option to pre-allocate space for files
Date: Fri, 11 May 2012 20:45:32 +0100
On Fri, May 11, 2012 20:19, Pádraig Brady wrote:
> On 05/11/2012 06:40 PM, Mark wrote:
>> ...
>> An interesting aside: I tried using cp to pre-allocate space for a very
>> large file on an ext4 partition, much larger than the amount of free
>> space. IMHO it would be best for the filesystem to fail immediately in
>> that case. ext4 does a lot of work (there was a lot of disk activity and
>> it took a long time to fail). ext4 pre-allocates as much of the
>> requested
>> region as possible, rather than succeeding or failing all-or-nothing. So
>> you get a disk-full condition. (Of course that's no worse than what
>> happens when you run cp normally. But it would happen much more quickly
>> with pre-allocation.)
>
> Well that's bad as you get a delay in addition to the normal copy.
> However, I don't see that behavior with 2.6.40.4-5.fc15.x86_64 at least?

I'm using kernel 3.0.0-19-generic #32-Ubuntu here. But probably more
relevant, the partition I tested on was ~1.7TB ext4 on an external USB 2.0
drive which was almost full and probably *very* fragmented, i.e. free
space spread all over the disk in thousands of small chunks. ext4 seems to
be pretty slow at allocating space in that case.

If I were designing a filesystem, I'd have it immediately return failure
if fallocate() is specified with additional size larger than the amount of
free space. Though for the filesystem, determining how much extra space a
fallocate() call would need can be quite involved in some cases and
require a significant amount of disk access...

Imagine a huge sparse file with many thousands of holes, and the requested
region for fallocate() serving to "fill in" many of the holes. But any
non-hole parts within the fallocate() region would reduce the amount of
additional space required for fallocate() to succeed. So it's not as
simple as comparing length of fallocate() region with amount of free
space...

Unless you're creating a new file, which is what cp does most of the time.
So maybe a workaround could be added to cp. If --preallocate is specified,
cp could check the amount of free space before writing to the destination
file and abort without even needing to call fallocate() if there isn't
enough. (In fact, cp could do that anyway in most cases I think?)
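
A rough sketch of such a free-space check, assuming fstatvfs() on the
already-opened destination. The function names are hypothetical. The
result is only advisory, since another process can consume the space
between the check and the copy.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/statvfs.h>

/* Return true if NEEDED bytes fit into BLOCKS_AVAIL blocks of BLOCK_SIZE
   bytes each.  The comparison is done in blocks so that
   block_size * blocks_avail cannot overflow. */
bool
bytes_fit (uintmax_t needed, uintmax_t block_size, uintmax_t blocks_avail)
{
  if (block_size == 0)
    return false;
  /* Round the request up to whole blocks. */
  uintmax_t blocks_needed = needed / block_size + (needed % block_size != 0);
  return blocks_needed <= blocks_avail;
}

/* Return 1 if the filesystem holding FD reports room for NEEDED bytes,
   0 if it does not, and -1 if the filesystem could not be queried.
   f_bavail counts blocks available to unprivileged users, so space
   reserved for root is not included. */
int
have_space_for (int fd, uintmax_t needed)
{
  struct statvfs sv;
  if (fstatvfs (fd, &sv) != 0)
    return -1;
  return bytes_fit (needed, sv.f_frsize, sv.f_bavail);
}
```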


-- Mark






Information forwarded to bug-coreutils <at> gnu.org:
bug#9500; Package coreutils. (Fri, 11 May 2012 20:38:02 GMT) Full text and rfc822 format available.

Message #62 received at 9500 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Mark <markk <at> clara.co.uk>
Cc: 9500 <at> debbugs.gnu.org
Subject: Re: [RFC/PATCH] cp: Add option to pre-allocate space for files
Date: Fri, 11 May 2012 21:36:48 +0100
On 05/11/2012 08:45 PM, Mark wrote:
> On Fri, May 11, 2012 20:19, Pádraig Brady wrote:
>> On 05/11/2012 06:40 PM, Mark wrote:
>>> ...
>>> An interesting aside: I tried using cp to pre-allocate space for a very
>>> large file on an ext4 partition, much larger than the amount of free
>>> space. IMHO it would be best for the filesystem to fail immediately in
>>> that case. ext4 does a lot of work (there was a lot of disk activity and
>>> it took a long time to fail). ext4 pre-allocates as much of the
>>> requested
>>> region as possible, rather than succeeding or failing all-or-nothing. So
>>> you get a disk-full condition. (Of course that's no worse than what
>>> happens when you run cp normally. But it would happen much more quickly
>>> with pre-allocation.)
>>
>> Well that's bad as you get a delay in addition to the normal copy.
>> However, I don't see that behavior with 2.6.40.4-5.fc15.x86_64 at least?
> 
> I'm using kernel 3.0.0-19-generic #32-Ubuntu here. But probably more
> relevant, the partition I tested on was ~1.7TB ext4 on an external USB 2.0
> drive which was almost full and probably *very* fragmented, i.e. free
> space spread all over the disk in thousands of small chunks. ext4 seems to
> be pretty slow at allocating space in that case.

But you asked to fallocate(10TB).
That should have failed immediately,
because it's bigger than the file system.

Could you run the fallocate loop from my previous mail,
across an appropriate range of sizes, just to confirm,
and maybe prepare input to a kernel bug report?

> If I were designing a filesystem, I'd have it immediately return failure
> if fallocate() is specified with additional size larger than the amount of
> free space. Though for the filesystem, determining how much extra space a
> fallocate() call would need can be quite involved in some cases and
> require a significant amount of disk access...
> 
> Imagine a huge sparse file with many thousands of holes, and the requested
> region for fallocate() serving to "fill in" many of the holes. But any
> non-hole parts within the fallocate() region would reduce the amount of
> additional space required for fallocate() to succeed. So it's not as
> simple as comparing length of fallocate() region with amount of free
> space...
> 
> Unless you're creating a new file, which is what cp does most of the time.
> So maybe a workaround could be added to cp. If --preallocate is specified,
> cp could check the amount of free space before writing to the destination
> file and abort without even needing to call fallocate() if there isn't
> enough. (In fact, cp could do that anyway in most cases I think?)

Good analysis. Still, for a new file the file system
should be able to do the simple short calculation
of fallocate_request - free_space > 0.

I can understand inefficiencies in fallocating
around the free space limit, but otherwise
this seems like a bug in ext4.
(maybe a regression since I don't see it on my ext4 system).

cheers,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#9500; Package coreutils. (Tue, 15 May 2012 11:46:01 GMT) Full text and rfc822 format available.

Message #65 received at 9500 <at> debbugs.gnu.org (full text, mbox):

From: "Mark" <markk <at> clara.co.uk>
To: "Pádraig Brady" <P <at> draigBrady.com>
Cc: 9500 <at> debbugs.gnu.org
Subject: Re: [RFC/PATCH] cp: Add option to pre-allocate space for files
Date: Tue, 15 May 2012 12:44:54 +0100
[Message part 1 (text/plain, inline)]
Hi,

[Resending message cc'ed to 9500 <at> debbugs.gnu.org as requested.]

On Fri, May 11, 2012 21:36, Pádraig Brady wrote:
> On 05/11/2012 08:45 PM, Mark wrote:
>> ...
>> I'm using kernel 3.0.0-19-generic #32-Ubuntu here. But probably more
>> relevant, the partition I tested on was ~1.7TB ext4 on an external USB 2.0
>> drive which was almost full and probably *very* fragmented, i.e. free
>> space spread all over the disk in thousands of small chunks. ext4 seems to
>> be pretty slow at allocating space in that case.
>
> But you asked to fallocate(10TB).
> That should have failed immediately,
> because it's bigger than the file system.
>
> Could you run the fallocate loop from my previous mail,
> across an appropriate range of sizes, just to confirm,
> and maybe prepare input to a kernel bug report?

Unfortunately I have freed up a lot of space on the partition in question,
but I did some testing with a new small empty test partition, see below.


>> If I were designing a filesystem, I'd have it immediately return failure
>> if fallocate() is specified with additional size larger than the amount of
>> free space. Though for the filesystem, determining how much extra space a
>> fallocate() call would need can be quite involved in some cases and
>> require a significant amount of disk access...
>>
>> Imagine a huge sparse file with many thousands of holes, and the requested
>> region for fallocate() serving to "fill in" many of the holes. But any
>> non-hole parts within the fallocate() region would reduce the amount of
>> additional space required for fallocate() to succeed. So it's not as
>> simple as comparing length of fallocate() region with amount of free
>> space...
>>
>> Unless you're creating a new file, which is what cp does most of the time.
>> So maybe a workaround could be added to cp. If --preallocate is specified,
>> cp could check the amount of free space before writing to the destination
>> file and abort without even needing to call fallocate() if there isn't
>> enough. (In fact, cp could do that anyway in most cases I think?)
>
> Good analysis. Still, for a new file the file system
> should be able to do the simple short calculation
> of fallocate_request - free_space > 0.

Yes.

From the filesystem's perspective, it could easily fail fallocate()
immediately in some cases.

When fallocate() is called, the approximate space needed for success is
  (length of region passed to fallocate) - (amount already allocated to
file which overlaps the fallocate region).

The worst case is that the entire overlap between the fallocate region and
the file is a hole. Then
  space needed = fallocate region size.

However the filesystem could narrow it down a little. Consider the
difference between a file's apparent size and its on-disk size. Roughly
  size of all holes = apparent size - on-disk size

The space needed will not be more than:
  (size of fallocate region past end of file) + min((size of fallocate
region overlapping file), size of all holes)

So the fallocate() call could be failed without doing any work if there is
less free space than that.
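
The bound above can be written out as a small function. This is only a
sketch of the arithmetic. The function and parameter names are made up for
illustration, and it assumes region_off + region_len does not overflow.

```c
#include <stdint.h>

/* Upper bound on the extra space an fallocate() of REGION_LEN bytes at
   REGION_OFF could need, for a file whose apparent size is APPARENT_SIZE
   and whose allocated size is ON_DISK_SIZE (all values in bytes). */
uintmax_t
fallocate_space_upper_bound (uintmax_t apparent_size, uintmax_t on_disk_size,
                             uintmax_t region_off, uintmax_t region_len)
{
  uintmax_t region_end = region_off + region_len;

  /* size of all holes = apparent size - on-disk size, clamped at zero
     because the on-disk size can exceed the apparent size. */
  uintmax_t holes = (apparent_size > on_disk_size
                     ? apparent_size - on_disk_size : 0);

  /* Part of the region lying past the current end of file. */
  uintmax_t past_eof = 0;
  if (region_end > apparent_size)
    past_eof = region_end - (region_off > apparent_size
                             ? region_off : apparent_size);

  /* Part of the region overlapping the existing file. */
  uintmax_t overlap = region_len - past_eof;

  /* At most all of the overlap is holes, and never more than the
     total hole size of the file. */
  return past_eof + (overlap < holes ? overlap : holes);
}
```

For a brand-new, empty file the result is simply the region length, which
is the case cp hits most of the time.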


> I can understand inefficiencies in fallocating
> around the free space limit, but otherwise
> this seems like a bug in ext4.
> (maybe a regression since I don't see it on my ext4 system).

I also saw slowness (i.e. a lot of disk I/O) when writing a file to that
ext4 partition without pre-allocation. It seemed like once the file got
above a certain size, there was a *lot* of disk I/O for several seconds.
Or maybe all the disk I/O happened when the kernel writeback kicked in.
Perhaps that's related to the fallocate() slowness, if the filesystem had
to seek all over the disk again and again to find free space.

As I mentioned though, that filesystem was/is *very* fragmented, with a
relatively small proportion of free space. I might not be able to
reproduce the issue now, because I shifted about 250GB of data off that
partition the other day.

Also worth noting, most ext4 partitions have a certain percentage reserved
for the root user. Maybe testing should be done either on a partition with
no reserved space, or as root.


I posted to the ext4 list a while ago about fallocate() behaviour. Not
specifically about this issue, but the fact that it's not atomic. If it
fails due to lack of space, it allocates all space on the partition with
no way to easily undo that.

For example, suppose the user has a very large sparse file with large
holes, maybe a virtual machine hard disk image or similar. User wants to
make the file non-sparse so calls fallocate() to allocate all the holes.
But it turns out there wasn't enough disk space for that to succeed. Or
maybe some other program allocated a lot of space in the mean time.
fallocate() fails after allocating all remaining space on the partition.
Other than deleting the file, the user would need to roll their own
hole-punching program to reclaim space.

You can read the thread about that at
  http://comments.gmane.org/gmane.comp.file-systems.ext4/29942


Back to the brief test I mentioned above. See the attached file. That
doesn't demonstrate any slowness (since the test was done on a very small
empty partition), but does demonstrate ext4 fallocate() not returning an
error on attempting to allocate more than the partition size. Instead it
allocates all remaining space then fails, leaving the user to manually fix
things afterwards.


Regards,
-- Mark
[ext4_fallocate_test.txt (text/plain, attachment)]

Severity set to 'wishlist' from 'normal' Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 30 Oct 2018 04:21:02 GMT) Full text and rfc822 format available.

Changed bug title to 'cp: use posix_fallocate where supported' from '[PATCH]: use posix_fallocate where supported' Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 30 Oct 2018 04:21:02 GMT) Full text and rfc822 format available.

This bug report was last modified 5 years and 151 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.