GNU bug report logs - #62404
--reflink=auto not falling back appropriately to standard copy in all cases

Previous Next

Package: coreutils;

Reported by: Pádraig Brady <P <at> draigBrady.com>

Date: Thu, 23 Mar 2023 13:04:01 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 62404 in the body.
You can then email your comments to 62404 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-coreutils <at> gnu.org:
bug#62404; Package coreutils. (Thu, 23 Mar 2023 13:04:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Pádraig Brady <P <at> draigBrady.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Thu, 23 Mar 2023 13:04:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Report bugs to <bug-coreutils <at> gnu.org>
Subject: --reflink=auto not falling back appropriately on older kernels
Date: Thu, 23 Mar 2023 13:03:06 +0000
Details at
https://github.com/termux/termux-packages/issues/15706#issuecomment-1481144831

But in summary, the new --reflink errno checking in coreutils 9.2
is not working appropriately on android 4.9 kernels at least.

cheers,
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#62404; Package coreutils. (Thu, 23 Mar 2023 13:40:01 GMT) Full text and rfc822 format available.

Message #8 received at 62404 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: 62404 <at> debbugs.gnu.org
Subject: Re: bug#62404: --reflink=auto not falling back appropriately on older
 kernels
Date: Thu, 23 Mar 2023 13:39:01 +0000
[Message part 1 (text/plain, inline)]
On 23/03/2023 13:03, Pádraig Brady wrote:
> Details at
> https://github.com/termux/termux-packages/issues/15706#issuecomment-1481144831
> 
> But in summary, the new --reflink errno checking in coreutils 9.2
> is not working appropriately on android 4.9 kernels at least.

Proposed patch attached.
It may be better to key on kernel version rather than __ANDROID__
(along the lines of https://github.com/coreutils/coreutils/commit/ba5e6885d),
but that needs more investigation.

cheers,
Pádraig
[copy-clone-android.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#62404; Package coreutils. (Thu, 23 Mar 2023 14:56:01 GMT) Full text and rfc822 format available.

Message #11 received at 62404 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: 62404 <at> debbugs.gnu.org
Subject: Re: bug#62404: --reflink=auto not falling back appropriately on older
 kernels
Date: Thu, 23 Mar 2023 14:55:06 +0000
[Message part 1 (text/plain, inline)]
On 23/03/2023 13:39, Pádraig Brady wrote:
> On 23/03/2023 13:03, Pádraig Brady wrote:
>> Details at
>> https://github.com/termux/termux-packages/issues/15706#issuecomment-1481144831
>>
>> But in summary, the new --reflink errno checking in coreutils 9.2
>> is not working appropriately on android 4.9 kernels at least.
> 
> Proposed patch attached.
> It may be better to key on kernel version rather than __ANDROID__
> (along the lines of https://github.com/coreutils/coreutils/commit/ba5e6885d),
> but that needs more investigation.

Actually it seems the errno checking may be too brittle in general,
https://www.reddit.com/r/Proxmox/comments/11zieha/cpmvinstall_are_broken_in_unprivileged_containers/

Perhaps we should use the above __ANDROID__ logic in all cases,
so that we fallback unless there is a specific reason not to.

The attached does that.

cheers,
Pádraig
[copy-clone-fallback.patch (text/x-patch, attachment)]

Changed bug title to '--reflink=auto not falling back appropriately to standard copy in all cases' from '--reflink=auto not falling back appropriately on older kernels' Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Thu, 23 Mar 2023 17:31:01 GMT) Full text and rfc822 format available.

Information forwarded to bug-coreutils <at> gnu.org:
bug#62404; Package coreutils. (Thu, 23 Mar 2023 23:06:02 GMT) Full text and rfc822 format available.

Message #16 received at 62404 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 62404 <at> debbugs.gnu.org
Subject: Re: bug#62404: --reflink=auto not falling back appropriately on older
 kernels
Date: Thu, 23 Mar 2023 15:57:16 -0700
Thanks for looking into this.

On 3/23/23 07:55, Pádraig Brady wrote:

> Perhaps we should use the above __ANDROID__ logic in all cases,
> so that we fallback unless there is a specific reason not to.

Yes, that sounds better.

> +/* Whether the errno indicates operation is a transient failure.
> +   I.e., a failure that would indicate the operation _is_ supported,
> +   but has failed in a terminal way.  */

I got confused by the terminology here, as "transient failure" has the 
opposite connotation from terminal failure. I suggest that we remove the 
function (since it's used only once and its name is confusing) and 
replace its calling code as suggested below.


> -            if (errno == EPERM && *total_n_read == 0)
> +            if (is_CLONENOTSUP (errno, *total_n_read))

The *total_n_read should be *total_n_read != 0 partly for style and 
partly to be portable to ancient compilers that don't properly support 
bool. More important, I suggest leaving is_CLONENOTSUP's API as-is, and 
changing the 'if' to:

   if (*total_n_read == 0 && is_CLONENOTSUP (errno))

as I don't see why we should treat EPERM differently from ENOSYS etc.


>  /* Whether the errno from FICLONE, or copy_file_range
>     indicates operation is not supported for this file or file system.  */
>  
>  static bool
> -is_CLONENOTSUP (int err)
> +is_CLONENOTSUP (int err, bool partial)

Since the code no longer calls is_CLONENOTSUP when FICLONE fails, the 
comment should be changed to not mention FICLONE.


> +  /* If the clone operation is not creating the destination (i.e., FICLONE)
> +     then we could possibly be more restrictive with errors deemed non terminal.
> +     However doing that like in the following
> +     would be more coupled to disparate errno handling on various systems.
> +     if (0 <= dest_desc)
> +       transient_failure = ! is_CLONENOTSUP (errno, false);  */
> +  bool transient_failure = is_transient_failure (errno);

I had trouble with those two "transient_failure"s lining up, plus the 
terminology thing mentioned above.  I suggest changing this to:

  /* When FICLONE fails, report failure only with errno values known
     to mean trouble when FICLONE is supported and called properly.
     Do do not report failure merely because !is_CLONENOTSUP (errno),
     as too many systems yield oddball errno values here.  */
  bool report_failure = (errno == EIO || errno == ENOMEM
			 || errno == ENOSPC || errno == EDQUOT);

and then replace all occurrences of "transient_failure" with 
"report_failure".




Information forwarded to bug-coreutils <at> gnu.org:
bug#62404; Package coreutils. (Thu, 23 Mar 2023 23:15:02 GMT) Full text and rfc822 format available.

Message #19 received at 62404 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 62404 <at> debbugs.gnu.org
Subject: Re: bug#62404: --reflink=auto not falling back appropriately on older
 kernels
Date: Thu, 23 Mar 2023 23:06:50 +0000
On 23/03/2023 22:57, Paul Eggert wrote:
> Thanks for looking into this.
> 
> On 3/23/23 07:55, Pádraig Brady wrote:
> 
>> Perhaps we should use the above __ANDROID__ logic in all cases,
>> so that we fallback unless there is a specific reason not to.
> 
> Yes, that sounds better.
> 
>> +/* Whether the errno indicates operation is a transient failure.
>> +   I.e., a failure that would indicate the operation _is_ supported,
>> +   but has failed in a terminal way.  */
> 
> I got confused by the terminology here, as "transient failure" has the
> opposite connotation from terminal failure. I suggest that we remove the
> function (since it's used only once and its name is confusing) and
> replace its calling code as suggested below.
> 
> 
>> -            if (errno == EPERM && *total_n_read == 0)
>> +            if (is_CLONENOTSUP (errno, *total_n_read))
> 
> The *total_n_read should be *total_n_read != 0 partly for style and
> partly to be portable to ancient compilers that don't properly support
> bool. More important, I suggest leaving is_CLONENOTSUP's API as-is, and
> changing the 'if' to:
> 
>      if (*total_n_read == 0 && is_CLONENOTSUP (errno))
> 
> as I don't see why we should treat EPERM differently from ENOSYS etc.
> 
> 
>>   /* Whether the errno from FICLONE, or copy_file_range
>>      indicates operation is not supported for this file or file system.  */
>>   
>>   static bool
>> -is_CLONENOTSUP (int err)
>> +is_CLONENOTSUP (int err, bool partial)
> 
> Since the code no longer calls is_CLONENOTSUP when FICLONE fails, the
> comment should be changed to not mention FICLONE.
> 
> 
>> +  /* If the clone operation is not creating the destination (i.e., FICLONE)
>> +     then we could possibly be more restrictive with errors deemed non terminal.
>> +     However doing that like in the following
>> +     would be more coupled to disparate errno handling on various systems.
>> +     if (0 <= dest_desc)
>> +       transient_failure = ! is_CLONENOTSUP (errno, false);  */
>> +  bool transient_failure = is_transient_failure (errno);
> 
> I had trouble with those two "transient_failure"s lining up, plus the
> terminology thing mentioned above.  I suggest changing this to:
> 
>     /* When FICLONE fails, report failure only with errno values known
>        to mean trouble when FICLONE is supported and called properly.
>        Do do not report failure merely because !is_CLONENOTSUP (errno),
>        as too many systems yield oddball errno values here.  */
>     bool report_failure = (errno == EIO || errno == ENOMEM
> 			 || errno == ENOSPC || errno == EDQUOT);
> 
> and then replace all occurrences of "transient_failure" with
> "report_failure".

Thanks for the review.
I'll adjust as suggested.
The is_CLONENOTSUP() change _was_ a little contrived
since it wasn't actually needed in the new implementation.
It would only be needed if we ever wanted to change back
to the more aggressive errno checks after FICLONE,
in which case the is_CLONENOTSUP() would then be correct.
I'll leave it as is and comment for now.

thanks,
Pádraig




Reply sent to Pádraig Brady <P <at> draigBrady.com>:
You have taken responsibility. (Fri, 24 Mar 2023 13:22:02 GMT) Full text and rfc822 format available.

Notification sent to Pádraig Brady <P <at> draigBrady.com>:
bug acknowledged by developer. (Fri, 24 Mar 2023 13:22:02 GMT) Full text and rfc822 format available.

Message #24 received at 62404-done <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 62404-done <at> debbugs.gnu.org
Subject: Re: bug#62404: --reflink=auto not falling back appropriately on older
 kernels
Date: Fri, 24 Mar 2023 13:20:40 +0000
The fix for this is now pushed at:
https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=093a8b4bf

marking this as done.

thanks,
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#62404; Package coreutils. (Sat, 25 Mar 2023 20:35:01 GMT) Full text and rfc822 format available.

Message #27 received at 62404 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 62404 <at> debbugs.gnu.org
Subject: Re: bug#62404: --reflink=auto not falling back appropriately on older
 kernels
Date: Sat, 25 Mar 2023 13:34:40 -0700
[Message part 1 (text/plain, inline)]
Thanks for installing that. I found the comments still a bit confusing 
so I pushed the attached; hope it's OK.

It is annoying that in the common case where A is a regular file and B 
does not exist but will be created on the same file system, the syscalls 
start out:

  openat(AT_FDCWD, "B", O_RDONLY|O_PATH|O_DIRECTORY) = -1 ENOENT
  newfstatat(AT_FDCWD, "A", ...}, 0) = 0
  openat(AT_FDCWD, "A", O_RDONLY)        = 3
  newfstatat(3, "", ..., AT_EMPTY_PATH) = 0
  openat(AT_FDCWD, "B", O_WRONLY|O_CREAT|O_EXCL, 0775) = 4
  ioctl(4, BTRFS_IOC_CLONE or FICLONE, 3) = -1 EOPNOTSUPP
  newfstatat(4, "", ..., AT_EMPTY_PATH) = 0

They should be just:

  openat(AT_FDCWD, "A", O_RDONLY)        = 3
  newfstatat(3, "", ..., AT_EMPTY_PATH) = 0
  openat(AT_FDCWD, "B", O_WRONLY|O_CREAT|O_EXCL, 0775) = 4
  ioctl(4, BTRFS_IOC_CLONE or FICLONE, 3) = -1 EOPNOTSUPP
  newfstatat(4, "", ..., AT_EMPTY_PATH) = 0

as there should be no need to stat the source twice, or to try to open 
the destination twice. But this is a performance improvement for another 
day.
[0001-cp-clarify-commentary.patch (text/x-patch, attachment)]

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 23 Apr 2023 11:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 1 year and 4 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.