GNU bug report logs - #10055
cp: cp -u corrupts 'fs'' information if interupted; can't


Package: coreutils;

Reported by: Linda Walsh <coreutils <at> tlinx.org>

Date: Tue, 15 Nov 2011 19:09:02 UTC

Severity: wishlist

To reply to this bug, email your comments to 10055 AT debbugs.gnu.org.



Report forwarded to bug-coreutils <at> gnu.org:
bug#10055; Package coreutils. (Tue, 15 Nov 2011 19:09:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Linda Walsh <coreutils <at> tlinx.org>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Tue, 15 Nov 2011 19:09:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Linda Walsh <coreutils <at> tlinx.org>
To: bug-coreutils <at> gnu.org
Subject: [sr #107875] BUG cp -u corrupts 'fs'' information if interupted;
	can't recover on future invoctions
Date: Tue, 15 Nov 2011 11:07:47 -0800





-------- Original Message --------
Subject: 	[sr #107875] BUG cp -u corrupts 'fs'' information if 
interupted; can't recover on future invoctions
Date: 	Tue, 15 Nov 2011 17:58:23 +0000
From: 	Linda A. Walsh <INVALID.NOREPLY <at> gnu.org>
To: 	Linda A. Walsh <>



URL:
 <http://savannah.gnu.org/support/?107875>

                Summary: BUG cp -u corrupts 'fs'' information if interupted;
can't recover on future invoctions
                Project: GNU Core Utilities
           Submitted by: law
           Submitted on: Tue Nov 15 09:58:22 2011
               Category: None
               Priority: 5 - Normal
               Severity: 3 - Normal
                 Status: None
                Privacy: Public
            Assigned to: None
       Originator Email: 
            Open/Closed: Open
        Discussion Lock: Any
       Operating System: None

   _______________________________________________________

Details:

This should be filed under bugs, not under support, but it seems that users of
the core utils are not allowed to find bugs...convenient.  No wonder quality
metrics are worthless.

Not trying for a sensationalist summary, but you try coming up with a SHORT
accurate summary for this.

The problem is bad (in the sense of providing false assurance and not being
reliable), but not as bad as the summary might sound...

If you copy a bunch of files (or 1 file for that matter, but then it _might_
be more quickly noticed), and the copy is interrupted (most often by control-C,
cuz some param was forgotten, but there could be other causes), a partial file with
the current time stamp is left in the target location.  The corrupt copy is
not removed upon interruption, though it is marked as being "current"
(w/current DT stamp).

This creates a corrupt copy of the file in a collection of files that
subsequent cp -u won't correct.  This is a problem.
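The failure mode described above can be reproduced with a small model. This is only a sketch: needs_update() is a hypothetical, simplified stand-in for an mtime-only update check (copy when the source is newer or the destination is missing), not cp's actual code, and the file names and sizes are made up for illustration.

```python
import os
import tempfile


def needs_update(src, dst):
    """Simplified model of an mtime-only update check (not cp's real code)."""
    if not os.path.exists(dst):
        return True
    return os.stat(src).st_mtime_ns > os.stat(dst).st_mtime_ns


def demo_interrupted_copy():
    """Return (update decision, source size, destination size)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        with open(src, "wb") as f:
            f.write(b"x" * 4096)
        # Give the source an old time stamp, as a long-lived file would have.
        os.utime(src, (1_000_000_000, 1_000_000_000))
        # Simulate an interrupted copy: only the first 1 KiB arrives.
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            fout.write(fin.read(1024))
        return needs_update(src, dst), os.path.getsize(src), os.path.getsize(dst)
```

In this model the truncated destination carries a newer time stamp than the source, so the check reports that no update is needed even though the sizes differ.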

There is no indication of how many files in a collection are corrupted in
this manner...and the sources may have long been deleted.

If interrupted, the cp tool should remove any partials or ensure they are not
created to begin with.

Possible ways of addressing:
A) catch INT (& other catchable signals), and remove any files that are
'incomplete'.
Besides that, several other steps could be taken to provide increasing
protections (some are orthogonal, some dependent):
B) 1) open destination name for write (verifying accesses) w/
      Exclusive Write;
   2) open tmp file for actual cp operation;
   3) use posix_fallocate (if available) to allocate sufficient space for the
      copy;
   4) do the copy;
   5) rename tmp over original (closing original before rename on systems
      that don't support separation of names and FDs (Win systems et al.)).
C) reset DT stamps on newly opened files to '0' (~1969/70?) in all
non-auto-updated fields, then start the copy.  Any future
invocations of "cp -u" could examine the time stamps, and if the
non-auto-updated fields appear to be zero, do the copy (and correct the time
stamps), with 2 possible exception conditions being noted:
     (a) if the source file also has '0'd time fields, then check file
sizes:
      if they match, presume 'ok' (a statistical 'guess', possibly warned
about with a -verbose option);
      if sizes don't match, presume not a correct update and do the copy.
D) others?
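
Steps B.2, B.4 and B.5 (copy into a temporary file, then rename it over the destination) can be sketched as follows. This is a minimal illustration under stated assumptions, not a proposal for cp's implementation: copy_via_temp and the ".copy-" prefix are hypothetical names, the temporary file is assumed to live in the destination's directory so that the rename stays on one filesystem, and permissions, ownership, hard links and ACLs of an existing destination are deliberately ignored.

```python
import os
import tempfile


def copy_via_temp(src, dst, chunk_size=64 * 1024):
    """Copy src to dst so that dst is either untouched or complete."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    # The temporary file must share a filesystem with dst for an atomic rename.
    fd, tmp_path = tempfile.mkstemp(prefix=".copy-", dir=dst_dir)
    try:
        with os.fdopen(fd, "wb") as fout, open(src, "rb") as fin:
            while True:
                chunk = fin.read(chunk_size)
                if not chunk:
                    break
                fout.write(chunk)
            fout.flush()
            os.fsync(fout.fileno())
        # Atomic on POSIX: readers see either the old dst or the new one.
        os.replace(tmp_path, dst)
    except BaseException:
        # Also covers KeyboardInterrupt (Ctrl-C): drop the partial temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this shape an interrupted copy leaves the previous destination (if any) in place with its old time stamp, so a later update pass would still see it as out of date.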

As it stands, this makes cp unreliable.

Note, 'rsync' isn't a great substitute either, as I've noted
that when I was updating files with 'rsync' (which is always slower on full
file copies) with equivalent options, a later
usage of "cp -uav" to copy the files recopied most of the files
(all? not sure) that rsync had copied with -aUVHAX (supposedly the same info
as cp -au, from my understanding).

The same was not true for the reverse case (files cp'ed and updated by cp
were not updated by rsync), leading me to suspect rsync of being not only
significantly slower, but also not as thorough in copying over information.

FWIW, I feel it important to file bugs about tools that are currently the best
in their class...(and tend to devote my attentions to wanting to see them
enhanced, even beyond their original scope at times);  rsync used to have a
very basic feature which put it above cp, ... it copied extended attrs and
ACLS.  Now that cp does that, and that cp was about 2-3x faster
than rsync for full files...







   _______________________________________________________

Reply to this item at:

 <http://savannah.gnu.org/support/?107875>

_______________________________________________
 Message sent via/by Savannah
 http://savannah.gnu.org/






Information forwarded to bug-coreutils <at> gnu.org:
bug#10055; Package coreutils. (Tue, 15 Nov 2011 19:37:02 GMT) Full text and rfc822 format available.

Message #8 received at 10055 <at> debbugs.gnu.org (full text, mbox):

From: "Linda A. Walsh" <law <at> tlinx.org>
To: 10055 <at> debbugs.gnu.org
Subject: Re: bug#10055: [sr #107875] BUG cp -u corrupts 'fs'' information
	if	interupted; can't recover on future invoctions
Date: Tue, 15 Nov 2011 11:21:39 -0800
Hmmm....   Dang strange processes on bugs...  can't submit a bug directly, but
can do it just by emailing it to the email list?   ...  (bureaucracy!)

Linda Walsh wrote:
> This should be filed under bugs, not under support, but it seems that 
> users of
> the core utils are not allowed to find bugs...convenient.




Information forwarded to bug-coreutils <at> gnu.org:
bug#10055; Package coreutils. (Tue, 15 Nov 2011 20:25:01 GMT) Full text and rfc822 format available.

Message #11 received at 10055 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Linda Walsh <coreutils <at> tlinx.org>
Cc: 10055 <at> debbugs.gnu.org
Subject: Re: bug#10055: [sr #107875] BUG cp -u corrupts 'fs'' information
	if interupted; can't recover on future invoctions
Date: Tue, 15 Nov 2011 12:23:57 -0800
Thanks for your thoughtful suggestions.
I like many of the ideas and hope that somebody can find the time
to code them up.  Here are some more-detailed comments.

On 11/15/11 11:07, Linda Walsh wrote:

>   3). use posix_fallocate (if available) to allocate sufficient space for the
> copy

This seems like a good idea, independently of the other points.
That is, if A and B are regular files, "cp A B" could
use A's size to preallocate B's storage, and it could
fail immediately (without trashing B!) if there's not
enough storage.  I like this.
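
A rough sketch of that preallocation idea, for illustration only: copy_with_preallocation is a hypothetical helper, not anything in coreutils. It assumes both files are regular files, opens the destination without truncating it, and reserves space with os.posix_fallocate (where the platform provides it) before writing anything. How much of the destination survives a failed reservation depends on the filesystem and libc, so treat the "without trashing B" property as a goal of the sketch rather than a guarantee.

```python
import errno
import os


def copy_with_preallocation(src, dst, chunk_size=64 * 1024):
    """Reserve space for dst before overwriting any of its data (sketch)."""
    size = os.path.getsize(src)
    # No O_TRUNC here: existing contents of dst survive a failed reservation.
    fd = os.open(dst, os.O_WRONLY | os.O_CREAT, 0o644)
    with os.fdopen(fd, "wb") as fout:
        if size > 0 and hasattr(os, "posix_fallocate"):
            try:
                # Raises OSError (for example ENOSPC) before any write happens.
                os.posix_fallocate(fout.fileno(), 0, size)
            except OSError as exc:
                # Some filesystems cannot preallocate; fall back to a plain copy.
                if exc.errno not in (errno.EINVAL, errno.EOPNOTSUPP):
                    raise
        copied = 0
        with open(src, "rb") as fin:
            while True:
                chunk = fin.read(chunk_size)
                if not chunk:
                    break
                fout.write(chunk)
                copied += len(chunk)
        # Drop any tail left over from a longer, older dst.
        fout.truncate(copied)
    return copied
```

The data is written over the old contents from offset 0 and the file is cut to length at the end, because truncating first would give back the space that was just reserved.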

> A) catch INT (& catchable signals), and remove any files that are
> 'incomplete'

That might cause trouble in other cases.  For example, "cp A B" where
B already exists.  In this case it's unwise to remove B if interrupted
-- people won't expect that.  And in general 'cp' has behaved the way
that it does for decades, and we need to be careful about changing its
default behavior in such a fairly-drastic way.

But we could add an option to 'cp' to have this behavior.
Perhaps --remove-destination=signal?  That is --remove-destination
could have an optional list of names of places where the destination
could be removed, where the default is not to remove it, and
plain --remove-destination means --remove-destination=before.
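
To make the trade-off concrete, here is a small sketch of a remove-on-interrupt policy. Everything in it is hypothetical: copy_remove_on_interrupt, the remove_existing switch and the on_chunk hook are illustration-only names, and no such cp option exists. The sketch relies on Python turning SIGINT into KeyboardInterrupt, and it uses O_EXCL to learn whether this call created the destination. By default only a freshly created destination is removed; remove_existing=True models removing a pre-existing file that has already been truncated.

```python
import os


def copy_remove_on_interrupt(src, dst, remove_existing=False,
                             on_chunk=None, chunk_size=64 * 1024):
    """Copy src to dst; on Ctrl-C, unlink dst according to the chosen policy."""
    try:
        # O_EXCL succeeds only if dst did not exist, so we know we created it.
        fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
        created = True
    except FileExistsError:
        fd = os.open(dst, os.O_WRONLY | os.O_TRUNC)
        created = False
    try:
        with os.fdopen(fd, "wb") as fout, open(src, "rb") as fin:
            while True:
                chunk = fin.read(chunk_size)
                if not chunk:
                    break
                fout.write(chunk)
                if on_chunk is not None:
                    # Hypothetical progress hook; also handy for simulating Ctrl-C.
                    on_chunk(len(chunk))
    except KeyboardInterrupt:
        # Python raises KeyboardInterrupt on SIGINT by default.
        if created or remove_existing:
            os.unlink(dst)
        raise
```

An interrupt that lands between the open and the start of the copy loop is not handled here; a real implementation would need to block or defer signals around that window.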

> B) 1). open destination name for write (verifying accesses) w/
>       Exclusive Write;

This could be another new option, though (as you write) it's
orthogonal to the main point.  I would suggest that this option be
called --oflag=excl (by analogy with dd's oflag= option).  We can add
support for the other output flags while we're at it, e.g.,
--oflag=excl,append,noatime.

>   2). open tmp file for actual cp operation.
>   5); rename tmp over original; (closing original before rename on systems
> that don't support separation of names and FD's (Win systems et al).

Yes, that could be another option.  I see (2) and (5) as being the
same feature.  Perhaps --remove-destination=after?

> C) reset DT stamps on newly opened files to '0' (~1969/70?)'

I dunno, this kind of time stamp munging sounds like it'd cause more
trouble than it'd cure.  It's more natural (and easier to debug
failures) if the last-modified time of a file is the time that the
file was last modified.





Information forwarded to bug-coreutils <at> gnu.org:
bug#10055; Package coreutils. (Tue, 15 Nov 2011 20:48:01 GMT) Full text and rfc822 format available.

Message #14 received at 10055 <at> debbugs.gnu.org (full text, mbox):

From: "Linda A. Walsh" <law <at> tlinx.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 10055 <at> debbugs.gnu.org
Subject: Re: bug#10055: [sr #107875] BUG cp -u corrupts 'fs'' information if
	interupted; can't recover on future invoctions
Date: Tue, 15 Nov 2011 12:46:23 -0800

Paul Eggert wrote:


> 
>> A) catch INT (& catchable signals), and remove any files that are
>> 'incomplete'
> 
> That might cause trouble in other cases.  For example, "cp A B" where
> B already exists. 

===
	Am **only** suggesting this where 'B' has already been opened
and truncated by stuff being copied from 'A'...

	The point is to not leave a 'B' that is *indeterminate*.


> In this case it's unwise to remove B if interrupted
> -- people won't expect that.  

--
	Better than leaving *doo doo* in a file where they expect
something valid.

> And in general 'cp' has behaved the way
> that it does for decades, and we need to be careful about changing its
> default behavior in such a fairly-drastic way.

----
	It's a bug...Fixing a bug isn't usually considered
drastic.

> 
> But we could add an option to 'cp' to have this behavior.
> Perhaps --remove-destination=signal?  That is --remove-destination
> could have an optional list of names of places where the destination
> could be removed, where the default is not to remove it, and
> plain --remove-destination means --remove-destination=before.

----
	I think you misunderstood the problem.






Information forwarded to bug-coreutils <at> gnu.org:
bug#10055; Package coreutils. (Tue, 15 Nov 2011 21:09:02 GMT) Full text and rfc822 format available.

Message #17 received at 10055 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: "Linda A. Walsh" <law <at> tlinx.org>
Cc: 10055 <at> debbugs.gnu.org
Subject: Re: bug#10055: [sr #107875] BUG cp -u corrupts 'fs'' information
	if interupted; can't recover on future invoctions
Date: Tue, 15 Nov 2011 13:07:36 -0800
On 11/15/11 12:46, Linda A. Walsh wrote:

>     Better than leaving *doo doo* in a file

Sometimes, but not always.  I can think of plausible cases where I'd
rather have a partial copy than no copy at all.  As an extreme example,
if I'm doing 'cp /dev/tty A', I do not want A removed on interrupt
even if A has already been truncated and overwritten,
as A contains the only copy of the data that I just typed in by hand.

>> But we could add an option to 'cp' to have this behavior.
>> Perhaps --remove-destination=signal?  That is --remove-destination
>> could have an optional list of names of places where the destination
>> could be removed, where the default is not to remove it, and
>> plain --remove-destination means --remove-destination=before.
> 
> ----
>     I think you misunderstood the problem.

Perhaps I did.  But could you explain the problem then?  For example,
how would the proposed "cp --remove-destination=signal A B"
not address the problem?




Information forwarded to bug-coreutils <at> gnu.org:
bug#10055; Package coreutils. (Tue, 15 Nov 2011 22:30:02 GMT) Full text and rfc822 format available.

Message #20 received at 10055 <at> debbugs.gnu.org (full text, mbox):

From: "Linda A. Walsh" <gnu <at> tlinx.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 10055 <at> debbugs.gnu.org
Subject: Re: bug#10055: [sr #107875] BUG cp -u corrupts 'fs'' information if
	interupted; can't recover on future invoctions
Date: Tue, 15 Nov 2011 14:18:14 -0800
[Message part 1 (text/html, inline)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#10055; Package coreutils. (Tue, 15 Nov 2011 22:33:02 GMT) Full text and rfc822 format available.

Message #23 received at 10055 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 10055 <at> debbugs.gnu.org, Linda Walsh <coreutils <at> tlinx.org>
Subject: Re: bug#10055: [sr #107875] BUG cp -u corrupts 'fs'' information
	if interupted; can't recover on future invoctions
Date: Tue, 15 Nov 2011 22:31:11 +0000
On 11/15/2011 08:23 PM, Paul Eggert wrote:
> Thanks for your thoughtful suggestions.
> I like many of the ideas and hope that somebody can find the time
> to code them up.  Here are some more-detailed comments.
> 
> On 11/15/11 11:07, Linda Walsh wrote:
> 
>>   3). use posix_fallocate (if available) to allocate sufficient space for the
>> copy
> 
> This seems like a good idea, independently of the other points.
> That is, if A and B are regular files, "cp A B" could
> use A's size to preallocate B's storage, and it could
> fail immediately (without trashing B!) if there's not
> enough storage.  I like this.

I'll take a look at this at some stage.
I was intending to do it right after the fiemap stuff
as it was quite related, but that needed to be bypassed
for normal copies. Anyway I'll bump fallocate
up my priority list.

> 
>> A) catch INT (& catchable signals), and remove any files that are
>> 'incomplete'
> 
> That might cause trouble in other cases.  For example, "cp A B" where
> B already exists.  In this case it's unwise to remove B if interrupted
> -- people won't expect that.  And in general 'cp' has behaved the way
> that it does for decades, and we need to be careful about changing its
> default behavior in such a fairly-drastic way.
> 
> But we could add an option to 'cp' to have this behavior.
> Perhaps --remove-destination=signal?  That is --remove-destination
> could have an optional list of names of places where the destination
> could be removed, where the default is not to remove it, and
> plain --remove-destination means --remove-destination=before.
> 
>> B) 1). open destination name for write (verifying accesses) w/
>>       Exclusive Write;
> 
> This could be another new option, though (as you write) it's
> orthogonal to the main point.  I would suggest that this option be
> called --oflag=excl (by analogy with dd's oflag= option).  We can add
> support for the other output flags while we're at it, e.g.,
> --oflag=excl,append,noatime.
> 
>>   2). open tmp file for actual cp operation.
>>   5); rename tmp over original; (closing original before rename on systems
>> that don't support separation of names and FD's (Win systems et al).
> 
> Yes, that could be another option.  I see (2) and (5) as being the
> same feature.  Perhaps --remove-destination=after?

There are lots of implementation issues with tmp files,
many of which are noted here:
http://www.pixelbeat.org/docs/unix_file_replacement.html

> 
>> C) reset DT stamps on newly opened files to '0' (~1969/70?)'
> 
> I dunno, this kind of time stamp munging sounds like it'd cause more
> trouble than it'd cure.  It's more natural (and easier to debug
> failures) if the last-modified time of a file is the time that the
> file was last modified.

Not a bad idea and least invasive, but if the Ctrl-C happened
between the creat() and utime() you'd get a newer zero length file.
Then subsequent `cp -u` would have to treat zero length files specially.

cheers,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#10055; Package coreutils. (Wed, 16 Nov 2011 04:38:02 GMT) Full text and rfc822 format available.

Message #26 received at 10055 <at> debbugs.gnu.org (full text, mbox):

From: "Linda A. Walsh" <gnu <at> tlinx.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 10055 <at> debbugs.gnu.org
Subject: Re: bug#10055: [sr #107875] BUG cp -u corrupts 'fs'' information if
	interupted; can't recover on future invoctions
Date: Tue, 15 Nov 2011 19:33:04 -0800
[Message part 1 (text/html, inline)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#10055; Package coreutils. (Wed, 16 Nov 2011 06:10:02 GMT) Full text and rfc822 format available.

Message #29 received at 10055 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: "Linda A. Walsh" <gnu <at> tlinx.org>
Cc: 10055 <at> debbugs.gnu.org
Subject: Re: bug#10055: [sr #107875] BUG cp -u corrupts 'fs'' information
	if interupted; can't recover on future invoctions
Date: Tue, 15 Nov 2011 22:08:31 -0800
On 11/15/11 19:33, Linda A. Walsh wrote:
> Why don't we
> focus on the specific problem mentioned which was using it in the context of
> the "-u" flag, (and with -a/-r and/or a wildcard), where you expect it to update
> contents of 'Dst' with 'Src'.

I'd rather not have a heuristic that says "cp removes the destination
when interrupted, if you use the -u flag with -a or -r or a wildcard".
That'd be a hard rule to remember, and it's probably not the "best"
rule anyway, for somebody's opinion of "best".  We need a simple rule
that's easy to document and to remember, even if it isn't necessarily
the "best" by some other measure.

It'd be OK if "cp -a" implies the new --remove-destination=signal
(or whatever) option.  Then you could just use "cp -a".

> cp could check file sizes and see
> if the target is smaller and if so.. assume, if the DT's were equal that the file cp was
> interrupted...and finish it...

I'm still not convinced by the idea about trusting the time stamp on
the destination.  Every time 'cp' writes to its destination, it will
update the destination's time stamp.  Sure, 'cp' can use utime immediately
afterwards to alter the time stamp, but there's still a window where
the destination's time stamp will be 'now'.  In general 'cp' must
continue to work in that case -- so why should it bother to reset the
destination's time stamp after every write?
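
The window described here is easy to observe. The sketch below uses a hypothetical helper, mtime_after_reset_then_write, and assumes an ordinary filesystem that records modification times. It zeroes a fresh file's time stamps and then writes to it; the write moves the modification time forward again.

```python
import os
import tempfile


def mtime_after_reset_then_write():
    """Return (mtime right after zeroing, mtime after a later write)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dst.bin")
        with open(path, "wb") as f:
            # Mark the fresh file as "incomplete" by zeroing its time stamps.
            os.utime(path, (0, 0))
            marked = os.stat(path).st_mtime
            f.write(b"payload")
        # Closing the file flushed the write, which updated the mtime again.
        after = os.stat(path).st_mtime
        return marked, after
```

So a zeroed time stamp only survives if it is re-applied after the last write, and an interrupt can always arrive before that happens.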




Information forwarded to bug-coreutils <at> gnu.org:
bug#10055; Package coreutils. (Wed, 16 Nov 2011 07:24:02 GMT) Full text and rfc822 format available.

Message #32 received at 10055 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "Linda A. Walsh" <law <at> tlinx.org>
Cc: 10055 <at> debbugs.gnu.org
Subject: Re: bug#10055: [sr #107875] BUG cp -u corrupts 'fs'' information
	if	interupted; can't recover on future invoctions
Date: Wed, 16 Nov 2011 08:22:05 +0100
Linda A. Walsh wrote:
> Hmmm....  Dang strange processes on bugs...  can't submit a bug directly, but
> can do it just by emailing it to the email list?   ...  (bureaucracy!)
>
> Linda Walsh wrote:
>> This should be filed under bugs, not under support, but it seems that users of
>> the core utils are not allowed to find bugs...convenient.

Thanks for the report.

Please do not use savannah's bug or support interfaces for coreutils.
We deliberately disabled the former.
Now, when you send a message to the bug-coreutils mailing list,
it creates a ticket for you.  Yours is here:

    http://bugs.gnu.org/10055

Simply replying to any mail about it adds entries to its log.




Information forwarded to bug-coreutils <at> gnu.org:
bug#10055; Package coreutils. (Wed, 16 Nov 2011 14:06:01 GMT) Full text and rfc822 format available.

Message #35 received at 10055 <at> debbugs.gnu.org (full text, mbox):

From: "Linda A. Walsh" <law <at> tlinx.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 10055 <at> debbugs.gnu.org
Subject: Re: bug#10055: [sr #107875] BUG cp -u corrupts 'fs'' information
	if	interupted; can't recover on future invoctions
Date: Wed, 16 Nov 2011 06:04:14 -0800
[Message part 1 (text/html, inline)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#10055; Package coreutils. (Wed, 16 Nov 2011 14:17:02 GMT) Full text and rfc822 format available.

Message #38 received at 10055 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "Linda A. Walsh" <law <at> tlinx.org>
Cc: 10055 <at> debbugs.gnu.org
Subject: Re: bug#10055: [sr #107875] BUG cp -u corrupts 'fs'' information
	if	interupted; can't recover on future invoctions
Date: Wed, 16 Nov 2011 15:15:17 +0100
Linda A. Walsh wrote:
...
>    But that's not the bug db interface...that's just a log...where's the bug
>    db interface for the bug in the bug database?
>
> References
>
>    1. http://bugs.gnu.org/10055

Here's a description of the interface:

  http://debbugs.gnu.org/




Severity set to 'wishlist' from 'normal' Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Mon, 15 Oct 2018 14:49:01 GMT) Full text and rfc822 format available.

Changed bug title to 'cp: cp -u corrupts 'fs'' information if interupted; can't' from '[sr #107875] BUG cp -u corrupts 'fs'' information if interupted; can't recover on future invoctions' Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Mon, 15 Oct 2018 14:49:01 GMT) Full text and rfc822 format available.

This bug report was last modified 5 years and 187 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.