GNU bug report logs - #22768
Crash safety


Package: gzip

Reported by: Yanyan Jiang <jiangyy <at> outlook.com>

Date: Mon, 22 Feb 2016 16:02:02 UTC

Severity: normal

Merged with 22770

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 22768 in the body.
You can then email your comments to 22768 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gzip <at> gnu.org:
bug#22768; Package gzip. (Mon, 22 Feb 2016 16:02:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Yanyan Jiang <jiangyy <at> outlook.com>:
New bug report received and forwarded. Copy sent to bug-gzip <at> gnu.org. (Mon, 22 Feb 2016 16:02:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Yanyan Jiang <jiangyy <at> outlook.com>
To: bug-gzip <at> gnu.org
Subject: Crash safety
Date: Mon, 22 Feb 2016 03:57:47 -0500
[Message part 1 (text/plain, inline)]
Hi gzip developers,

Gzip version: 1.6

I am developing a tool to validate the crash safety of application software. I have just found that file deletion has a potential safety vulnerability: if only a prefix of the I/O operations is flushed to disk, then after a reboot the file system would contain only a 0-byte file (the data has not reached the disk yet).

A paper FYI: http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/17780-osdi14-paper-pillai.pdf (Table 1 on Page 440). Data appends can be (virtually) reordered with any operation under the default ext3 and ext4 settings. I recommend using fsync() to persist the .gz file before deletion.

— strace log —

open("a", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_LARGEFILE|O_NOFOLLOW) = 3
fstat64(3, {st_mode=S_IFREG|0664, st_size=19730, ...}) = 0
rt_sigprocmask(SIG_BLOCK, [HUP INT PIPE TERM XCPU XFSZ], [], 8) = 0
open("a.gz", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = 4
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
read(3, "10017649652034232324895361757801"..., 65536) = 19730
read(3, "", 45806)                      = 0
write(4, "\37\213\10\10\24\312\312V\0\3a\0-\334m\226\234\274\22\3\340\377Y\r    \30l\354\375o,z\324"..., 9954) = 9954
close(3)                                = 0
utimensat(4, NULL, {{1456130580, 76955623}, {1456130580, 128955620}}, 0) = 0
fchown32(4, 1000, 1000)                 = 0
fchmod(4, 0664)                         = 0
close(4)                                = 0
rt_sigprocmask(SIG_BLOCK, [HUP INT PIPE TERM XCPU XFSZ], [], 8) = 0
unlink("a")                             = 0
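
The trace above shows the output being written and closed, and the input unlinked, with no fsync in between. The ordering the report recommends can be sketched as follows. This is only an illustration in Python, not gzip's code: the function name compress_then_remove is hypothetical, and it assumes a POSIX-like system where os.fsync is honored by the file system.

```python
import gzip
import os
import shutil


def compress_then_remove(path):
    """Compress `path` to `path + '.gz'`, fsync the output, then remove `path`."""
    out_path = path + ".gz"
    # O_EXCL mirrors the trace: refuse to overwrite an existing output file.
    fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as raw:
        with open(path, "rb") as src, gzip.GzipFile(fileobj=raw, mode="wb") as gz:
            shutil.copyfileobj(src, gz)
        # The GzipFile is closed at this point, so the gzip trailer is in `raw`.
        raw.flush()             # move Python's userspace buffer into the kernel
        os.fsync(raw.fileno())  # ask the kernel to write the data to storage
    os.unlink(path)             # the input is removed only after the fsync
    return out_path
```

The only difference from the traced sequence is the flush and fsync before the output is closed; everything else keeps the same order.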

Thank you for your attention!

Regards,
Yanyan Jiang 蒋炎岩
Institute of Computer Software,
Dept. of Computer Science, Nanjing University


Merged 22768 22770. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Tue, 23 Feb 2016 07:17:01 GMT) Full text and rfc822 format available.

Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Tue, 23 Feb 2016 07:29:02 GMT) Full text and rfc822 format available.

Notification sent to Yanyan Jiang <jiangyy <at> outlook.com>:
bug acknowledged by developer. (Tue, 23 Feb 2016 07:29:02 GMT) Full text and rfc822 format available.

Message #12 received at 22768-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Yanyan Jiang <jiangyy <at> outlook.com>, 22768-done <at> debbugs.gnu.org
Subject: Re: bug#22768: Crash safety
Date: Mon, 22 Feb 2016 23:28:07 -0800
[Message part 1 (text/plain, inline)]
Thanks for reporting the problem. It's annoying that gzip must invoke fsync, as 
that's way overkill compared to the write-ordering that is needed and fsync will 
slow gzip down, but I don't see any safe and reasonably portable alternative so 
I installed the attached patch on Savannah, here:

http://git.savannah.gnu.org/cgit/gzip.git/commit/?id=22aac8f8a616a72dbbe0e4119db8ddda0f076c04
[0001-fsync-output-file-before-closing.txt (text/plain, attachment)]

Information forwarded to bug-gzip <at> gnu.org:
bug#22768; Package gzip. (Fri, 26 Feb 2016 11:30:02 GMT) Full text and rfc822 format available.

Message #20 received at 22768 <at> debbugs.gnu.org (full text, mbox):

From: Antonio Diaz Diaz <antonio <at> gnu.org>
To: 22768 <at> debbugs.gnu.org, Yanyan Jiang <jiangyy <at> outlook.com>
Subject: Re: bug#22768: Crash safety
Date: Fri, 26 Feb 2016 12:34:32 +0100
Paul Eggert wrote:
> Thanks for reporting the problem. It's annoying that gzip must invoke 
> fsync, as that's way overkill compared to the write-ordering that is 
> needed and fsync will slow gzip down, but I don't see any safe and 
> reasonably portable alternative

I 100% agree.

I am considering a different approach for lzip; adding a new option, say 
'-y, --fsync', to call fsync when acting in-place. This provides safety 
to those needing it without slowing down the work of everybody else. 
Especially of those (de)compressing a lot of replaceable files in-place.

(Ddrescue provides the option '-y, --synchronous' because calling fsync 
by default is too slow).

What do you think?




Information forwarded to bug-gzip <at> gnu.org:
bug#22768; Package gzip. (Fri, 26 Feb 2016 11:54:02 GMT) Full text and rfc822 format available.

Message #23 received at 22768 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Antonio Diaz Diaz <antonio <at> gnu.org>, 22768 <at> debbugs.gnu.org,
 Yanyan Jiang <jiangyy <at> outlook.com>
Subject: Re: bug#22768: Crash safety
Date: Fri, 26 Feb 2016 03:53:04 -0800
Antonio Diaz Diaz wrote:
> I am considering a different approach for lzip; adding a new option, say '-y,
> --fsync', to call fsync when acting in-place. This provides safety to those
> needing it without slowing down the work of everybody else. Especially of those
> (de)compressing a lot of replaceable files in-place.

Yes, I considered an --fsync option as well, but worried about making the 
default unsafe. How about if gzip and lzip instead add a --no-fsync option for 
people who don't need the safety?




Information forwarded to bug-gzip <at> gnu.org:
bug#22768; Package gzip. (Fri, 26 Feb 2016 12:30:02 GMT) Full text and rfc822 format available.

Message #26 received at 22768 <at> debbugs.gnu.org (full text, mbox):

From: Antonio Diaz Diaz <antonio <at> gnu.org>
To: 22768 <at> debbugs.gnu.org
Cc: Yanyan Jiang <jiangyy <at> outlook.com>
Subject: Re: bug#22768: Crash safety
Date: Fri, 26 Feb 2016 13:34:59 +0100
Paul Eggert wrote:
> Yes, I considered an --fsync option as well, but worried about making 
> the default unsafe. How about if gzip and lzip instead add a --no-fsync 
> option for people who don't need the safety?

I think it is a good idea. The people who know that they don't need the 
safety are probably the most inclined to read the documentation.

Is it ok to use '-y' as short option name?





Information forwarded to bug-gzip <at> gnu.org:
bug#22768; Package gzip. (Fri, 26 Feb 2016 21:29:01 GMT) Full text and rfc822 format available.

Message #29 received at 22768 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Antonio Diaz Diaz <antonio <at> gnu.org>
Cc: Yanyan Jiang <jiangyy <at> outlook.com>, 22768 <at> debbugs.gnu.org
Subject: Re: bug#22768: Crash safety
Date: Fri, 26 Feb 2016 13:28:17 -0800
On 02/26/2016 04:34 AM, Antonio Diaz Diaz wrote:
> Is it ok to use '-y' as short option name? 

Why -y?  And why a short name at all?  I don't expect this to be 
something that people will want to type by hand.

Come to think of it, gzip should also do an fsync, or at least an 
fdatasync, on the destination's directory before removing the source. 
And this suggests that any long option name shouldn't be something 
syscall-specific like '--no-fsync', but should instead be something more 
general and easy to remember, e.g., '--hasty'.
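
Synchronizing the destination's directory before removing the source could look roughly like the sketch below. It is an illustration only, assuming a POSIX-like system where a directory can be opened read-only and passed to fsync; the helper names fsync_directory and remove_source_safely are hypothetical and do not come from gzip.

```python
import os


def fsync_directory(dir_path):
    """Ask the OS to persist the directory's entries (e.g. a newly created name)."""
    fd = os.open(dir_path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def remove_source_safely(src_path, dst_path):
    """Remove `src_path` only after the directory holding `dst_path` is synced."""
    fsync_directory(os.path.dirname(os.path.abspath(dst_path)))
    os.unlink(src_path)
```

This covers only the directory entry; the output file's own data would still need its own fsync or fdatasync beforehand.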




Information forwarded to bug-gzip <at> gnu.org:
bug#22768; Package gzip. (Sat, 27 Feb 2016 00:13:01 GMT) Full text and rfc822 format available.

Message #32 received at 22768 <at> debbugs.gnu.org (full text, mbox):

From: Antonio Diaz Diaz <antonio <at> gnu.org>
To: 22768 <at> debbugs.gnu.org
Cc: Yanyan Jiang <jiangyy <at> outlook.com>
Subject: Re: bug#22768: Crash safety
Date: Sat, 27 Feb 2016 01:17:25 +0100
Paul Eggert wrote:
> Why -y?  And why a short name at all?  I don't expect this to be 
> something that people will want to type by hand.

Because ddrescue already provides the option '-y, --synchronous' for a 
somewhat similar functionality. (It is in my first message). I find 
short option names handy, but I have no problem implementing this as 
long-only.


> Come to think of it, gzip should also do an fsync, or at least an 
> fdatasync, on the destination's directory before removing the source. 

Doing it right in all circumstances may be impossible. Also fsync may be 
a no-op in some systems and very expensive in others. And some systems 
are safe without the need of fsync. This is why my first idea was to 
leave the safety to the system by default and add an option to enable 
the "maybe safer but slower" behavior.


> And this suggests that any long option name shouldn't be something 
> syscall-specific like '--no-fsync', but should instead be something more 
> general and easy to remember, e.g., '--hasty'.

I had to search 'hasty' in the dictionary, so I think it is perhaps not 
so good and easy to remember for non-English speakers. OTOH, many people 
using a CLI know what 'fsync' or 'sync' mean.

Just now my preference is to make the behavior optional and call the 
option --fsync. I think both points meet the principle of least surprise.


Best regards,
Antonio.




Information forwarded to bug-gzip <at> gnu.org:
bug#22768; Package gzip. (Sat, 27 Feb 2016 00:46:01 GMT) Full text and rfc822 format available.

Message #35 received at 22768 <at> debbugs.gnu.org (full text, mbox):

From: Antonio Diaz Diaz <antonio <at> gnu.org>
To: 22768 <at> debbugs.gnu.org
Cc: Yanyan Jiang <jiangyy <at> outlook.com>
Subject: Re: bug#22768: Crash safety
Date: Sat, 27 Feb 2016 01:50:54 +0100
Antonio Diaz Diaz wrote:
> Just now my preference is to make the behavior optional and call the 
> option --fsync. I think both points meet the principle of least surprise.

An additional reason to make the behavior optional is that people find 
the performance penalty of fsync so annoying that they even write 
libraries to disable it[1].

"This package contains a small LD_PRELOAD library (libeatmydata) and a 
couple of helper utilities designed to transparently disable fsync and 
friends (like open(O_SYNC)). This has two side-effects: making software 
that writes data safely to disk a lot quicker and making this software 
no longer crash safe."

[1] http://packages.debian.org/testing/utils/eatmydata


Best regards,
Antonio.





Information forwarded to bug-gzip <at> gnu.org:
bug#22768; Package gzip. (Sat, 27 Feb 2016 00:48:01 GMT) Full text and rfc822 format available.

Message #38 received at 22768 <at> debbugs.gnu.org (full text, mbox):

From: Bob Proulx <bob <at> proulx.com>
To: 22768 <at> debbugs.gnu.org
Cc: Yanyan Jiang <jiangyy <at> outlook.com>, Antonio Diaz Diaz <antonio <at> gnu.org>,
 Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#22768: Crash safety
Date: Fri, 26 Feb 2016 17:46:58 -0700
Antonio Diaz Diaz wrote:
> Paul Eggert wrote:
> >And this suggests that any long option name shouldn't be something
> >syscall-specific like '--no-fsync', but should instead be something more
> >general and easy to remember, e.g., '--hasty'.
> 
> I had to search 'hasty' in the dictionary, so I think it is perhaps not so
> good and easy to remember for non-English speakers. OTOH, many people using
> a CLI know what 'fsync' or 'sync' mean.

Worse is that one program chooses "hasty".  Another chooses "quick".
Another "hurried", "fast", "rapid", "swift".  It becomes impossible to
remember what each program uses.  Keeping to what it does seems least
surprising and in this case it is either --fsync or --sync.

> Just now my preference is to make the behavior optional and call the option
> --fsync. I think both points meet the principle of least surprise.

I would much prefer the above: an option to enable it rather than
one to disable it.  Otherwise I have to go through workarounds to
avoid it in order to have the performance that used to be the default.

It has been many decades with the cached behavior and apparently
without significant issues due to it.  Large changes should be made
slowly as an option rather than abruptly as the default.

Bob




Information forwarded to bug-gzip <at> gnu.org:
bug#22768; Package gzip. (Sat, 27 Feb 2016 15:42:02 GMT) Full text and rfc822 format available.

Message #41 received at 22768 <at> debbugs.gnu.org (full text, mbox):

From: Antonio Diaz Diaz <antonio <at> gnu.org>
Cc: Yanyan Jiang <jiangyy <at> outlook.com>, 22768 <at> debbugs.gnu.org
Subject: Re: bug#22768: Crash safety
Date: Sat, 27 Feb 2016 16:47:06 +0100
Bob Proulx wrote:
>> Just now my preference is to make the behavior optional and call the option
>> --fsync. I think both points meet the principle of least surprise.
> 
> I would much prefer the above: an option to enable it rather than
> one to disable it.  Otherwise I have to go through workarounds to
> avoid it in order to have the performance that used to be the default.

After thinking about it, I think that the right thing to do is to not 
implement any kind of fsync functionality in gzip/lzip.

First, it may be a cause of feature creep. If gzip fsyncs the output 
file it might also test it, or even compare it with the input file, 
before deleting the input file.

Second, as doing it right in all circumstances may be impossible, it may 
become an endless source of bug reports. (fsyncing also the 
destination's directory, opening the output with O_DIRECT,...).

Third, it fights against other layers of the system, like the 
filesystem, instead of collaborating with them.

Fourth, it fights against user's wishes instead of obeying them. If the 
user chooses a fast-but-unsafe configuration for the filesystem, gzip 
should not try to circumvent the user's choice, because gzip does not 
know if the file being compressed is worth the trouble or not.

I think that the best way of guarding an important file against all bugs 
and crashes is an extended version of the procedure already documented in 
the manual of lzip:

1) gzip --keep file	# don't delete input
2) sync			# commit output and directory to disk
3) zcmp file file.gz	# verify output
4) rm file		# then remove input
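
The four steps above can be sketched as one function. This is a rough Python illustration rather than anything from gzip or lzip: the name keep_sync_verify_remove is hypothetical, it reads both files fully into memory for the comparison, and os.sync only requests a system-wide flush, so how much it guarantees depends on the system.

```python
import gzip
import os
import shutil


def keep_sync_verify_remove(path):
    """Compress, sync, verify, and only then remove the input."""
    out_path = path + ".gz"
    # 1) compress but keep the input (like `gzip --keep file`)
    with open(path, "rb") as src, gzip.open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # 2) request a system-wide flush (like `sync`)
    os.sync()
    # 3) verify that the decompressed output equals the input (like `zcmp`)
    with open(path, "rb") as src, gzip.open(out_path, "rb") as chk:
        if src.read() != chk.read():
            raise RuntimeError("verification failed; input kept")
    # 4) remove the input
    os.remove(path)
    return out_path
```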


Best regards,
Antonio.




Information forwarded to bug-gzip <at> gnu.org:
bug#22768; Package gzip. (Sun, 28 Feb 2016 08:28:02 GMT) Full text and rfc822 format available.

Message #44 received at 22768 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Antonio Diaz Diaz <antonio <at> gnu.org>, 22768 <at> debbugs.gnu.org
Cc: Yanyan Jiang <jiangyy <at> outlook.com>
Subject: Re: bug#22768: Crash safety
Date: Sun, 28 Feb 2016 00:26:58 -0800
[Message part 1 (text/plain, inline)]
Antonio Diaz Diaz wrote:

> ddrescue already provides the option '-y, --synchronous' for a somewhat
> similar functionality.

OK, let's do it as --synchronous, long-only.  If the need keeps growing we can 
add -y.

> Just now my preference is to make the behavior optional

On second thought, as Bob Proulx suggested, this is a better approach.  I tried 
a synchronous gzip on a contrived example (compressing 1000 empty files on 
an ext4 file system on an actual hard drive with options relatime, seclabel, 
data=ordered) and synchronizing made gzip 700x slower.  Most people will prefer 
the old behavior, where gzip is faster and is unsafe mostly just in theory.
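
A loosely similar measurement can be made without gzip at all. The sketch below is not the experiment described above: it only times creating empty files with and without an fsync on each one, the function name create_empty_files is hypothetical, and the resulting ratio will vary widely with the file system and storage device.

```python
import os
import time


def create_empty_files(directory, count, synchronous):
    """Create `count` empty files, optionally fsyncing each; return elapsed seconds."""
    start = time.perf_counter()
    for i in range(count):
        name = os.path.join(directory, f"f{i}")
        fd = os.open(name, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        try:
            if synchronous:
                os.fsync(fd)  # force this file to storage before continuing
        finally:
            os.close(fd)
    return time.perf_counter() - start
```

Calling it once with synchronous=False and once with synchronous=True, each on its own empty directory, gives two timings to compare.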

I'm attaching the patches I installed recently in this area, to help fix this 
problem.  I'll follow up on your other recent email in another message soon.
[0001-gzip-fdatasync-output-dir-before-unlinking.patch (text/x-diff, attachment)]
[0002-gzip-use-constants-not-fileno.patch (text/x-diff, attachment)]
[0003-gzip-new-option-synchronous.patch (text/x-diff, attachment)]
[0004-misc-update-version-copyright.patch (text/x-diff, attachment)]

Information forwarded to bug-gzip <at> gnu.org:
bug#22768; Package gzip. (Sun, 28 Feb 2016 08:32:02 GMT) Full text and rfc822 format available.

Message #47 received at 22768 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Antonio Diaz Diaz <antonio <at> gnu.org>
Cc: Yanyan Jiang <jiangyy <at> outlook.com>, 22768 <at> debbugs.gnu.org
Subject: Re: bug#22768: Crash safety
Date: Sun, 28 Feb 2016 00:30:59 -0800
Antonio Diaz Diaz wrote:

> it may be a cause of feature creep. If gzip fsyncs the output file it
> might also test it, or even compare it with the input file, before deleting the
> input file.

Feature creep is something we should avoid. Here, though, it's a real pain to 
synchronize correctly and many people will get it wrong. (See my commentary at 
the end of this email for one example of getting it wrong.) By comparison, 
comparing the decompressed output with the input file is something that most 
people will probably get right, so it's less useful to add a gzip option for that.

> Second, as doing it right in all circumstances may be impossible

Sure, as some file systems do not support fsync. Still, gzip should do what it can.

> it may become an endless source of bug reports.

I doubt it.  gzip has run unsafely for decades, and this is the first bug report 
about it -- one discovered by code inspection, not by actual failure.

> (fsyncing also the destination's directory,

Yes, that needs fixing.  Done in the patches I just now emailed to you.

> opening the output with O_DIRECT,...).

I doubt whether that feature will be needed or useful for gzip.

> Third, it fights against other layers of the system, like the filesystem,
> instead of collaborating with them.

True, fsync is a bad design. But that is no excuse for gzip losing data.

> Fourth, it fights against user's wishes instead of obeying them.

This should not be a problem if --synchronous is a new option, defaulting to the 
old (unsynchronized) behavior.
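
An opt-in, long-only option of this kind can be sketched with Python's argparse. This is a hypothetical command-line parser for illustration only (the program name compress-sketch and the function build_parser are made up); it is not how gzip parses its options.

```python
import argparse


def build_parser():
    """Build a parser whose default is the old, unsynchronized behavior."""
    parser = argparse.ArgumentParser(prog="compress-sketch")
    # Long-only flag: off unless the user explicitly asks for it.
    parser.add_argument(
        "--synchronous",
        action="store_true",
        help="flush output to storage before removing input (slower)",
    )
    parser.add_argument("files", nargs="*")
    return parser
```

With this layout, existing invocations behave as before, and only callers that pass --synchronous pay the cost of the extra flushes.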

> I think that the best way of guarding an important file against all bugs and
> crashes is an extended version of the procedure already documented in the manual
> of lzip:
>
> 1) gzip --keep file    # don't delete input
> 2) sync            # commit output and directory to disk
> 3) zcmp file file.gz    # verify output
> 4) rm file        # then remove input

That approach does not suffice, because 'sync' does not guarantee that the 
output data has been synchronized to disk. See:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/sync.html

With GNU 'sync' there is a workaround, but it is not portable to non-GNU 
systems; besides, the workaround is not obvious.




Information forwarded to bug-gzip <at> gnu.org:
bug#22768; Package gzip. (Mon, 29 Feb 2016 17:08:01 GMT) Full text and rfc822 format available.

Message #50 received at 22768 <at> debbugs.gnu.org (full text, mbox):

From: Antonio Diaz Diaz <antonio <at> gnu.org>
To: 22768 <at> debbugs.gnu.org
Cc: Yanyan Jiang <jiangyy <at> outlook.com>
Subject: Re: bug#22768: Crash safety
Date: Mon, 29 Feb 2016 18:12:51 +0100
Paul Eggert wrote:
> Feature creep is something we should avoid. Here, though, it's a real 
> pain to synchronize correctly and many people will get it wrong.

The problem is that it is impossible to get it right unless one does 
something as extreme as unmounting-then-remounting the filesystem, or 
even making a backup copy on a removable device and then verifying the 
copy on a different computer.


> Sure, as some file systems do not support fsync. Still, gzip should do 
> what it can.

I am not so sure. There seems to be an arms race between tools that do 
what they can and ways of preventing those tools from doing it. Just 
search for "disable fsync".

Even if invoked optionally, all this complication to perhaps achieve 
nothing but a false sense of safety goes against my KISS philosophy.

Imagine if some backup tool begins calling 'gzip --synchronous', and 
users are forced to install libeatmydata to disable it.


>> it may become an endless source of bug reports.
> 
> I doubt it.  gzip has run unsafely for decades, and this is the first 
> bug report about it -- one discovered by code inspection, not by actual 
> failure.

Publish or perish may be the cause of such an endless source of bug 
reports. The next one might be titled "On why 'gzip --synchronous' does 
not work on some filesystems".


>> (fsyncing also the destination's directory,
> 
> Yes, that needs fixing.  Done in the patches I just now emailed to you.

Thanks. But I was not asking for a fix. Just pointing out what others 
might ask.


> True, fsync is a bad design. But that is no excuse for gzip losing data.

As I see it, it is not gzip that is losing the data, but the filesystem 
that does not respect the write order, or even the user who chose such 
a filesystem (perhaps for a good reason).


>> 1) gzip --keep file     # don't delete input
>> 2) sync                 # commit output and directory to disk
>> 3) zcmp file file.gz    # verify output
>> 4) rm file              # then remove input
> 
> That approach does not suffice, because 'sync' does not guarantee that 
> the output data has been synchronized to disk.

I know, but how can you guarantee that 'gzip --synchronous' will work on 
a system where the 'sync' above does not even guarantee that 'file.gz' 
is written to disk before 'file' is deleted?

I still think that the right thing to do is to not implement any kind of 
fsync functionality in gzip/lzip, and achieve permanence (when it is 
needed) by some other means. As you said, gzip has run unsafely for 
decades without a failure.


Best regards,
Antonio.




Information forwarded to bug-gzip <at> gnu.org:
bug#22768; Package gzip. (Mon, 29 Feb 2016 19:44:02 GMT) Full text and rfc822 format available.

Message #53 received at 22768 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Antonio Diaz Diaz <antonio <at> gnu.org>, 22768 <at> debbugs.gnu.org
Cc: Yanyan Jiang <jiangyy <at> outlook.com>
Subject: Re: bug#22768: Crash safety
Date: Mon, 29 Feb 2016 11:43:41 -0800
On 02/29/2016 09:12 AM, Antonio Diaz Diaz wrote:
>>> 1) gzip --keep file     # don't delete input
>>> 2) sync                 # commit output and directory to disk
>>> 3) zcmp file file.gz    # verify output
>>> 4) rm file              # then remove input
>>
>> That approach does not suffice, because 'sync' does not guarantee 
>> that the output data has been synchronized to disk.
>
> I know, but how can you guarantee that 'gzip --synchronous' will work 
> on a system where the 'sync' above does not even guarantee that 
> 'file.gz' is written to disk before 'file' is deleted?

Yes, I can guarantee that 'gzip --synchronous' will not lose data on any 
system conforming to POSIX with the Synchronized Input and Output 
option.  No such guarantee can be made for the above shell script, 
because the 'sync' command does not make the same guarantees that the 
'fsync' function does.  Putting the above shell script into the 
documentation would give users a false sense of security. (Or maybe we 
should put the above shell script into the documentation as an example 
of what *not* to do. :-)

> The next one might be titled "On why 'gzip --synchronous' does not 
> work on some filesystems". 

:-)  Of course the problem can still exist on file systems that do not 
conform to POSIX, and there are many of those. Still, there are people 
who take these things seriously, and who use file systems that are safe 
in the presence of crashes, and for these people gzip --synchronous 
should work.

> As you said, gzip has run unsafely for decades without a failure. 

I did not say that!  And I am skeptical that it's true.  I think it's 
quite possible that gzip has lost data when an operating system crashed 
at the wrong moment.




Information forwarded to bug-gzip <at> gnu.org:
bug#22768; Package gzip. (Tue, 01 Mar 2016 18:08:01 GMT) Full text and rfc822 format available.

Message #56 received at 22768 <at> debbugs.gnu.org (full text, mbox):

From: Antonio Diaz Diaz <antonio <at> gnu.org>
To: 22768 <at> debbugs.gnu.org
Cc: Yanyan Jiang <jiangyy <at> outlook.com>
Subject: Re: bug#22768: Crash safety
Date: Tue, 01 Mar 2016 19:13:15 +0100
Paul Eggert wrote:
>> I know, but how can you guarantee that 'gzip --synchronous' will work 
>> on a system where the 'sync' above does not even guarantee that 
>> 'file.gz' is written to disk before 'file' is deleted?
> 
> Yes, I can guarantee that 'gzip --synchronous' will not lose data on any 
> system conforming to POSIX with the Synchronized Input and Output 
> option.

Unless someone has somehow disabled fsync, I guess. :-)

I am not questioning you. You know very well what you do. It is simply 
that I find the situation so chaotic that I think maybe better methods 
to ensure data permanence have yet to be developed.


> Still, there are people who take these things seriously, and who use
> file systems that are safe in the presence of crashes, and for these
> people gzip --synchronous should work.

What I ask myself is, are those people better served by adding 
--synchronous options to every tool, or by using a crash-tolerant file 
system[1]?

"At the ACM Symposium on Operating Systems Principles in October, MIT 
researchers will present the first file system that is mathematically 
guaranteed not to lose track of data during crashes."

[1] http://news.mit.edu/2015/crash-tolerant-data-storage-0824


>> As you said, gzip has run unsafely for decades without a failure. 
> 
> I did not say that!

Sorry, I meant "without a reported failure".

My point is that if gzip has run unsafely for decades without a reported 
failure, maybe all those who take these things seriously are already 
using file systems safe enough to guarantee that well behaved tools like 
gzip do not lose data.


Best regards,
Antonio.

(No need to CC me. I am subscribed to bug-gzip).




Information forwarded to bug-gzip <at> gnu.org:
bug#22768; Package gzip. (Tue, 01 Mar 2016 20:58:02 GMT) Full text and rfc822 format available.

Message #59 received at 22768 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Antonio Diaz Diaz <antonio <at> gnu.org>, 22768 <at> debbugs.gnu.org
Cc: Yanyan Jiang <jiangyy <at> outlook.com>
Subject: Re: bug#22768: Crash safety
Date: Tue, 1 Mar 2016 12:57:05 -0800
On 03/01/2016 10:13 AM, Antonio Diaz Diaz wrote:
> [1] http://news.mit.edu/2015/crash-tolerant-data-storage-0824

FSCQ is not even close to ready for prime-time, I'm afraid. Its 
prototype is slow compared to conventional file systems (it assumes a 
single-threaded kernel, it issues many more writes than ext4 does to 
implement a commit, it has been publicly tested only on flash drives, 
etc.). The FSCQ authors would like to add support for fsync/fdatasync to 
get some of that performance back, which seems reasonable -- but at that 
point, applications like gzip would still need to call fsync/fdatasync 
to avoid losing data.

You may well be right that eventually file system designers will figure 
this stuff out so that well-written POSIX applications will not lose 
data even if they don't use fsync/fdatasync. However, if FSCQ is any 
indication, we're many years away from that. In the meantime 
fsync/fdatasync is all we have.

> My point is that if gzip has run unsafely for decades without a 
> reported failure, maybe all those who take these things seriously are 
> already using file systems safe enough to guarantee that well behaved 
> tools like gzip do not lose data.

That will be true for many users. Still, I imagine that non-experts 
would have a good deal of trouble connecting the dots between lost data 
and any gzip invocation that lost the data, and could chalk it up to a 
system crash losing data for other reasons.  (After all, things are 
somewhat chaotic during a crash...)  One can find examples on the net 
like "How to recover lost/deleted Gzip compressed gz file" (this is for 
BYclouder, a commercial tool) that talk about system crashes, and which 
indicate (though do not prove) that a real problem exists with gzip.

My sources:

Chen H, Ziegler D, Chajed T, Chlipala A, Kaashoek MF, Zeldovich N. Using 
Crash Hoare logic for certifying the FSCQ file system. SOSP 2015. 
https://people.csail.mit.edu/nickolai/papers/chen-fscq.pdf

How to recover lost/deleted Gzip compressed gz file. BYclouder. 
2013-04-23. 
http://www.byclouder.com/help/recovery/file/archive/how-to-recover-gzip-gz.html 






Information forwarded to bug-gzip <at> gnu.org:
bug#22768; Package gzip. (Thu, 03 Mar 2016 13:03:01 GMT) Full text and rfc822 format available.

Message #62 received at 22768 <at> debbugs.gnu.org (full text, mbox):

From: Antonio Diaz Diaz <antonio <at> gnu.org>
To: 22768 <at> debbugs.gnu.org
Cc: Yanyan Jiang <jiangyy <at> outlook.com>
Subject: Re: bug#22768: Crash safety
Date: Thu, 03 Mar 2016 14:08:09 +0100
Paul Eggert wrote:
> You may well be right that eventually file system designers will figure 
> this stuff out so that well-written POSIX applications will not lose 
> data even if they don't use fsync/fdatasync. However, if FSCQ is any 
> indication, we're many years away from that. In the meantime 
> fsync/fdatasync is all we have.

Thanks for the explanation. I have already started the last round of 
release candidates of the lzip family, but just after releasing 
lzip-1.18, in a month or so, I'll implement --synchronous in a way 
compatible with what you implement in gzip.


Best regards,
Antonio.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 01 Apr 2016 11:24:03 GMT) Full text and rfc822 format available.

bug unarchived. Request was from John Wiersba <jrw32982 <at> yahoo.com> to control <at> debbugs.gnu.org. (Mon, 09 May 2016 22:37:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 07 Jun 2016 11:24:04 GMT) Full text and rfc822 format available.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.