GNU bug report logs - #17546: Problem with du
Reported by: worley <at> alum.mit.edu (Dale R. Worley)
Date: Thu, 22 May 2014 00:18:02 UTC
Severity: normal
Tags: notabug
Merged with 21926
Done: Assaf Gordon <assafgordon <at> gmail.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 17546 in the body.
You can then email your comments to 17546 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-coreutils <at> gnu.org: bug#17546; Package coreutils. (Thu, 22 May 2014 00:18:02 GMT)
Acknowledgement sent to worley <at> alum.mit.edu (Dale R. Worley): New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Thu, 22 May 2014 00:18:03 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
This message is a sequel to the message copied below.
There is a somewhat subtle problem with how du "never counts the same
file twice". This behavior is what one would expect by default when
du is processing the directory tree which is a single argument, but
it's *not* the behavior that I expect when processing two arguments.
In particular, if one file is part of two argument directory trees, du
will count its space as space within the first argument, but not
within any later argument that contains it.
This leads to startlingly odd behaviors: the order of the
arguments to du changes the resulting output, and the output of du
when applied to two arguments is not the concatenation of the outputs
of two executions of du, one with each argument.
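The order dependence described above can be reproduced on a scratch directory tree. The following is a minimal sketch and is not taken from the report: the names d, e and k, the 1 MiB file size, and the nth_size helper are all assumptions made for illustration, and it presumes a du that remembers already-counted files across arguments, as GNU du does.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical layout: d/k and e/k are two names (hard links) for one 1 MiB file.
mkdir d e
head -c 1048576 /dev/urandom > d/k   # random data, so blocks are really allocated
ln d/k e/k

# Size in KiB that du reports for the Nth operand of a single invocation.
# usage: nth_size N operand...
nth_size() {
    n=$1; shift
    du -sk "$@" | awk -v n="$n" 'NR == n { print $1 }'
}

e_second=$(nth_size 2 d e)   # e listed after d: the shared file was charged to d
e_first=$(nth_size 1 e d)    # e listed first: the shared file is charged to e
e_alone=$(nth_size 1 e)      # separate invocation: nothing has been seen yet

echo "e after d: ${e_second}K   e before d: ${e_first}K   e alone: ${e_alone}K"
```

The exact numbers depend on the filesystem's block size, so only the relationship between them is meaningful: e reported after d should be roughly 1 MiB smaller than e reported first or on its own.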
It seems to me that the correct behavior (giving what people expect)
is to clear the "hard link cache" between processing each argument, so
that if argument trees overlap, the overlapped part is counted
properly in each argument. Given that before release 8.6, the order
of arguments didn't matter, it seems to me to be unlikely that many
users are specifically depending on caching between arguments.
(Please include my address in any responses.)
Thanks,
Dale
http://lists.gnu.org/archive/html/bug-coreutils/2010-11/msg00176.html
> From: Paul Eggert
> Subject: bug#7439: du failing at "du -sh . *"
> Date: Fri, 19 Nov 2010 08:52:39 -0800
>
> On 11/18/2010 05:08 PM, Mathias Linnemann-Emden wrote:
> > So not only the output of "du -sh . *" is wrong (not showing du for *),
> > but also the output of "du -sh * ." is incorrect (showing only 4K
> > instead of 8K for ".").
>
> NEWS lists this as a bug fix in release 8.6:
>
> du no longer multiply counts a file that is a directory or whose
> link count is 1, even if the file is reached multiple times by
> following symlinks or via multiple arguments.
>
> The idea is that a single invocation of du never counts the same
> file twice. This was always true for files with multiple hard
> links (you probably didn't notice that), and now it's consistent
> for all files.
>
> To get something like the old behavior, you can use "du -l",
> or invoke "du" separately for each file (depending on how you
> want hard links treated).
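The two workarounds named at the end of the quoted message are not equivalent, and a small experiment makes the difference visible. This is a sketch on an invented tree (the names t, u, a, b, c, the size_of helper and the 1 MiB size are assumptions): -l counts every hard link it meets, including two links inside the same argument, whereas separate invocations count a file once per argument.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# t/a and t/b are two links to one file inside the same tree; u/c is a third link.
mkdir t u
head -c 1048576 /dev/urandom > t/a
ln t/a t/b
ln t/a u/c

# First field (size in KiB) of the first output line of the given command.
size_of() { "$@" | awk 'NR == 1 { print $1 }'; }

t_default=$(size_of du -sk t)    # the file is counted once within t
t_links=$(size_of du -skl t)     # -l: counted once per link, so twice within t
u_after_t=$(du -sk t u | awk 'NR == 2 { print $1 }')   # already charged to t
u_alone=$(size_of du -sk u)      # separate run: charged to u as well

echo "t: ${t_default}K   t with -l: ${t_links}K"
echo "u after t: ${u_after_t}K   u alone: ${u_alone}K"
```

On a typical filesystem t with -l comes out about 1 MiB larger than t without it, while running du on u alone only restores the single copy that u actually references.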
Information forwarded to bug-coreutils <at> gnu.org: bug#17546; Package coreutils. (Thu, 22 May 2014 02:43:01 GMT)
Message #8 received at 17546 <at> debbugs.gnu.org:
Dale R. Worley wrote:
> before release 8.6, the order of arguments didn't matter
No, order mattered even back then. For example:
$ du --version | sed 1q
du (GNU coreutils) 8.4
$ ls -li d/* e/*
11765482 -rw-r--r-- 1 eggert csfac 159910666 May 1 20:42 d/j
23558745 -rw-r--r-- 2 eggert csfac 410000000 Apr 22 21:34 d/k
23558745 -rw-r--r-- 2 eggert csfac 410000000 Apr 22 21:34 e/k
$ du d e d e
557664 d
4 e
156480 d
4 e
$ du e d e d
401188 e
156480 d
4 e
156480 d
So file argument order affected link counts even back then; it's just
that before 8.6 this was true only for files with link count greater
than 1, which led to odd behaviors such as the behavior shown above.
What changed in 8.6 is that the behavior was made consistent for all
files, not just those with link count greater than 1, so that for the
same data the current version of du generates output like this:
$ du --version | sed 1q
du (GNU coreutils) 8.22
$ du d e d e
557664 d
4 e
$ du e d e d
401188 e
156480 d
This sums to the same values independent of file order, which is a plus.
As far as I can see, POSIX doesn't allow the old behavior, but does
allow the new one.
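The claim about the sums can be checked against the 8.22 output above: 557664 + 4 and 401188 + 156480 both equal 557668. A small sketch that repeats the check on a scratch tree follows. The file sizes are arbitrary assumptions, the layout only mimics d/j, d/k and e/k from the listing, and the -c option (print a grand total) is assumed to be available, as it is in GNU du.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Same shape as the listing: d/j has one link, d/k and e/k share an inode.
mkdir d e
head -c 1048576 /dev/urandom > d/j
head -c 2097152 /dev/urandom > d/k
ln d/k e/k

# -c appends a "total" line; awk keeps the number from the last line it reads.
total_de=$(du -sck d e | awk '{ t = $1 } END { print t }')
total_ed=$(du -sck e d | awk '{ t = $1 } END { print t }')

echo "du d e total: ${total_de}K"
echo "du e d total: ${total_ed}K"
```

The per-argument lines differ between the two orders, but with a du that counts each file once per invocation the two totals should be identical.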
> This leads to startlingly odd behaviors
Any choice of behavior for 'du' will lead to odd behaviors sometimes,
and there's no way we can make everybody happy in all cases. There is
an important technical advantage of du's current behavior, though; you
can get the behavior you prefer by running "du X; du Y". If we changed
du to reset itself between command-line arguments, there'd be no way to
get the behavior I prefer, which is to count files just once.
Added tag(s) notabug. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Thu, 22 May 2014 02:45:01 GMT)
Information forwarded to bug-coreutils <at> gnu.org: bug#17546; Package coreutils. (Thu, 22 May 2014 16:22:01 GMT)
Message #13 received at 17546 <at> debbugs.gnu.org:
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> As far as I can see, POSIX doesn't allow the old behavior, but does
> allow the new one.
I looked at what I think is the POSIX spec for du
(http://pubs.opengroup.org/onlinepubs/9699919799/utilities/du.html#tag_20_36),
and I don't see anything that covers the situation one way or another,
but there may be additional relevant information that I'm not aware
of.
> > This leads to startlingly odd behaviors
>
> Any choice of behavior for 'du' will lead to odd behaviors sometimes,
> and there's no way we can make everybody happy in all cases. There is
> an important technical advantage of du's current behavior, though; you
> can get the behavior you prefer by running "du X; du Y". If we changed
> du to reset itself between command-line arguments, there'd be no way to
> get the behavior I prefer, which is to count files just once.
OTOH, what I want "du *" to generate has to be done with "for F in * ;
do du $F ; done", which is pretty annoying.
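The loop quoted above splits file names that contain whitespace, because $F is unquoted. A slightly more defensive sketch follows; du_each is a hypothetical helper name rather than an existing command, and the demo tree is invented for illustration.

```shell
#!/bin/sh
set -eu

# Run one du per operand, so no operand is affected by files seen under another.
du_each() {
    for f in "$@"; do
        du -sk -- "$f"
    done
}

# Demo on a scratch tree whose first directory name contains a space.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
mkdir "a dir" b
head -c 1048576 /dev/urandom > "a dir/k"
ln "a dir/k" b/k

together=$(du -sk -- *)   # single invocation: b/k was already counted under "a dir"
separate=$(du_each *)     # one invocation each: the shared file shows up in both
printf 'single invocation:\n%s\n\nseparate invocations:\n%s\n' "$together" "$separate"
```

Defined in a shell startup file, such a function reduces the per-argument form to "du_each *", at the cost of starting one du process per operand.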
I'd like to suggest adding another option to du to establish the
behavior I want. It would be a weaker grade of -l. (I shouldn't have
any trouble writing the code for that.) Also, I think a few
additional sentences in the manual page would make du's behavior
clearer.
Dale
Information forwarded to bug-coreutils <at> gnu.org: bug#17546; Package coreutils. (Thu, 22 May 2014 16:58:01 GMT)
Message #16 received at 17546 <at> debbugs.gnu.org:
On 05/22/2014 09:21 AM, Dale R. Worley wrote:
> I don't see anything that covers the situation one way or another
I suppose you're right; the specification is terse, and I guess it can
be read in a different way.
> I'd like to suggest adding another option to du to establish the
> behavior I want.
What behavior do you want? I hope it's not the pre-8.6 behavior, which
had the properties documented in <http://bugs.gnu.org/17546#8>, where
files with link counts > 1 were counted only once while files with link
counts == 1 were counted multiple times.
Is there a version of du somewhere that behaves the way you want?
(Solaris, FreeBSD, etc.?) That might give us insight as to what would
be a good option here.
Information forwarded to bug-coreutils <at> gnu.org: bug#17546; Package coreutils. (Thu, 22 May 2014 17:07:02 GMT)
Message #19 received at 17546 <at> debbugs.gnu.org:
On 05/22/2014 10:21 AM, Dale R. Worley wrote:
>> From: Paul Eggert <eggert <at> cs.ucla.edu>
>
>> As far as I can see, POSIX doesn't allow the old behavior, but does
>> allow the new one.
>
> I looked at what I think is the POSIX spec for du
> (http://pubs.opengroup.org/onlinepubs/9699919799/utilities/du.html#tag_20_36),
> and I don't see anything that covers the situation one way or another,
> but there may be additional relevant information that I'm not aware
> of.
Yes, here's the additional relevant information:
http://austingroupbugs.net/view.php?id=527
Change line 84170 [du DESCRIPTION] from:
    Files with multiple links shall be counted and written
    for only one entry.
to:
    A file that occurs multiple times under one file
    operand and that has a link count greater than 1 shall
    be counted and written for only one entry. It is
    implementation-defined whether a file that has a link
    count no greater than 1 is counted and written just
    once, or is counted and written for each occurrence.
    It is implementation-defined whether a file that
    occurs under one file operand is counted for other
    file operands.
In FUTURE DIRECTIONS, change line 84274 from "None" to
    "A future version of this standard may require that
    a file that occurs multiple times shall be counted and
    written for only one entry, even if the occurrences
    are under different file operands."
Change line 84177 [du OPTIONS] from:
    Regardless of the presence of the -a option,
    non-directories given as file operands shall always
    be listed.
to:
    The -a option does not affect whether
    non-directories given as file operands are listed.
>
>>> This leads to startlingly odd behaviors
>>
>> Any choice of behavior for 'du' will lead to odd behaviors sometimes,
>> and there's no way we can make everybody happy in all cases. There is
>> an important technical advantage of du's current behavior, though; you
>> can get the behavior you prefer by running "du X; du Y". If we changed
>> du to reset itself between command-line arguments, there'd be no way to
>> get the behavior I prefer, which is to count files just once.
>
> OTOH, what I want "du *" to generate has to be done with "for F in * ;
> do du $F ; done", which is pretty annoying.
>
> I'd like to suggest adding another option to du to establish the
> behavior I want. It would be a weaker grade of -l. (I shouldn't have
> any trouble writing the code for that.) Also, I think a few
> additional sentences in the manual page would make du's behavior
> clearer.
>
> Dale
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Information forwarded to bug-coreutils <at> gnu.org: bug#17546; Package coreutils. (Fri, 23 May 2014 20:10:01 GMT)
Message #22 received at 17546 <at> debbugs.gnu.org:
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> What behavior do you want?
I'll write out an unambiguous description of what I'm looking for,
along with a set of edits to the manual page that should make it
clear.
Dale
Bug closed; send any further explanations to 17546 <at> debbugs.gnu.org and worley <at> alum.mit.edu (Dale R. Worley). Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Thu, 11 Oct 2018 22:16:09 GMT)
Forcibly Merged 17546 21926. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 24 Oct 2018 21:21:01 GMT)
Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 22 Nov 2018 12:24:05 GMT)
This bug report was last modified 5 years and 158 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.