GNU bug report logs - #17546
Problem with du

Package: coreutils

Reported by: worley <at> alum.mit.edu (Dale R. Worley)

Date: Thu, 22 May 2014 00:18:02 UTC

Severity: normal

Tags: notabug

Merged with 21926

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 17546 in the body.
You can then email your comments to 17546 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-coreutils <at> gnu.org:
bug#17546; Package coreutils. (Thu, 22 May 2014 00:18:02 GMT)

Acknowledgement sent to worley <at> alum.mit.edu (Dale R. Worley):
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Thu, 22 May 2014 00:18:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: worley <at> alum.mit.edu (Dale R. Worley)
To: bug-coreutils <at> gnu.org
Subject: Problem with du
Date: Wed, 21 May 2014 20:16:33 -0400
This message is a sequel to the message copied below.

There is a somewhat subtle problem with how du "never counts the same
file twice".  This behavior is what one would expect by default when
du is processing the directory tree which is a single argument, but
it's *not* the behavior that I expect when processing two arguments.
In particular, if one file is part of two argument directory trees, du
will count its space as space within the first argument, but not
within any later argument that contains it.

This leads to startlingly odd behaviors: the order of the
arguments to du changes the resulting output, and the output of du
when applied to two arguments is not the concatenation of the outputs
of two executions of du, one with each argument.
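
A minimal sketch of how the order dependence can be reproduced, assuming
GNU du 8.6 or later and a scratch directory.  The names d, e and k are
illustrative, and the file size is arbitrary:

```shell
#!/bin/sh
# Scratch layout (hypothetical names): d/k and e/k are two hard links
# to the same 1 MiB file.  Remove "$tmp" by hand when done.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
mkdir d e
head -c 1048576 /dev/urandom > d/k
ln d/k e/k

# Same operands, opposite order, one du invocation each.
de=$(du -s d e)
ed=$(du -s e d)
printf '%s\n' "--- du -s d e" "$de" "--- du -s e d" "$ed"
```

Whichever directory is named first should be charged for k; the other
should show only its own directory overhead.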

It seems to me that the correct behavior (giving what people expect)
is to clear the "hard link cache" between processing each argument, so
that if argument trees overlap, the overlapped part is counted
properly in each argument.  Given that before release 8.6, the order
of arguments didn't matter, it seems to me to be unlikely that many
users are specifically depending on caching between arguments.

(Please include my address in any responses.)

Thanks,

Dale


http://lists.gnu.org/archive/html/bug-coreutils/2010-11/msg00176.html
> From: 	Paul Eggert
> Subject: 	bug#7439: du failing at "du -sh . *"
> Date: 	Fri, 19 Nov 2010 08:52:39 -0800
> 
> On 11/18/2010 05:08 PM, Mathias Linnemann-Emden wrote:
> > So not only the output of "du -sh . *" is wrong (not showing du for *),
> > but also the output of "du -sh * ." is incorrect (showing only 4K
> > instead of 8K for ".").
> 
> NEWS lists this as a bug fix in release 8.6:
> 
>   du no longer multiply counts a file that is a directory or whose
>   link count is 1, even if the file is reached multiple times by
>   following symlinks or via multiple arguments.
> 
> The idea is that a single invocation of du never counts the same
> file twice.  This was always true for files with multiple hard
> links (you probably didn't notice that), and now it's consistent
> for all files.
> 
> To get something like the old behavior, you can use "du -l",
> or invoke "du" separately for each file (depending on how you
> want hard links treated).
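
A hedged illustration of the "du -l" suggestion above, again assuming
GNU du and a hypothetical scratch layout in which d/k and e/k are hard
links to one file.  With -l (--count-links), a hard-linked file is
charged to every operand that contains it:

```shell
#!/bin/sh
# Hypothetical layout: d/k and e/k share one inode (1 MiB of data).
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
mkdir d e
head -c 1048576 /dev/urandom > d/k
ln d/k e/k

plain=$(du -s d e)       # k counted once, under the first operand
linked=$(du -s -l d e)   # k counted under both d and e
printf '%s\n' "--- du -s d e" "$plain" "--- du -s -l d e" "$linked"
```

Note that -l only changes how files with more than one hard link are
treated; it is not a general "reset between arguments" switch.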




Information forwarded to bug-coreutils <at> gnu.org:
bug#17546; Package coreutils. (Thu, 22 May 2014 02:43:01 GMT)

Message #8 received at 17546 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: "Dale R. Worley" <worley <at> alum.mit.edu>
Cc: 17546 <at> debbugs.gnu.org
Subject: Re: bug#17546: Problem with du
Date: Wed, 21 May 2014 19:42:11 -0700
Dale R. Worley wrote:
> before release 8.6, the order of arguments didn't matter

No, order mattered even back then.  For example:

$ du --version | sed 1q
du (GNU coreutils) 8.4
$ ls -li d/* e/*
11765482 -rw-r--r-- 1 eggert csfac 159910666 May  1 20:42 d/j
23558745 -rw-r--r-- 2 eggert csfac 410000000 Apr 22 21:34 d/k
23558745 -rw-r--r-- 2 eggert csfac 410000000 Apr 22 21:34 e/k
$ du d e d e
557664	d
4	e
156480	d
4	e
$ du e d e d
401188	e
156480	d
4	e
156480	d

So file argument order affected link counts even back then; it's just 
that before 8.6 this was true only for files with link count greater 
than 1, which led to odd behaviors such as the behavior shown above. 
What changed in 8.6 is that the behavior was made consistent for all 
files, not just those with link count greater than 1, so that for the
same data the current version of du generates output like this:

$ du --version | sed 1q
du (GNU coreutils) 8.22
$ du d e d e
557664	d
4	e
$ du e d e d
401188	e
156480	d

This sums to the same values independent of file order, which is a plus.
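
The arithmetic behind that statement can be checked directly from the
8.22 figures quoted above (the variable names are only for
illustration):

```shell
#!/bin/sh
# Totals of the 8.22 output for each argument order, in 1K blocks.
total_de=$((557664 + 4))        # du d e d e
total_ed=$((401188 + 156480))   # du e d e d
echo "$total_de $total_ed"      # prints: 557668 557668
```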

As far as I can see, POSIX doesn't allow the old behavior, but does 
allow the new one.

> This leads to startlingly odd behaviors

Any choice of behavior for 'du' will lead to odd behaviors sometimes, 
and there's no way we can make everybody happy in all cases.  There is 
an important technical advantage of du's current behavior, though; you 
can get the behavior you prefer by running "du X; du Y".  If we changed
du to reset itself between command-line arguments, there'd be no way to 
get the behavior I prefer, which is to count files just once.




Added tag(s) notabug. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Thu, 22 May 2014 02:45:01 GMT)

Information forwarded to bug-coreutils <at> gnu.org:
bug#17546; Package coreutils. (Thu, 22 May 2014 16:22:01 GMT)

Message #13 received at 17546 <at> debbugs.gnu.org:

From: worley <at> alum.mit.edu (Dale R. Worley)
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17546 <at> debbugs.gnu.org
Subject: Re: bug#17546: Problem with du
Date: Thu, 22 May 2014 12:21:26 -0400
> From: Paul Eggert <eggert <at> cs.ucla.edu>

> As far as I can see, POSIX doesn't allow the old behavior, but does 
> allow the new one.

I looked at what I think is the Posix spec for du
(http://pubs.opengroup.org/onlinepubs/9699919799/utilities/du.html#tag_20_36),
and I don't see anything that covers the situation one way or another,
but there may be additional relevant information that I'm not aware
of.

> > This leads to startlingly odd behaviors
> 
> Any choice of behavior for 'du' will lead to odd behaviors sometimes, 
> and there's no way we can make everybody happy in all cases.  There is 
> an important technical advantage of du's current behavior, though; you 
> can get the behavior you prefer by running "du X; du Y".  If we changed
> du to reset itself between command-line arguments, there'd be no way to 
> get the behavior I prefer, which is to count files just once.

OTOH, what I want "du *" to generate has to be done with "for F in * ;
do du $F ; done", which is pretty annoying.
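
For reference, a more defensive form of that per-argument loop might
look like the sketch below.  It quotes the variable so names containing
spaces survive, and uses "--" so names beginning with "-" are not taken
as options.  The scratch layout (d, e, k) is hypothetical:

```shell
#!/bin/sh
# Hypothetical layout: d/k and e/k are hard links to one 1 MiB file.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
mkdir d e
head -c 1048576 /dev/urandom > d/k
ln d/k e/k

# One du process per operand, so nothing is remembered between them.
out=$(for f in *; do du -- "$f"; done)
printf '%s\n' "$out"
```

Because each du starts fresh, both d and e should be charged for k
here.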

I'd like to suggest adding another option to du to establish the
behavior I want.  It would be a weaker grade of -l.  (I shouldn't have
any trouble writing the code for that.)  Also, I think a few
additional sentences in the manual page would make du's behavior
clearer.

Dale




Information forwarded to bug-coreutils <at> gnu.org:
bug#17546; Package coreutils. (Thu, 22 May 2014 16:58:01 GMT)

Message #16 received at 17546 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: "Dale R. Worley" <worley <at> alum.mit.edu>
Cc: 17546 <at> debbugs.gnu.org
Subject: Re: bug#17546: Problem with du
Date: Thu, 22 May 2014 09:56:43 -0700
On 05/22/2014 09:21 AM, Dale R. Worley wrote:
> I don't see anything that covers the situation one way or another

I suppose you're right; the specification is terse, and I guess it can 
be read in a different way.

> I'd like to suggest adding another option to du to establish the 
> behavior I want.

What behavior do you want?  I hope it's not the pre-8.6 behavior, which 
had the properties documented in <http://bugs.gnu.org/17546#8>, where 
files with link counts > 1 were counted only once while files with link 
counts == 1 were counted multiple times.

Is there a version of du somewhere that behaves the way you want? 
(Solaris, FreeBSD, etc.?)  That might give us insight as to what would 
be a good option here.




Information forwarded to bug-coreutils <at> gnu.org:
bug#17546; Package coreutils. (Thu, 22 May 2014 17:07:02 GMT)

Message #19 received at 17546 <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: "Dale R. Worley" <worley <at> alum.mit.edu>, Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17546 <at> debbugs.gnu.org
Subject: Re: bug#17546: Problem with du
Date: Thu, 22 May 2014 11:06:10 -0600
On 05/22/2014 10:21 AM, Dale R. Worley wrote:
>> From: Paul Eggert <eggert <at> cs.ucla.edu>
> 
>> As far as I can see, POSIX doesn't allow the old behavior, but does 
>> allow the new one.
> 
> I looked at what I think is the Posix spec for du
> (http://pubs.opengroup.org/onlinepubs/9699919799/utilities/du.html#tag_20_36),
> and I don't see anything that covers the situation one way or another,
> but there may be additional relevant information that I'm not aware
> of.

Yes, here's the additional relevant information:

http://austingroupbugs.net/view.php?id=527

Change line 84170 [du DESCRIPTION] from:

      Files with multiple links shall be counted and written
      for only one entry.

    to:

      A file that occurs multiple times under one file
      operand and that has a link count greater than 1 shall
      be counted and written for only one entry. It is
      implementation-defined whether a file that has a link
      count no greater than 1 is counted and written just
      once, or is counted and written for each occurrence.
      It is implementation-defined whether a file that
      occurs under one file operand is counted for other
      file operands.

In FUTURE DIRECTIONS, change line 84274 from "None" to
     "A future version of this standard may require that
      a file that occurs multiple times shall be counted and
      written for only one entry, even if the occurrences
      are under different file operands."



Change line 84177 [du OPTIONS] from:

    Regardless of the presence of the -a option,
    non-directories given as file operands shall always
    be listed.

  to:

    The -a option does not affect whether
    non-directories given as file operands are listed.

> 
>>> This leads to startlingly odd behaviors
>>
>> Any choice of behavior for 'du' will lead to odd behaviors sometimes, 
>> and there's no way we can make everybody happy in all cases.  There is 
>> an important technical advantage of du's current behavior, though; you 
>> can get the behavior you prefer by running "du X; du Y".  If we changed
>> du to reset itself between command-line arguments, there'd be no way to 
>> get the behavior I prefer, which is to count files just once.
> 
> OTOH, what I want "du *" to generate has to be done with "for F in * ;
> do du $F ; done", which is pretty annoying.
> 
> I'd like to suggest adding another option to du to establish the
> behavior I want.  It would be a weaker grade of -l.  (I shouldn't have
> any trouble writing the code for that.)  Also, I think a few
> additional sentences in the manual page would make du's behavior
> clearer.
> 
> Dale
> 
> 
> 
> 
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


Information forwarded to bug-coreutils <at> gnu.org:
bug#17546; Package coreutils. (Fri, 23 May 2014 20:10:01 GMT)

Message #22 received at 17546 <at> debbugs.gnu.org:

From: worley <at> alum.mit.edu (Dale R. Worley)
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17546 <at> debbugs.gnu.org
Subject: Re: bug#17546: Problem with du
Date: Fri, 23 May 2014 16:09:13 -0400
> From: Paul Eggert <eggert <at> cs.ucla.edu>

> What behavior do you want?

I'll write out an unambiguous description of what I'm looking for,
along with a set of edits to the manual page that should make it
clear.

Dale




bug closed, send any further explanations to 17546 <at> debbugs.gnu.org and worley <at> alum.mit.edu (Dale R. Worley) Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Thu, 11 Oct 2018 22:16:09 GMT)

Forcibly Merged 17546 21926. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 24 Oct 2018 21:21:01 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 22 Nov 2018 12:24:05 GMT)

This bug report was last modified 5 years and 158 days ago.
