GNU bug report logs - #17505
Interface inconsistency, use of intelligent defaults.

Reported by: Linda Walsh <coreutils <at> tlinx.org>

Date: Fri, 16 May 2014 01:26:02 UTC

Severity: normal

Merged with 22277

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 17505 in the body.
You can then email your comments to 17505 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Fri, 16 May 2014 01:26:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Linda Walsh <coreutils <at> tlinx.org>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Fri, 16 May 2014 01:26:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Linda Walsh <coreutils <at> tlinx.org>
To: bug-coreutils <at> gnu.org
Subject: Interface inconsistency, use of intelligent defaults.
Date: Thu, 15 May 2014 18:24:11 -0700

On programs that allow input and output by specifying computer-base2 powers
of K/M/G  OR decimal based powers of 10,

If the input units are specified in powers of 2 then the output should be
given in the same units.

Example:

dd if=/dev/zero of=/dev/null bs=256M count=2
... So 512MB, total -... but what do I see:
536870912 bytes (537 MB) copied, 0.225718 s, 2.4 GB/s

Clearly 256*2 != 537.
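
The two figures come from dividing the same byte count by different bases. A minimal Python sketch of that arithmetic follows; the variable names are illustrative only and are not taken from dd.

```python
# Same byte count, two unit systems.
block_size = 256 * 1024**2   # bs=256M: dd reads M as 1024*1024
count = 2
total = block_size * count   # bytes actually copied

mib = total / 1024**2        # binary megabytes (MiB)
mb = total / 1000**2         # decimal megabytes (MB)

print(total)                 # 536870912
print(f"{mib:.0f} MiB")      # 512 MiB
print(f"{mb:.0f} MB")        # 537 MB
```

So 512 and 537 describe the same transfer; only the divisor differs.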

At the very least this violates the design principle of 'least surprise'
and/or 'least astonishment'.

The SI suffixes are a pox put on us by the disk manufacturers because
they wanted to pretend to have 2GB or 4GB drives, when they really
only have 1.8GB, or 1907MB.

Either way, disks are created in powers of 512 (or 4096) byte sectors,
so while you can exactly specify sizes in powers of 1024, you can't
do the same with powers of 1000 (where the result must be some multiple of
512, or 4096 for some new disks).

If I compare this to "df", and see my disk taking 2G, then I should
be able to xfer it to another 2G disk but this is not the case
due to immoral actions on the part of diskmakers.  People knew, at the time,
that 9600 baud meant 960 characters/second -- it was a phone communication speed
where decimal was used, but for storage, units were expressed in multiples
of 512 (which the power-of-10 prefixes are not).

(Yes, I know for official purposes, and where the existing established
usage held sway before the advent of computers, metric-base-10 became
understood as power of 10 based, but in computers, there was never confusion
until disk manufacturers tried to take advantage of people.)

Memory does not come in kB, mB or gB (k,m,g = 10^(3*{1,2,3})); it comes
in sizes of KB/MB/GB (K,M,G = 2^(10*{1,2,3})).

But this isn't about changing all units everywhere... but maintaining
consistency with the units the user used on input (where such can be
verified).

Reasonable?  Or are inconsistent results more reasonable?

;-)

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Fri, 16 May 2014 09:16:01 GMT) Full text and rfc822 format available.

Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Ruediger Meier <sweet_f_a <at> gmx.de>
To: bug-coreutils <at> gnu.org
Subject: Re: bug#17505: Interface inconsistency, use of intelligent defaults.
Date: Fri, 16 May 2014 11:15:06 +0200

On Friday 16 May 2014, Linda Walsh wrote:
> On programs that allow input and output by specifying computer-base2
> powers of K/M/G  OR decimal based powers of 10,
>
> If the input units are specified in powers of 2 then the output
> should be given in the same units.
>
> Example:
>
> dd if=/dev/zero of=/dev/null bs=256M count=2
> ... So 512MB, total -... but what do I see:
> 536870912 bytes (537 MB) copied, 0.225718 s, 2.4 GB/s
>
> Clearly 256*2 != 537.
>
> At the very least this violates the design principle of 'least
> surprise' and/or 'least astonishment'.

Yes, I also had to think again and again about this. Often being unsure 
whether it did what I want. Actually I would like to have a global 
switch or env to turn off all powers of 10 based output and input.

Power of 10 is IMO only useful for the human reader to quickly 
understand the magnitude of byte count.

For example, both of these lines are equally easy to read
 536870912 bytes (537 MB):
 536,870,912 bytes

But power of 10 is annoying as well as thousands separators. The most 
useful and least confusing would be this one:
 536870912 bytes (256 M)

For the last line it's easy mental arithmetic to "calculate" that this 
is 537 MB. But I doubt that anybody wants to know this at all.

cu,
Rudi

Reply sent to Pádraig Brady <P <at> draigBrady.com>:
You have taken responsibility. (Fri, 16 May 2014 09:39:02 GMT) Full text and rfc822 format available.

Notification sent to Linda Walsh <coreutils <at> tlinx.org>:
bug acknowledged by developer. (Fri, 16 May 2014 09:39:03 GMT) Full text and rfc822 format available.

Message #13 received at 17505-done <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Linda Walsh <coreutils <at> tlinx.org>
Cc: 17505-done <at> debbugs.gnu.org
Subject: Re: bug#17505: Interface inconsistency, use of intelligent defaults.
Date: Fri, 16 May 2014 10:37:55 +0100

[Message part 1 (text/plain, inline)]

On 05/16/2014 02:24 AM, Linda Walsh wrote:
> On programs that allow input and output by specifying computer-base2 powers
> of K/M/G  OR decimal based powers of 10,
> 
> If the input units are specified in powers of 2 then the output should be
> given in the same units.
> 
> Example:
> 
> dd if=/dev/zero of=/dev/null bs=256M count=2
> ... So 512MB, total -... but what do I see:
> 536870912 bytes (537 MB) copied, 0.225718 s, 2.4 GB/s
> 
> Clearly 256*2 != 537.
> 
> At the very least this violates the design principle of 'least surprise'
> and/or 'least astonishment'.

I agree that the units representation is unfortunate,
but an accident of history.
POSIX specifies 'k' and 'b' to mean 1024 and 512 respectively.
Standards wise 'k' should really mean 1000 and 'K' 1024.
Then extending from that we now have (which we can't change for compat reasons):

  k=K=kiB=KiB=1024
  kb=KB=1000
  M=MiB=1024^2
  MB=1000^2
  ...
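
As an illustration of how these suffixes map to byte counts, here is a small Python sketch. The multipliers are copied only from the list above and have not been checked against dd's own parser, which accepts more suffixes than these. The helper name parse_size is hypothetical.

```python
import re

# Multipliers copied from the list in the message; not exhaustive.
MULTIPLIERS = {
    "": 1,
    "k": 1024, "K": 1024, "kiB": 1024, "KiB": 1024,
    "kb": 1000, "KB": 1000,
    "M": 1024**2, "MiB": 1024**2,
    "MB": 1000**2,
}

def parse_size(text):
    """Hypothetical helper: turn '256M' or '256MB' into a byte count."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", text)
    if match is None or match.group(2) not in MULTIPLIERS:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * MULTIPLIERS[match.group(2)]

print(parse_size("256M"))    # 268435456
print(parse_size("256MB"))   # 256000000
```

The single trailing "B" is what switches the input from base 1024 to base 1000, which is the opposite of what many users expect.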

However when _outputting_ the stats line we could use the
least ambiguous and most standard unit, which would be the IEC unit.
The attached patch changes the output to:

  $ dd if=/dev/zero of=/dev/null bs=256M count=2
  2+0 records in
  2+0 records out
  536870912 bytes (512 MiB) copied, 0.152887 s, 3.3 GiB/s

thanks,
Pádraig.

[dd-stats-units.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Fri, 16 May 2014 10:02:02 GMT) Full text and rfc822 format available.

Message #16 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Ruediger Meier <sweet_f_a <at> gmx.de>
To: 17505 <at> debbugs.gnu.org,
 P <at> draigbrady.com,
 coreutils <at> tlinx.org
Subject: Re: bug#17505: Interface inconsistency, use of intelligent defaults.
Date: Fri, 16 May 2014 12:01:06 +0200

On Friday 16 May 2014, Pádraig Brady wrote:
> On 05/16/2014 02:24 AM, Linda Walsh wrote:
> > On programs that allow input and output by specifying
> > computer-base2 powers of K/M/G  OR decimal based powers of 10,
> >
> > If the input units are specified in powers of 2 then the output
> > should be given in the same units.
> >
> > Example:
> >
> > dd if=/dev/zero of=/dev/null bs=256M count=2
> > ... So 512MB, total -... but what do I see:
> > 536870912 bytes (537 MB) copied, 0.225718 s, 2.4 GB/s
> >
> > Clearly 256*2 != 537.
> >
> > At the very least this violates the design principle of 'least
> > surprise' and/or 'least astonishment'.
>
> I agree that the units representation is unfortunate,
> but an accident of history.
> POSIX specifies 'k' and 'b' to mean 1024 and 512 respectively.
> Standards wise 'k' should really mean 1000 and 'K' 1024.
> Then extending from that we now have (which we can't change for
> compat reasons):
>
>   k=K=kiB=KiB=1024
>   kb=KB=1000
>   M=MiB=1024^2
>   MB=1000^2
>   ...
>
> However when _outputting_ the stats line we could use the
> least ambiguous and most standard unit, which would be the IEC unit.
> The attached patch changes the output to:
>
>   $ dd if=/dev/zero of=/dev/null bs=256M count=2
>   2+0 records in
>   2+0 records out
>   536870912 bytes (512 MiB) copied, 0.152887 s, 3.3 GiB/s

Thanks!
What about just "512 M" which looks IMO better, is a valid input unit 
and is explained in the man page.


cu,
Rudi

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Fri, 16 May 2014 10:20:02 GMT) Full text and rfc822 format available.

Message #19 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Ruediger Meier <sweet_f_a <at> gmx.de>
Cc: 17505 <at> debbugs.gnu.org, coreutils <at> tlinx.org
Subject: Re: bug#17505: Interface inconsistency, use of intelligent defaults.
Date: Fri, 16 May 2014 11:19:33 +0100

On 05/16/2014 11:01 AM, Ruediger Meier wrote:
> On Friday 16 May 2014, Pádraig Brady wrote:
>> The attached patch changes the output to:
>>
>>   $ dd if=/dev/zero of=/dev/null bs=256M count=2
>>   2+0 records in
>>   2+0 records out
>>   536870912 bytes (512 MiB) copied, 0.152887 s, 3.3 GiB/s
> 
> Thanks!
> What about just "512 M" which looks IMO better, is a valid input unit 
> and is explained in the man page.

That would be less clear I think since in
standards notation, 512M is 512000000.
Also adding the B removes any ambiguity
as to whether this referred to bytes or blocks.

cheers,
Pádraig.

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Fri, 16 May 2014 21:22:02 GMT) Full text and rfc822 format available.

Message #22 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Linda Walsh <coreutils <at> tlinx.org>
To: Pádraig Brady <P <at> draigbrady.com>
Cc: Ruediger Meier <sweet_f_a <at> gmx.de>, 17505 <at> debbugs.gnu.org
Subject: Re: bug#17505: Interface inconsistency, use of intelligent defaults.
Date: Fri, 16 May 2014 14:20:42 -0700

Pádraig Brady wrote:
> On 05/16/2014 11:01 AM, Ruediger Meier wrote:
>> On Friday 16 May 2014, Pádraig Brady wrote:
>>> The attached patch changes the output to:
>>>
>>>   $ dd if=/dev/zero of=/dev/null bs=256M count=2
>>>   2+0 records in
>>>   2+0 records out
>>>   536870912 bytes (512 MiB) copied, 0.152887 s, 3.3 GiB/s
>> Thanks!
>> What about just "512 M" which looks IMO better, is a valid input unit 
>> and is explained in the man page.
> 
> That would be less clear I think since in
> standards notation, 512M is 512000000.
> Also adding the B removes any ambiguity
> as to whether this referred to bytes or blocks.
----

Since 'B' already refers to 2^3 (most commonly) bits of information
saying "KiB" = 1024 information Bytes.  What other type of bytes are
there?  I would acknowledge some ambiguity when using the prefixes with
'bits', but with 'Bytes' you only have their usage/reference in relation
to 'information'.  Note that in the information field, when referring
to timings, milli, micro, nano -- all refer to an abstract, non-information
quantity (time in 's'). When referring to non-computer units SI prefixes would
be the default.  But for space, in 'bytes' -- they are an 'information unit' that
has no physical basis for measurement.  I think the SI standard was too
hastily pushed upon the nascent computer industry by established and more
dominant companies that were used to talking about units that relate
to concrete physical quantities.

I'm beginning to wonder how one would go about correcting
the SI standard so as not to introduce inaccuracies in measurement
in the computer industry.

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Sat, 17 May 2014 00:14:01 GMT) Full text and rfc822 format available.

Message #25 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: P <at> draigBrady.com
Cc: 17505 <at> debbugs.gnu.org
Subject: Re: bug#17505: Interface inconsistency, use of intelligent defaults.
Date: Fri, 16 May 2014 17:13:39 -0700

Pádraig Brady wrote:

> The attached patch changes the output to:
>
>    $ dd if=/dev/zero of=/dev/null bs=256M count=2
>    2+0 records in
>    2+0 records out
>    536870912 bytes (512 MiB) copied, 0.152887 s, 3.3 GiB/s

I recall considering this when I added this kind of diagnostic to GNU dd 
back in 2004, and going with powers-of-1000 abbreviations because 
secondary storage devices are normally measured that way.  For this 
reason, I expect many users will prefer powers-of-1000 here.  This is 
particularly true for transfer rates: it's rare to see "GiB/s" in 
real-world prose.

So it'd be unwise to make this change.

The simplest thing to do is to leave "dd" alone, which is my mild 
preference.  Alternatively, we could make the proposed behavior 
optional, with the default being the current behavior.  If we do that, 
though, the behavior shouldn't be affected by the abbreviation chosen 
for the block size.  Even if the block size is given in powers-of-1024 
(which is common, because block sizes are about internal memory units, 
where powers-of-1024 are typical), the total number of bytes transferred 
and the transfer rates are more commonly interpreted in the external 
world, where powers-of-1000 are typical.

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Sat, 17 May 2014 00:59:01 GMT) Full text and rfc822 format available.

Message #28 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Linda Walsh <coreutils <at> tlinx.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17505 <at> debbugs.gnu.org, P <at> draigBrady.com
Subject: Re: bug#17505: Interface inconsistency, use of intelligent defaults.
Date: Fri, 16 May 2014 17:58:16 -0700

Paul Eggert wrote:
> Pádraig Brady wrote:
>
>> The attached patch changes the output to:
>>
>>    $ dd if=/dev/zero of=/dev/null bs=256M count=2
>>    2+0 records in
>>    2+0 records out
>>    536870912 bytes (512 MiB) copied, 0.152887 s, 3.3 GiB/s
>
> I recall considering this when I added this kind of diagnostic to GNU 
> dd back in 2004, and going with powers-of-1000 abbreviations because 
> secondary storage devices are normally measured that way.  For this 
> reason, I expect many users will prefer powers-of-1000 here.  This is 
> particularly true for transfer rates: it's rare to see "GiB/s" in 
> real-world prose.
>
> So it'd be unwise to make this change.
----
   When users see 512 MB copied, they expect it means 512*1024*1024.

The same goes for the GB/s figure.

If you went with Gb/s -- that's different, as we are more used to seeing
bits/s, which is why I could go either way with that.

>
>
> The simplest thing to do is to leave "dd" alone, which is my mild 
> preference.  Alternatively, we could make the proposed behavior 
> optional, with the default being the current behavior.  If we do that, 
> though, the behavior shouldn't be affected by the abbreviation chosen 
> for the block size.  Even if the block size is given in powers-of-1024 
> (which is common, because block sizes are about internal memory units, 
> where powers-of-1024 are typical), the total number of bytes 
> transferred and the transfer rates are more commonly interpreted in 
> the external world, where powers-of-1000 are typical.
----
What external world are you talking about?  Where do you talk about MB or
GB/s outside of the computer world?  If what you said was true, then people
wouldn't have responded that 125MB/s was impossible (in the external world)
on a 1Gb ethernet.  Yet that's what 'dd' displays.
See
"http://superuser.com/questions/753597/fastest-way-to-copy-1tb-safely-over-the-wire/753617".

See the comments under the 2nd answer.  "125MB/s is literally
impossible with a 1Gbit/s line - there will be overhead..."-(Bob)  and
"Without very significant compression (which is only achievable on
extremely low entropy data), you're never going to see 125 MB/s in any
direction on GbE." (allquixotic).

They don't believe 125MB/s is possible even though that's what 'dd'
stated.  It never occurs to people talking about computers and speeds that
someone has slipped in decimal -- it never happened before disk
manufacturers wanted to inflate their figures.  By not putting a stop
to the nonsense that MB != 1024*1024 when disk manufacturers muddied the
waters, it's led to all sorts of miscommunications.

The industry leader in computing doesn't use KB to mean 1000B, nor
M=10^6 ... Microsoft's disk space and rates both use 1024 based measurements.

So what external world (whose opinion matters in the computer world) are
you talking about?

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Sat, 17 May 2014 01:24:01 GMT) Full text and rfc822 format available.

Message #31 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Linda Walsh <coreutils <at> tlinx.org>
Cc: 17505 <at> debbugs.gnu.org, P <at> draigBrady.com
Subject: Re: bug#17505: Interface inconsistency, use of intelligent defaults.
Date: Fri, 16 May 2014 18:22:57 -0700

Linda Walsh wrote:
> "125MB/s is literally impossible with a 1Gbit/s line - there will be
> overhead"

This comment is using the usual powers-of-1000 abbreviations for both 
the first figure (125 MB/s) and the second one (1 Gb/s), so it supports 
the assertion that powers-of-1000 are more common in ordinary usage. 
125 MB/s is impossible because there is some overhead at lower
protocol levels, which means that you cannot possibly transfer 1 Gb of 
data over a 1 Gb/s line in one second, i.e., you cannot possibly 
transfer 125 MB of data over that line in one second, and that's what 
the comment says.
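
The conversion behind the 125 MB/s figure can be sketched in a few lines of Python. The variable names are illustrative; the only inputs are the nominal 1 Gb/s line rate and 8 bits per byte.

```python
# Nominal 1 Gb/s line rate expressed in bytes per second.
line_rate_bits = 1_000_000_000           # 1 Gb/s, decimal giga
line_rate_bytes = line_rate_bits // 8    # 8 bits per byte

print(line_rate_bytes / 1000**2)             # 125.0  (MB/s, powers of 1000)
print(round(line_rate_bytes / 1024**2, 1))   # 119.2  (MiB/s, powers of 1024)
```

125 MB/s is therefore the ceiling of the raw line, and the payload rate after protocol overhead has to be lower than that.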

Google is a wonderful tool, and I'm sure that if you search hard enough 
you will eventually find uses of powers-of-1024 abbreviations for 
secondary storage capacity and transfer rates.  But they're rare 
compared to powers-of-1000 abbreviations, such as the abbreviations in 
the example you gave.

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Sat, 17 May 2014 08:24:02 GMT) Full text and rfc822 format available.

Message #34 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Linda Walsh <coreutils <at> tlinx.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17505 <at> debbugs.gnu.org, P <at> draigBrady.com
Subject: Re: bug#17505: Interface inconsistency, use of intelligent defaults.
Date: Sat, 17 May 2014 01:23:20 -0700


Paul Eggert wrote:
> Linda Walsh wrote:
>> "125MB/s is literally impossible with a 1Gbit/s line - there will be
>> overhead"
> 
> This comment is using the usual powers-of-1000 abbreviations for both 
> the first figure (125 MB/s) and the second one (1 Gb/s), so it supports 
> the assertion that powers-of-1000 are more common in ordinary usage. 125
> MB/s is impossible because there is some overhead at lower protocol
> levels, which means that you cannot possibly transfer 1 Gb of data over 
> a 1 Gb/s line in one second, i.e., you cannot possibly transfer 125 MB 
> of data over that line in one second, and that's what the comment says.
----
	I see what you are saying, but having done that measurement myself,
I can assure you the 125MB/s is exactly what 'dd' reports (using direct
I/O).  As I stated previously, when talking about bits, I see the decimal usage
as often as not.  But when people talk about timings, they want to know how
long it will take to transfer the data on their disk -- given in base2 units
to 'X'...

	Compare to 'ls', 'du', -- all give base2 units.  If you think about
it the only way it would be "impossible"  is if they thought it was
125 * 2^20.  But getting 125*10^6, is relatively trivial if your overhead is
< 1% -- dd won't show it.  I could ask for clarification whether they were
using 2^20 or 10^6 for M.  But 'dd' only requires that the overhead be less
than .4 or .5% to display 125.
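
The rounding argument can be checked with a short Python sketch. It assumes the rate is displayed as a whole number of MB/s rounded to nearest; that is an assumption about the formatting, and shown_mb_per_s is a hypothetical stand-in rather than dd's own routine.

```python
LINE_RATE = 125_000_000  # bytes/s: 1 Gb/s divided by 8

def shown_mb_per_s(bytes_per_s):
    """Hypothetical stand-in for the displayed rate: whole MB/s,
    rounded to nearest (an assumption about the formatting)."""
    return (bytes_per_s + 500_000) // 1_000_000

# Overhead in basis points (1 bp = 0.01%), kept as integers to avoid
# floating-point noise at the rounding boundary.
for bp in (0, 30, 40, 50):
    rate = LINE_RATE * (10_000 - bp) // 10_000
    print(f"{bp / 100:.1f}% overhead -> {shown_mb_per_s(rate)} MB/s")
```

Under that assumption the displayed figure stays at 125 MB/s up to 0.4% overhead and drops to 124 MB/s at 0.5%.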

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Sat, 17 May 2014 10:34:02 GMT) Full text and rfc822 format available.

Message #37 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17505 <at> debbugs.gnu.org
Subject: Re: bug#17505: Interface inconsistency, use of intelligent defaults.
Date: Sat, 17 May 2014 11:32:56 +0100

[Message part 1 (text/plain, inline)]

On 05/17/2014 01:13 AM, Paul Eggert wrote:
> Pádraig Brady wrote:
> 
>> The attached patch changes the output to:
>>
>>    $ dd if=/dev/zero of=/dev/null bs=256M count=2
>>    2+0 records in
>>    2+0 records out
>>    536870912 bytes (512 MiB) copied, 0.152887 s, 3.3 GiB/s
> 
> I recall considering this when I added this kind of diagnostic to GNU dd back in 2004
> and going with powers-of-1000 abbreviations because secondary storage devices are normally
> measured that way.  For this reason, I expect many users will prefer powers-of-1000 here.

This is a fair point as it's common to transfer MB based images in
MiB sized blocks for example.

Though the 512 MiB above is useful as one can immediately see that the
requested amount was transferred.  Also it imparts more info than
537 MB as that is trivially inferred from the previous number.
Also MiB is not ambiguous wrt base, though I suppose MB isn't
too bad either as MB as per standards and dd input notation is base 1000.

> This is particularly true for transfer rates: it's rare to see "GiB/s" in real-world prose.

Fair point. We'll leave that one as is.

> 
> So it'd be unwise to make this change.
> 
> The simplest thing to do is to leave "dd" alone, which is my mild preference.
> Alternatively, we could make the proposed behavior optional, with the default being the current behavior.

> If we do that, though, the behavior shouldn't be affected by the abbreviation chosen for the block size.
> Even if the block size is given in powers-of-1024 (which is common, because block sizes are about internal
> memory units, where powers-of-1024 are typical), the total number of bytes transferred and the transfer rates
> are more commonly interpreted in the external world, where powers-of-1000 are typical.

It's not worth a new option, but if it was to be conditional perhaps it could be
based on the actual amount transferred rather than the block size.
Or from a mathematical viewpoint, output the number that loses the least info.
Essentially:

  if ((count % 1000) && ! (count % 1024))
    options |= human_base_1024;

The attached patch now produces:

  $ dd if=/dev/zero of=/dev/null bs=256M count=2
  2+0 records in
  2+0 records out
  536870912 bytes (512 MiB) copied, 0.200283 s, 2.7 GB/s

  $ truncate -s 256MB disk.img
  $ dd if=disk.img of=/dev/null bs=2M
  122+1 records in
  122+1 records out
  256000000 bytes (256 MB) copied, 0.129617 s, 2.0 GB/s
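
The selection rule can be tried outside dd with a short Python sketch. It assumes that "count" in the snippet above is the total number of bytes transferred. The names pick_base and human are hypothetical, and the formatter is a simplification, not the routine dd uses.

```python
def pick_base(nbytes):
    """The quoted test: binary units only when the byte count is a
    multiple of 1024 but not of 1000."""
    return 1024 if (nbytes % 1000 != 0 and nbytes % 1024 == 0) else 1000

def human(nbytes):
    """Simplified formatter (hypothetical, not dd's own routine)."""
    base = pick_base(nbytes)
    if base == 1000:
        units = ["B", "kB", "MB", "GB"]
    else:
        units = ["B", "KiB", "MiB", "GiB"]
    value, i = float(nbytes), 0
    # Scale down until the value fits the current unit.
    while value >= base and i < len(units) - 1:
        value /= base
        i += 1
    return f"{value:.0f} {units[i]}"

print(human(536870912))   # 512 MiB
print(human(256000000))   # 256 MB
```

Both transcripts above are reproduced: the first byte count is a multiple of 1024 only, the second a multiple of 1000.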

cheers,
Pádraig.

[dd-stats-units.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Sat, 17 May 2014 16:45:02 GMT) Full text and rfc822 format available.

Message #40 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 17505 <at> debbugs.gnu.org
Subject: Re: bug#17505: Interface inconsistency, use of intelligent defaults.
Date: Sat, 17 May 2014 09:44:01 -0700

Pádraig Brady wrote:
>    if ((count % 1000) && ! (count % 1024))
>      options |= human_base_1024

Unfortunately this won't work either, as it would introduce a worse 
user-interface glitch: transfers of some block counts would be treated 
inconsistently with transfers of others.  If I've done the math right:

$ dd if=/dev/zero of=/dev/null count=99997
99997+0 records in
99997+0 records out
51198464 bytes (51 MB) copied, 0.101863 s, 503 MB/s
$ dd if=/dev/zero of=/dev/null count=99998
99998+0 records in
99998+0 records out
51198976 bytes (49 MiB) copied, 0.0938181 s, 546 MB/s

A reader glancing quickly at the output might incorrectly conclude that the
first dd (51 M) transferred more data than the second dd (49 M).
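
The two byte counts and the resulting unit flip can be verified with a few lines of Python. This is a sketch that reuses the quoted test and assumes dd's default 512-byte block size; uses_binary_units is a hypothetical name.

```python
BLOCK = 512  # dd's default block size in bytes

def uses_binary_units(nbytes):
    # Same test as the quoted snippet.
    return nbytes % 1000 != 0 and nbytes % 1024 == 0

for count in (99997, 99998):
    nbytes = count * BLOCK
    if uses_binary_units(nbytes):
        print(nbytes, f"-> {nbytes / 1024**2:.0f} MiB")
    else:
        print(nbytes, f"-> {nbytes / 1000**2:.0f} MB")
```

With 512-byte blocks every odd count fails the % 1024 test and falls back to MB, while most even counts pass it, so the unit can change from one count to the next.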

If we're going to make a change, perhaps it would be better to make it a 
separate option that ORs in human_base_1024, so that a user who prefers 
powers-of-1024 can alias 'dd' to 'dd status=human-readable' or whatever.

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Sat, 17 May 2014 18:05:06 GMT) Full text and rfc822 format available.

Message #43 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Leslie Satenstein <lsatenstein <at> yahoo.com>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 17505 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#17505: Interface inconsistency, use of intelligent defaults.
Date: Sat, 17 May 2014 11:29:54 -0004

[Message part 1 (text/plain, inline)]

What other kinds of bytes are there?  There are 9 bit bytes, with
parity as a bit, and some old communication stuff with 7 bit bytes. 
There was also the IBM punchcard bytes. 

For the past 60+ years, since paper tape, bytes have been 8 bits, so 
we should assume that is the standard. UTF-8 is a discussion for 
another time. 

There are metric bytes (1000 bytes) which I would refer to as 000's of 
octets, even though disk manufacturers use 000's in their offer of 
terabytes and not multiples of 1024.

If the abbreviation (K,M,G,P) precedes  'b', as in Kb, kB, KB, and kb, 
these abbreviations should be interpreted as appropriate multiples of 
1024 bytes (2^10).

If you absolutely want to distinguish between 2^10 vs 10^3  use
Ko, kO, KO, or ko to represent the latter (000's).  

You could apply the above rule to megabytes,  512Mb vs 512Mo
and of course to Gb vs Go, Tb vs To and Pb vs Po (Peta bytes vs Peta 
octets)

All you require is one statement, all multiples of kb are multiples of 
1024 bytes. 


On Sat, May 17, 2014 at 6:32 AM, Pádraig Brady <P <at> draigBrady.com> 
wrote:
> On 05/17/2014 01:13 AM, Paul Eggert wrote:
>>  Pádraig Brady wrote:
>>  
>>>  The attached patch changes the output to:
>>> 
>>>     $ dd if=/dev/zero of=/dev/null bs=256M count=2
>>>     2+0 records in
>>>     2+0 records out
>>>     536870912 bytes (512 MiB) copied, 0.152887 s, 3.3 GiB/s
>>  
>>  I recall considering this when I added this kind of diagnostic to 
>> GNU dd back in 2004
>>  and going with powers-of-1000 abbreviations because secondary 
>> storage devices are normally
>>  measured that way.  For this reason, I expect many users will 
>> prefer powers-of-1000 here.
> 
> This is a fair point as it's common to transfer MB based images in
> MiB sized blocks for example.
> 
> Though the 512 MiB above is useful as one can immediately see that the
> requested amount was transferred.  Also it imparts more info than
> 537 MB as that is trivially inferred from the previous number.
> Also MiB is not ambiguous wrt base, though I suppose MB isn't
> too bad either as MB as per standards and dd input notation is base 
> 1000.
> 
>>  This is particularly true for transfer rates: it's rare to see 
>> "GiB/s" in real-world prose.
> 
> Fair point. We'll leave that one as is.
> 
>>  
>>  So it'd be unwise to make this change.
>>  
>>  The simplest thing to do is to leave "dd" alone, which is my mild 
>> preference.
>>  Alternatively, we could make the proposed behavior optional, with 
>> the default being the current behavior.
> 
>>  If we do that, though, the behavior shouldn't be affected by the 
>> abbreviation chosen for the block size.
>>  Even if the block size is given in powers-of-1024 (which is common, 
>> because block sizes are about internal
>>  memory units, where powers-of-1024 are typical), the total number 
>> of bytes transferred and the transfer rates
>>  are more commonly interpreted in the external world, where 
>> powers-of-1000 are typical.
> 
> It's not worth a new option, but if it was to be conditional perhaps 
> it could be
> based on the actual amount transferred rather than the block size.
> Or from a mathematical viewpoint, output the number that loses the 
> least info.
> Essentially:
> 
>   if ((count % 1000) && ! (count % 1024))
>     options |= human_base_1024
> 
> The attached patch now produces:
> 
>   $ dd if=/dev/zero of=/dev/null bs=256M count=2
>   2+0 records in
>   2+0 records out
>   536870912 bytes (512 MiB) copied, 0.200283 s, 2.7 GB/s
> 
>   $ truncate -s 256MB disk.img
>   $ dd if=disk.img of=/dev/null bs=2M
>   122+1 records in
>   122+1 records out
>   256000000 bytes (256 MB) copied, 0.129617 s, 2.0 GB/s
> 
> cheers,
> Pádraig.

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Sun, 18 May 2014 01:34:01 GMT) Full text and rfc822 format available.

Message #46 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17505 <at> debbugs.gnu.org
Subject: Re: bug#17505: Interface inconsistency, use of intelligent defaults.
Date: Sun, 18 May 2014 02:33:30 +0100

On 05/17/2014 05:44 PM, Paul Eggert wrote:
> Pádraig Brady wrote:
>>    if ((count % 1000) && ! (count % 1024))
>>      options |= human_base_1024
> 
> Unfortunately this won't work either, as it would introduce a worse user-interface glitch: transfers of some block counts would be treated inconsistently with transfers of others.  If I've done the math right:
> 
> $ dd if=/dev/zero of=/dev/null count=99997
> 99997+0 records in
> 99997+0 records out
> 51198464 bytes (51 MB) copied, 0.101863 s, 503 MB/s
> $ dd if=/dev/zero of=/dev/null count=99998
> 99998+0 records in
> 99998+0 records out
> 51198976 bytes (49 MiB) copied, 0.0938181 s, 546 MB/s

Not sure how much of an issue that would be in practice.

> 
> A quick glance at the output might incorrectly conclude that the first dd (51 M) transferred more data than the second dd (49 M).
> 
> If we're going to make a change, perhaps it would be better to make it a separate option that ORs in human_base_1024, so that a user who prefers powers-of-1024 can alias 'dd' to 'dd status=human-readable' or whatever.

Not worth an option I think,
so let's leave this for now.

thanks,
Pádraig.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 15 Jun 2014 11:24:03 GMT) Full text and rfc822 format available.

bug unarchived. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Wed, 16 Jul 2014 09:45:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Wed, 16 Jul 2014 12:31:02 GMT) Full text and rfc822 format available.

Message #53 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Henrik Juul Pedersen <henrikjuul <at> gmail.com>
To: Pádraig Brady <P <at> draigbrady.com>
Cc: 17505 <at> debbugs.gnu.org, Christian Groessler <chris <at> groessler.org>,
 Coreutils <coreutils <at> gnu.org>
Subject: Re: dd statistics output
Date: Wed, 16 Jul 2014 14:24:15 +0200

[Message part 1 (text/plain, inline)]

Christian Groessler writes:
>
> 268435456 bytes (256 MB) copied, ...
>

This would be a clear violation of the SI standard, which says on its
prefixes:

"These SI prefixes refer strictly to powers of 10. They should not be used
to indicate powers of 2 (for example, one kilobit represents 1000 bits and
not 1024 bits). The IEC has adopted prefixes for binary powers in the
international standard IEC 60027-2: 2005, third edition, Letter symbols to
be used in electrical technology – Part 2: Telecommunications and
electronics. The names and symbols for the prefixes corresponding to 2^10,
2^20, 2^30, 2^40, 2^50, and 2^60 are, respectively: kibi, Ki; mebi, Mi; gibi,
Gi; tebi, Ti; pebi, Pi; and exbi, Ei. Thus, for example, one kibibyte would
be written: 1 KiB = 2^10 B = 1024 B, where B denotes a byte. Although these
prefixes are not part of the SI, they should be used in the field of
information technology to avoid the incorrect usage of the SI prefixes."
[1, page 121]

I would second Pádraig Bradys:
>
>  268435456 bytes (256 MiB) copied, 0.0248346 s, 10.8 GB/s
>

Or as neither bit nor byte are SI units, one might even keep all IEC units
in IEC binary prefix as such:

>
>  268435456 bytes (256 MiB) copied, 0.0248346 s, 10.1 GiB/s
>

Best regards
Henrik Juul Pedersen

[1] http://www.bipm.org/utils/common/pdf/si_brochure_8_en.pdf
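
The figures in the two suggested output lines can be reproduced from the byte count and elapsed time in the quoted dd output. This is a minimal sketch of the arithmetic only; the variable names are illustrative and nothing here is taken from dd's source.

```python
# Figures taken from the dd output quoted in this thread.
n_bytes = 268435456
seconds = 0.0248346

rate = n_bytes / seconds  # bytes per second

# Size: decimal (SI) megabytes versus binary (IEC) mebibytes.
print(f"{n_bytes / 10**6:.0f} MB vs {n_bytes / 2**20:.0f} MiB")
# Rate: decimal gigabytes per second versus binary gibibytes per second.
print(f"{rate / 10**9:.1f} GB/s vs {rate / 2**30:.1f} GiB/s")
```

The same transfer therefore reads as 268 MB or 256 MiB, and as 10.8 GB/s or 10.1 GiB/s, depending only on the prefix system.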

On Wed, Jul 16, 2014 at 11:38 AM, Pádraig Brady <P <at> draigbrady.com> wrote:

> On 07/16/2014 03:45 AM, Christian Groessler wrote:
> > Hi,
> >
> > the final output of 'dd' is in "SI mode" (or how to call it). It uses
> > 10^6 instead of 2^20 for "megabyte".
> >
> > Example:
> >
> > $ dd if=/dev/zero of=/dev/null bs=65536 count=4096
> > 4096+0 records in
> > 4096+0 records out
> > 268435456 bytes (268 MB) copied, 0.0248346 s, 10.8 GB/s
> > $
> >
> > Is there a switch to display in "traditional" units, I'd like to have
> >
> > 268435456 bytes (256 MB) copied, ...
>
> http://bugs.gnu.org/17505#37 was proposed to do the following automatically
> (depending on the amount output):
>
>   268435456 bytes (256 MiB) copied, 0.0248346 s, 10.8 GB/s
>
> However that wasn't applied due to inconsistency concerns.
> I'm still of the opinion that the change above would be a net gain,
> as the number in brackets is for human interpretation, and in the vast
> majority of cases would be the best representation for that.
>
> Pádraig.
>
>

[Message part 2 (text/html, inline)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Wed, 16 Jul 2014 13:43:04 GMT) Full text and rfc822 format available.

Message #56 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Christian Groessler <chris <at> groessler.org>
Cc: 17505 <at> debbugs.gnu.org, coreutils <at> gnu.org
Subject: Re: dd statistics output
Date: Wed, 16 Jul 2014 14:42:21 +0100

On 07/16/2014 10:38 AM, Pádraig Brady wrote:
> On 07/16/2014 03:45 AM, Christian Groessler wrote:
>> Hi,
>>
>> the final output of 'dd' is in "SI mode" (or how to call it). It uses 10^6 instead of 2^20 for "megabyte".
>>
>> Example:
>>
>> $ dd if=/dev/zero of=/dev/null bs=65536 count=4096
>> 4096+0 records in
>> 4096+0 records out
>> 268435456 bytes (268 MB) copied, 0.0248346 s, 10.8 GB/s
>> $
>>
>> Is there a switch to display in "traditional" units, I'd like to have
>>
>> 268435456 bytes (256 MB) copied, ...
> 
> http://bugs.gnu.org/17505#37 was proposed to do the following automatically (depending on the amount output):
> 
>   268435456 bytes (256 MiB) copied, 0.0248346 s, 10.8 GB/s
> 
> However that wasn't applied due to inconsistency concerns.
> I'm still of the opinion that the change above would be a net gain,
> as the number in brackets is for human interpretation, and in the vast
> majority of cases would be the best representation for that.

Note another reason to _not_ apply the patch is that
requests to print the statistics can come async through SIGUSR1,
and thus increase the chances of inconsistent output.

thanks,
Pádraig.

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Wed, 16 Jul 2014 22:18:02 GMT) Full text and rfc822 format available.

Message #59 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Christian Groessler <chris <at> groessler.org>
Cc: 17505 <at> debbugs.gnu.org, coreutils <at> gnu.org
Subject: Re: dd statistics output
Date: Wed, 16 Jul 2014 23:17:19 +0100

On 07/16/2014 11:09 PM, Christian Groessler wrote:
> On 07/16/14 15:42, Pádraig Brady wrote:
>> Note another reason to _not_ apply the patch is that
>> requests to print the statistics can come async through SIGUSR1,
>> and thus increase the chances of inconsistent output.
> 
> 
> Sorry, I cannot follow. Which inconsistent output are you referring to?
> 
> regards,
> chris

It's a bit of an edge case, but if working with 1024 base quantities,
rarely one might get 1000 based statistics as the selector is essentially:

     if ((n_written % 1000) && ! (n_written % 1024))
       human_opts |= human_base_1024;

So if SIGUSR1 was sent after 1000 blocks were written for example,
then SI stats would be printed rather than IEC.
Yes it's quite the edge case, and not especially problematic I think,
but worth mentioning.

thanks,
Pádraig.
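
The edge case can be made concrete with a small model of the selector. This is a sketch in Python that mirrors the two-line condition quoted above; `uses_iec` is a hypothetical name, and a 1024-byte block size (bs=1K) is assumed for the example.

```python
def uses_iec(n_written):
    """Mirror of the quoted selector: IEC only if the byte count is a
    multiple of 1024 but not of 1000."""
    return n_written % 1000 != 0 and n_written % 1024 == 0


BLOCK = 1024  # assumed bs=1K transfer

# What a statistics request would print after various numbers of blocks.
for blocks in (124, 125, 999, 1000, 1001):
    n = blocks * BLOCK
    print(blocks, n, "IEC" if uses_iec(n) else "SI")

# Block counts up to 1000 at which the report falls back to SI units.
si_counts = [k for k in range(1, 1001) if not uses_iec(k * BLOCK)]
print(si_counts)
```

With 1024-byte blocks the fallback happens at every 125th block, because 1024 * k is a multiple of 1000 exactly when k is a multiple of 125.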

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Thu, 17 Jul 2014 00:28:02 GMT) Full text and rfc822 format available.

Message #62 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Christian Groessler <chris <at> groessler.org>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 17505 <at> debbugs.gnu.org, coreutils <at> gnu.org
Subject: Re: dd statistics output
Date: Thu, 17 Jul 2014 00:09:02 +0200

On 07/16/14 15:42, Pádraig Brady wrote:
> Note another reason to _not_ apply the patch is that
> requests to print the statistics can come async through SIGUSR1,
> and thus increase the chances of inconsistent output.


Sorry, I cannot follow. Which inconsistent output are you referring to?

regards,
chris

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Mon, 21 Jul 2014 21:10:02 GMT) Full text and rfc822 format available.

Message #65 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Linda Walsh <coreutils <at> tlinx.org>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 17505 <at> debbugs.gnu.org, Christian Groessler <chris <at> groessler.org>,
 coreutils <at> gnu.org
Subject: Re: bug#17505: dd statistics output
Date: Mon, 21 Jul 2014 14:09:23 -0700

Found old bug, still open...

Pádraig Brady wrote:
> On 07/16/2014 10:38 AM, Pádraig Brady wrote:
>   
>> http://bugs.gnu.org/17505#37 was proposed to do the following automatically (depending on the amount output):
>>
>>   268435456 bytes (256 MiB) copied, 0.0248346 s, 10.8 GB/s
>>
>> However that wasn't applied due to inconsistency concerns.
>> I'm still of the opinion that the change above would be a net gain,
>> as the number in brackets is for human interpretation, and in the vast
>> majority of cases would be the best representation for that.
----
   One patch that would not be inconsistent:

   If the user uses units of a single system (i.e. doesn't use SI and base-2
units in the same statement), then display the summary units using the same
notation the user used:

dd if=xx bs=256M
...(256M copied)....
vs.
dd if=xx bs=256MB
...(256MB copied)...

> Note another reason to _not_ apply the patch is that
> requests to print the statistics can come async through SIGUSR1,
> and thus increase the chances of inconsistent output.
Solves this too, since the units are decided when the command is parsed,
so SIGUSR1 would use the same units as would come out on a final summary.


Or is using units consistent with what the user uses not OK?

Note, for statements without units (or with mixed systems), there would be no
reason to change current behavior.
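
A parse-time rule of this kind might be sketched as follows. This is an illustration in Python, not the proposed patch: `unit_system` and `summary_system` are hypothetical helpers, and the suffix handling is a simplification that covers only dd's documented power-of-1024 suffixes (K, M, G, ...) and power-of-1000 suffixes (kB, MB, GB, ...).

```python
import re


def unit_system(operand):
    """Classify a size operand such as '256M' or '256MB'.

    Returns 'iec' for power-of-1024 suffixes, 'si' for power-of-1000
    suffixes, or None when the operand carries neither.
    """
    m = re.fullmatch(r"\d+([A-Za-z]*)", operand)
    if not m or not m.group(1):
        return None
    suffix = m.group(1)
    if re.fullmatch(r"[kKMGTPEZY]B", suffix):
        return "si"
    if re.fullmatch(r"[KMGTPEZY](iB)?", suffix):
        return "iec"
    return None  # c, w, b and anything unrecognized imply no unit system


def summary_system(operands):
    """Pick an output system only if all suffixed operands agree."""
    systems = {s for s in map(unit_system, operands) if s is not None}
    return systems.pop() if len(systems) == 1 else None


print(summary_system(["256M"]))        # bs=256M
print(summary_system(["256MB"]))       # bs=256MB
print(summary_system(["1M", "10MB"]))  # mixed systems
print(summary_system(["512"]))         # no suffix
```

Because the decision depends only on the command line, a report triggered by SIGUSR1 and the final summary would always use the same units; a None result stands for keeping the current behavior.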

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Sat, 26 Jul 2014 01:36:02 GMT) Full text and rfc822 format available.

Message #68 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Linda Walsh <coreutils <at> tlinx.org>
To: Pádraig Brady <P <at> draigBrady.com>, 17505 <at> debbugs.gnu.org,
        Christian Groessler <chris <at> groessler.org>, coreutils <at> gnu.org
Subject: Pádraig: does this solve your consistency concern? (was bug#17505: dd statistics output)
Date: Fri, 25 Jul 2014 18:35:09 -0700

Pádraig: you may have missed this as it was a reply to
an old thread, but, changing the subj and composing as new
should prevent that (I hope)....

You were concerned that the user would get different outputs
based on the previously suggested algorithm -- as well as
possibly different output when SIGUSR1 came in.

This idea seems to solve both of those -- so if the patch that was
proposed for this was modified in line with this suggestion,
would there be any further problems?


Linda Walsh wrote:
> Found old bug, still open...
>
> Pádraig Brady wrote:
>> On 07/16/2014 10:38 AM, Pádraig Brady wrote:
>>  
>>> http://bugs.gnu.org/17505#37 was proposed to do the following
>>> automatically (depending on the amount output):
>>>
>>>   268435456 bytes (256 MiB) copied, 0.0248346 s, 10.8 GB/s
>>>
>>> However that wasn't applied due to inconsistency concerns.
>>> I'm still of the opinion that the change above would be a net gain,
>>> as the number in brackets is for human interpretation, and in the vast
>>> majority of cases would be the best representation for that.
> ----
>    One patch that would not be inconsistent:
>
>    If the user uses units of a single system (i.e. doesn't use SI and base-2
> units in the same statement), then display the summary units using the same
> notation the user used:
>
> dd if=xx bs=256M
> ...(256M copied)....
> vs.
> dd if=xx bs=256MB
> ...(256MB copied)...
>
>> Note another reason to _not_ apply the patch is that
>> requests to print the statistics can come async through SIGUSR1,
>> and thus increase the chances of inconsistent output.
> Solves this too, since the units are decided when the command is parsed,
> so SIGUSR would use the same units as would come out on a final summary.
>
>
> Or is using units consistent with what the user uses not OK?
>
> Note, for statements without units (or with mixed systems), there would be no
> reason to change current behavior.
>
>
>
>
>

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Sat, 26 Jul 2014 20:59:02 GMT) Full text and rfc822 format available.

Message #71 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Linda Walsh <coreutils <at> tlinx.org>
Cc: 17505 <at> debbugs.gnu.org, Christian Groessler <chris <at> groessler.org>
Subject: Re: bug#17505: Pádraig: does this solve your consistency concern? (was bug#17505: dd statistics output)
Date: Sat, 26 Jul 2014 21:58:34 +0100

On 07/26/2014 02:35 AM, Linda Walsh wrote:
> Pádraig: you may have missed this as it was a reply to
> an old thread, but, changing the subj and composing as new
> should prevent that (I hope)....
> 
> You were concerned that the user would get different outputs
> based on the previously suggested algorithm -- as well as
> possibly different output when SIGUSR1 came in.
> 
> This idea seems to solve both of those -- so if the patch that was
> proposed for this was modified in line with this suggestion,
> would there be any further problems?
> 
> 
> Linda Walsh wrote:
>> Found old bug, still open...
>>
>> Pádraig Brady wrote:
>>> On 07/16/2014 10:38 AM, Pádraig Brady wrote:
>>>  
>>>> http://bugs.gnu.org/17505#37 was proposed to do the following automatically (depending on the amount output):
>>>>
>>>>   268435456 bytes (256 MiB) copied, 0.0248346 s, 10.8 GB/s
>>>>
>>>> However that wasn't applied due to inconsistency concerns.
>>>> I'm still of the opinion that the change above would be a net gain,
>>>> as the number in brackets is for human interpretation, and in the vast
>>>> majority of cases would be the best representation for that.
>> ----
>>    One patch that would not be inconsistent:
>>
>>    If the user uses units of a single system (i.e. doesn't use 'si' and b2 units
>> in same statement), then display the summary units using the same notation the
>> user used:
>>
>> dd if=xx bs=256M
>> ...(256M copied)....
>> vs.
>> dd if=xx bs=256MB
>> ...(256MB copied)...
>>
>>> Note another reason to _not_ apply the patch is that
>>> requests to print the statistics can come async through SIGUSR1,
>>> and thus increase the chances of inconsistent output.
>> Solves this too, since the units are decided when the command is parsed,
>> so SIGUSR would use the same units as would come out on a final summary.
>>
>>
>> Or is using units consistent with what the user uses not OK?
>>
>> Note, for statements w/o units (or mixed system), there would be no reason to change
>> current behavior.

That was the original approach but is a bit worse than the dynamic approach
since it's common to specify transfer sizes in IEC units for SI sized data.

BTW I was playing devil's advocate with my mention of the SIGUSR1 inconsistency.
I'm still of the opinion that the dynamic switch of human units based on
current transferred amount is the lesser of two evils, since this output
is destined for human consumption.

cheers,
Pádraig.
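
A concrete case of IEC transfer sizes used for SI-sized data: copying exactly 10^9 bytes with bs=1M. The sketch below is in Python and rests on assumptions: the 1 GB input size is hypothetical, and `uses_iec` is a hypothetical name modelling the dynamic selector quoted earlier in this thread.

```python
def uses_iec(n_bytes):
    """Dynamic rule: IEC only for multiples of 1024 that are not multiples of 1000."""
    return n_bytes % 1000 != 0 and n_bytes % 1024 == 0


bs = 1024 * 1024       # bs=1M, an IEC-style operand
total = 1_000_000_000  # hypothetical SI-sized input of exactly 1 GB

full, partial = divmod(total, bs)
print(f"{full}+{1 if partial else 0} records")  # full blocks + partial block

# A parse-time rule sees only "bs=1M" and would report in IEC units,
# while the dynamic rule looks at the byte count actually transferred.
print("parse-time choice: IEC")
print("dynamic choice:", "IEC" if uses_iec(total) else "SI")
```

Here the operand suggests IEC, but the amount of data is a round decimal number, so the dynamic rule reports it as 1.0 GB rather than as a fractional number of MiB.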

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Sun, 27 Jul 2014 17:13:02 GMT) Full text and rfc822 format available.

Message #74 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Linda Walsh <coreutils <at> tlinx.org>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 17505 <at> debbugs.gnu.org, Christian Groessler <chris <at> groessler.org>
Subject: Re: bug#17505: Pádraig: does this solve your consistency concern? (was bug#17505: dd statistics output)
Date: Sun, 27 Jul 2014 10:11:58 -0700

Pádraig Brady wrote:
> 
> That was the original approach but is a bit worse than the dynamic approach
> since it's common to specify transfer sizes in IEC units for SI sized data.
----
It is more common to specify transfer sizes in SI and mean IEC if you
are in the US where the digital computer was created.

People in the US have not adopted SI units and many wouldn't know
a meter from a molehill, so SI units aren't the first thing that
they are likely to be meaning.  Computer scientists and the industry here,
grew up with using IEC prefixes where multiples of 8 are already in
use.  I.e. if you are talking *bytes*, you are using base 2.

It is inconsistent to switch to decimal prefixes when talking about
binary numbers.

OTOH, if you are talking *bits*, I would say usage meaning SI units
are more common.

Bytes = 2^3 bits.  not 10 bits.

Now I was willing to go so far as to not force incompatible or bad
nomenclature upon others, but to use their own nomenclature when
replying to them.

If someone came up to you and spoke a question in French, would you
answer them in English and make some comment about people using
French by accident and they really mean to use English?

If your goal was clear communication, you'd try to answer in the language
they were querying in (presuming you knew it).  Only giving responses
in English, when you accept input in French, would likely be thought
insulting.

If people are that concerned to get the output they want in "SI", they
might be bothered to use it on input (or read the manpage and
find out how to make it happen).  For those that are concerned to get the
output they want in computer compatible binary, you seem to be
saying they are S-O-L, which seems a poor and selfish attitude to
be taking.

> BTW I was playing devil's advocate with my mention of the SIGUSR1 inconsistency.
> I'm still of the opinion that the dynamic switch of human units based on
> current transferred amount is the lesser of two evils, since this output
> is destined for human consumption.
====
	If it is for human consumption, humans like consistency --
if they speak to you in 1 language, they likely appreciate being
replied to in the same .. same goes for terminology and units.

	If someone asks you how many kilometers it is to XXX and
you come back with 38 miles, you think that's a user friendly design?

> 
> cheers,
> Pádraig.
> 
> 
>

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Mon, 28 Jul 2014 17:55:01 GMT) Full text and rfc822 format available.

Message #77 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Christian Groessler <chris <at> groessler.org>
To: Linda Walsh <coreutils <at> tlinx.org>, Pádraig Brady
 <P <at> draigBrady.com>
Cc: 17505 <at> debbugs.gnu.org
Subject: Re: bug#17505: Pádraig: does this solve your consistency concern? (was bug#17505: dd statistics output)
Date: Mon, 28 Jul 2014 19:54:22 +0200

On 07/27/14 19:11, Linda Walsh wrote:
> It is more common to specify transfer sizes in SI and mean IEC if you
> are in the US where the digital computer was created.
>
> People in the US have not adopted SI units and many wouldn't know
> a meter from a molehill, so SI units aren't the first thing that
> they are likely to be meaning.  Computer scientists and the industry here,
> grew up with using IEC prefixes where multiples of 8 are already in
> use.  I.e. if you are talking *bytes*, you are using base 2.


I didn't grow up in the US, and grew up with the metric system, but when I'm
talking about memory sizes I always mean IEC (2^10) and never SI (10^3).
The only pitfall here is hard disk sizes, where I have to remember that
"they" mean SI.


>
> It is inconsistent to switch to decimal prefixes when talking about
> binary numbers.


Agreed.


>
>
>
>> BTW I was playing devil's advocate with my mention of the SIGUSR1 
>> inconsistency.
>> I'm still of the opinion that the dynamic switch of human units based on
>> current transferred amount is the lesser of two evils, since this output
>> is destined for human consumption.


I don't get the reason for the dynamic switch at all. Can somebody 
enlighten me?

regards,
chris

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Tue, 29 Jul 2014 00:18:01 GMT) Full text and rfc822 format available.

Message #80 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Linda Walsh <coreutils <at> tlinx.org>
To: Christian Groessler <chris <at> groessler.org>
Cc: 17505 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#17505: Pádraig: does this solve your consistency concern? (was bug#17505: dd statistics output)
Date: Mon, 28 Jul 2014 17:17:23 -0700


Christian Groessler wrote:
> On 07/27/14 19:11, Linda Walsh wrote:
>> It is more common to specify transfer sizes in SI and mean IEC if you
>> are in the US where the digital computer was created.
>>
>> People in the US have not adopted SI units and many wouldn't know
>> a meter from a molehill, so SI units aren't the first thing that
>> they are likely to be meaning.  Computer scientists and the industry here,
>> grew up with using IEC prefixes where multiples of 8 are already in
>> use.  I.e. if you are talking *bytes*, you are using base 2.
> 
> 
> I didn't grow up in the US, and grew up with the metric system, but when I'm
> talking about memory sizes I always mean IEC (2^10) and never SI (10^3).
> The only pitfall here is hard disk sizes, where I have to remember that
> "they" mean SI.
----
	I was trying to come up with some reason for Pádraig's belief
that people usually mean SI when using IEC prefixes for computer
units such as bytes (2^3 bits) or sectors (2^12 bits)... now what
power of 10 is that?  I've never heard of anyone supporting Pádraig's
position -- so I assumed it must come from some foreign country where the
metric system and metric prefixes are meant to apply to non-unary
and non-base-10 quantities.  Pádraig: where did you get your impression?

	When it comes to disk space -- computers always give it in
IEC -- except where they've bought the line that mixed base-2 and power-of-10
prefixes is a good thing, then they try to get others to buy into such.

But reality is that one can't express disk space as a power of 10 as there
is no multiple of 10 that lines up with a 512-byte multiple.  I.e. the system is
designed to be inaccurate and confuse the issue to make it harder for
consumers to do comparisons.


> I don't get the reason for the dynamic switch at all. Can somebody 
> enlighten me?
----
	I think it was thrown in as a red herring, as I can't think
of any useful case for it.  Having the output vary units randomly, not
at the behest of the user, doesn't seem especially useful.

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Sun, 03 Aug 2014 02:08:02 GMT) Full text and rfc822 format available.

Message #83 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Linda Walsh <coreutils <at> tlinx.org>
To: Henrik Juul Pedersen <henrikjuul <at> gmail.com>
Cc: Christian Groessler <chris <at> groessler.org>, 17505 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigbrady.com>, Coreutils <coreutils <at> gnu.org>
Subject: Re: bug#17505: dd statistics output
Date: Sat, 02 Aug 2014 19:07:36 -0700

Henrik Juul Pedersen wrote:
> Christian Groessler writes:
>> 268435456 bytes (256 MB) copied, ...
>>
>
> This would be a clear violation of the SI standard, which says on its
> prefixes:
----
I've given this some thought and now feel that the SI system is inappropriate
for computer base-2 units (bytes (2^3 bits) and sectors (2^12 bits)).  It
has never been considered appropriate or intelligent to mix your bases,
but that is exactly what using base-10 prefixes with base-2 units is doing.

The SI  system was developed for physical units -- grams, meters, liters,
etc.

If they want to talk a physical quantity, bits would be appropriate.  But
as soon as they use "Byte", they are stepping into *base 2*.  Mixing
base-10 prefixes with a power-of-2 quantity would be bad form in any
scientific paper.  the SI committee telling the computer industry how they
should use 'KB', or MB.. is a bit like them telling the US what prefixes it
should use in front of inches, feet, yards...etc.  Does
putting kilo in front of 'yards', make it an SI unit?  how about
measuring liquids in milliquarts?  or weight in kilopounds.

I think anyone thinking about those examples would say it is insane to mix
power-of-10 prefixes with non-si units -- when was the last time you heard
population expressed in kilopeople or megapeople.

Either never or rarely,  because those are not physical units -- the
area of authority for the SI standard.

Bytes are not physical units, nor are 'sectors' -- they are logical
amounts based on a conceptual grouping of bits.  By using 'Byte' or 'Sector',
one is already using "base-2".  Switching to base-10 for larger prefixes
would be considered bad form in any other area -- yet that is what
the SI folks would foist upon the computer industry.

One *cannot* express disk space, accurately, with base 10 units.  Doing so
is inherently wrong -- and was intended to mislead from the very
beginning.  No disk manufacturer puts out disks where the number of bytes
on the disk is a power of 10.  disk space HAS to be a power of 2 on binary
computers.

Memory doesn't come in multiples of "10" bits or bytes.  It comes in
multiples of 2.  Using binary prefixes with binary units is consistent,
but buying into the propaganda that base10 units should be used with
binary units is just dumb.

>
> "These SI prefixes refer strictly to powers of 10. They should not be used
> to indicate powers of 2 (for example, one kilobit represents 1000 bits and
> not 1024 bits). The IEC has adopted prefixes for binary powers in the
> international standard IEC 60027-2: 2005, third edition, Letter symbols to
> be used in electrical technology – Part 2: Telecommunications and
> electronics. The names and symbols for the prefixes corresponding to 2^10,
> 2^20, 2^30, 2^40, 2^50, and 2^60 are, respectively: kibi, Ki; mebi, Mi; gibi,
> Gi; tebi, Ti; pebi, Pi; and exbi, Ei. Thus, for example, one kibibyte would
> be written: 1 KiB = 2^10 B = 1024 B, where B denotes a byte. Although these
> prefixes are not part of the SI, they should be used in the field of
> information technology to avoid the incorrect usage of the SI prefixes."
> [1, page 121]
----
   I disagree.  They have no jurisdiction.  They are a foreign entity trying
to force confusing units on the computer industry.

   If they want to use SI prefixes on the singular unit "bit", I'm fine with
that.  But mixing it with a base-2 unit is confusing, makes calculations
confusing, and results in imprecision in specifying binary quantities.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 31 Aug 2014 11:24:03 GMT) Full text and rfc822 format available.

bug unarchived. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Thu, 31 Dec 2015 17:36:02 GMT) Full text and rfc822 format available.

Forcibly Merged 17505 22277. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Thu, 31 Dec 2015 17:36:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-coreutils <at> gnu.org:
bug#17505; Package coreutils. (Sat, 02 Jan 2016 00:12:02 GMT) Full text and rfc822 format available.

Message #92 received at 17505 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Mike Fiedler <micfied <at> yandex.com>, 17505 <at> debbugs.gnu.org
Subject: Re: bug#22277: 'dd' - stats are not what expected
Date: Thu, 31 Dec 2015 18:23:16 +0000

[Message part 1 (text/plain, inline)]

On 31/12/15 10:18, Pádraig Brady wrote:
> unarchive 17505
> forcemerge 17505 22277
> stop
> 
> On 31/12/15 01:11, Mike Fiedler wrote:
>>  
>> Hi,
>>  
>> I ran one of my favorite utilities 'dd' again this evening, this time with bs=1G ( IEC ) - I usually do 1M but this time I dealt with more data to be copied...
>> I had to copy about 215 GiB of data from one to another drive ( offset 215 GiB was about the end of the last partition ).
>> So I did:
>>  
>> $  dd if=/dev/sdb of=/dev/sda bs=*1G* count=*222*
>> 222+0 records in
>> 222+0 records out
>> 238370684928 bytes (*238 GB*) copied, 1275.03 s, 187 MB/s
>>
>> When it finished, I got a bit confused, and I asked myself whether the data I requested really did get copied..  of course it did, but I was not expecting 238 GB to be shown.
>> To make sure I calculated the 512 byte sector end number out of the 238370684928 bytes 'dd' result and compared it with the output of fdisk showing the last sector of the last partition... I was fine.
>>  
>> I think, and many others share this opinion, that 1kB = 1000B, etc., should be banned from use in the IT world, and banned from use by the sales people.
>>  
>> The point is, as you probably noticed, if dd is told to use IEC, let's stick to IEC and not get the results in whatever artificial decimal crap....
>> Not only can it confuse, but a utility like 'dd' should be 100% specific about handling the units, and there should not be a bit of doubt when it spits out the results.
>> If I would use 1K in this case, I would not notice the difference - my brain is simply too simple, and small, but 1G should at least result in displaying 222 GiB and for sure not GB.
> 
> I have to agree, and this has come up a few times now.
> 
> The number in brackets is not exact and informational for human consumption,
> so we should make an effort to be less confusing.
> There was a proposed patch at:
> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=17505#37
> which auto determines the appropriate base from the amount output,
> to output the number with the least amount of info loss.
> 
> There were some issues noted with that,
> but IMHO they were lesser than the current issue.
> 
> We will have to be careful to not corrupt output
> when switching with status=progress (due to possibly shorter status line).
> 
> I'll have another look.

The attached auto sets the units.
For status=progress this is done based on output block size,
for the final transfer stats, it's done based on the transferred byte count.

cheers,
Pádraig.

[dd-stats-units.patch (text/x-patch, attachment)]
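
The effect on the report that prompted bug#22277 can be sketched as follows. This is an illustration in Python under stated assumptions, not the attached patch: `uses_iec` and `human` are hypothetical helpers, and the selector is assumed to be the multiple-of-1024-but-not-1000 rule discussed earlier in this thread, applied to the output block size for progress lines and to the byte count for the final statistics.

```python
def uses_iec(n):
    """Assumed rule: IEC only for multiples of 1024 that are not multiples of 1000."""
    return n % 1000 != 0 and n % 1024 == 0


def human(n, iec):
    """Scale a byte count to the largest fitting unit, rounded to a whole number."""
    base = 1024 if iec else 1000
    units = ["B", "KiB", "MiB", "GiB", "TiB"] if iec else ["B", "kB", "MB", "GB", "TB"]
    value, i = float(n), 0
    while value >= base and i < len(units) - 1:
        value /= base
        i += 1
    return f"{value:.0f} {units[i]}"


obs = 2**30          # bs=1G from the report
count = 222
total = obs * count  # bytes transferred

print("progress line:", human(total, uses_iec(obs)))    # base from block size
print("final stats:  ", human(total, uses_iec(total)))  # base from byte count
print("old behavior: ", human(total, False))            # always SI
```

For this transfer both rules select IEC, giving 222 GiB where the earlier output showed 238 GB.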

Forcibly Merged 17505 22277. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Sat, 02 Jan 2016 00:12:03 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 30 Jan 2016 12:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 8 years and 96 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.