GNU bug report logs - #23073
wc reports wrong byte counts when using '--from-files0=-'

Previous Next

Package: coreutils;

Reported by: "William R. Fraser" <wfraser <at> codewise.org>

Date: Mon, 21 Mar 2016 01:29:01 UTC

Severity: normal

Tags: fixed

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 23073 in the body.
You can then email your comments to 23073 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-coreutils <at> gnu.org:
bug#23073; Package coreutils. (Mon, 21 Mar 2016 01:29:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to "William R. Fraser" <wfraser <at> codewise.org>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 21 Mar 2016 01:29:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "William R. Fraser" <wfraser <at> codewise.org>
To: bug-coreutils <at> gnu.org
Subject: wc reports wrong byte counts when using '--from-files0=-'
Date: Sun, 20 Mar 2016 17:59:57 -0700 (PDT)
[Message part 1 (text/plain, inline)]
When wc gets its list of files by reading from stdin, using the argument 
'--from-files0=-', it reuses the same fstatus struct for each file.

The problem is that the 'wc' function checks the 'failed' member of this 
struct and if it is <=0, it skips doing fstat on the file. The main loop 
doesn't reset this value between files, so only the first file has fstat 
done on it.

This can result in the 'wc' function seeking past the end of 
subsequent files and then over-reporting their byte counts.

See the attached patch, which resets the fstatus struct in between files 
when reading the file list from stdin.

Thanks,
- Bill Fraser
[0001-wc-fix-wrong-byte-counts-when-using-files-from0.patch (text/plain, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#23073; Package coreutils. (Mon, 21 Mar 2016 15:17:01 GMT) Full text and rfc822 format available.

Message #8 received at 23073 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: "William R. Fraser" <wfraser <at> codewise.org>, 23073 <at> debbugs.gnu.org
Subject: Re: bug#23073: wc reports wrong byte counts when using
 '--from-files0=-'
Date: Mon, 21 Mar 2016 15:16:35 +0000
On 21/03/16 00:59, William R. Fraser wrote:
> When wc gets its list of files by reading from stdin, using the argument
> '--from-files0=-', it reuses the same fstatus struct for each file.
>
> The problem is that the 'wc' function checks the 'failed' member of this
> struct and if it is <=0, it skips doing fstat on the file. The main loop
> doesn't reset this value between files, so only the first file has fstat
> done on it.
>
> This can result in the 'wc' function seeking past the end of
> subsequent files and then over-reporting their byte counts.
>
> See the attached patch, which resets the fstatus struct in between files
> when reading the file list from stdin.

Ouch. This seems to be since v7.0-96-gc2e56e0
It would also mean there would be a lot of redundant reading
if the initial file was significantly smaller than any other file.

$ truncate -s1G wc.big
$ touch wc.small
$ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
1073741824 wc.big
1073741760 wc.small
2147483584 total

We'll submit a full patch in your name.

thanks!
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#23073; Package coreutils. (Mon, 21 Mar 2016 17:21:02 GMT) Full text and rfc822 format available.

Message #11 received at 23073 <at> debbugs.gnu.org (full text, mbox):

From: Bernhard Voelker <mail <at> bernhard-voelker.de>
To: Pádraig Brady <P <at> draigBrady.com>, 23073 <at> debbugs.gnu.org
Subject: Re: bug#23073: wc reports wrong byte counts when using
 '--from-files0=-'
Date: Mon, 21 Mar 2016 18:20:25 +0100
On 03/21/2016 04:16 PM, Pádraig Brady wrote:
> On 21/03/16 00:59, William R. Fraser wrote:
>> When wc gets its list of files by reading from stdin, using the argument
>> '--from-files0=-', it reuses the same fstatus struct for each file.
>>
>> The problem is that the 'wc' function checks the 'failed' member of this
>> struct and if it is <=0, it skips doing fstat on the file. The main loop
>> doesn't reset this value between files, so only the first file has fstat
>> done on it.
>>
>> This can result in the 'wc' function seeking past the end of
>> subsequent files and then over-reporting their byte counts.
>>
>> See the attached patch, which resets the fstatus struct in between files
>> when reading the file list from stdin.
>
> Ouch. This seems to be since v7.0-96-gc2e56e0
> It would also mean there would be a lot of redundant reading
> if the initial file was significantly smaller than any other file.
>
> $ truncate -s1G wc.big
> $ touch wc.small
> $ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
> 1073741824 wc.big
> 1073741760 wc.small
> 2147483584 total
>
> We'll submit a full patch in your name.

Interesting enough, there seems to be a threshold to trigger the bug:

$ touch wc.small

$ seq 1000 > wc.big
$ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
3893 wc.big
0 wc.small
3893 total

$ seq 10000 > wc.big
$ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
48894 wc.big
45067 wc.small
93961 total

That's why I couldn't reproduce it this morning.

Have a nice day,
Berny




Information forwarded to bug-coreutils <at> gnu.org:
bug#23073; Package coreutils. (Mon, 21 Mar 2016 18:45:02 GMT) Full text and rfc822 format available.

Message #14 received at 23073 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "William R. Fraser" <wfraser <at> codewise.org>
Cc: 23073 <at> debbugs.gnu.org
Subject: Re: bug#23073: wc reports wrong byte counts when using
 '--from-files0=-'
Date: Mon, 21 Mar 2016 11:44:06 -0700
On Sun, Mar 20, 2016 at 5:59 PM, William R. Fraser <wfraser <at> codewise.org> wrote:
> When wc gets its list of files by reading from stdin, using the argument
> '--from-files0=-', it reuses the same fstatus struct for each file.
>
> The problem is that the 'wc' function checks the 'failed' member of this
> struct and if it is <=0, it skips doing fstat on the file. The main loop
> doesn't reset this value between files, so only the first file has fstat
> done on it.
>
> This can result in the 'wc' function seeking past the end of subsequent
> files and then over-reporting their byte counts.
>
> See the attached patch, which resets the fstatus struct in between files
> when reading the file list from stdin.

Thank you for the patch and report for what looks like
a bug in code I wrote.




Information forwarded to bug-coreutils <at> gnu.org:
bug#23073; Package coreutils. (Mon, 19 Dec 2016 19:01:01 GMT) Full text and rfc822 format available.

Message #17 received at 23073 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: "William R. Fraser" <wfraser <at> codewise.org>, 23073 <at> debbugs.gnu.org
Subject: Re: bug#23073: wc reports wrong byte counts when using
 '--from-files0=-'
Date: Mon, 19 Dec 2016 19:00:31 +0000
[Message part 1 (text/plain, inline)]
On 21/03/16 15:16, Pádraig Brady wrote:
> On 21/03/16 00:59, William R. Fraser wrote:
>> When wc gets its list of files by reading from stdin, using the argument
>> '--from-files0=-', it reuses the same fstatus struct for each file.
>>
>> The problem is that the 'wc' function checks the 'failed' member of this
>> struct and if it is <=0, it skips doing fstat on the file. The main loop
>> doesn't reset this value between files, so only the first file has fstat
>> done on it.
>>
>> This can result in the 'wc' function seeking past the end of
>> subsequent files and then over-reporting their byte counts.
>>
>> See the attached patch, which resets the fstatus struct in between files
>> when reading the file list from stdin.
> 
> Ouch. This seems to be since v7.0-96-gc2e56e0
> It would also mean there would be a lot of redundant reading
> if the initial file was significantly smaller than any other file.
> 
> $ truncate -s1G wc.big
> $ touch wc.small
> $ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
> 1073741824 wc.big
> 1073741760 wc.small
> 2147483584 total

Sorry for the delay.
I didn't go far enough back in my TODO list so missed this.
Proposed patch attached.

thanks,
Pádraig

[wc-c--files0-from-fix.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#23073; Package coreutils. (Mon, 19 Dec 2016 19:53:01 GMT) Full text and rfc822 format available.

Message #20 received at 23073 <at> debbugs.gnu.org (full text, mbox):

From: "William R. Fraser" <wfraser <at> codewise.org>
To: Pádraig Brady <P <at> draigbrady.com>
Cc: 23073 <at> debbugs.gnu.org
Subject: Re: bug#23073: wc reports wrong byte counts when using
 '--from-files0=-'
Date: Mon, 19 Dec 2016 11:52:18 -0800
[Message part 1 (text/plain, inline)]
Looks good :)

On Mon, Dec 19, 2016 at 11:00 AM, Pádraig Brady <P <at> draigbrady.com> wrote:

> On 21/03/16 15:16, Pádraig Brady wrote:
> > On 21/03/16 00:59, William R. Fraser wrote:
> >> When wc gets its list of files by reading from stdin, using the argument
> >> '--from-files0=-', it reuses the same fstatus struct for each file.
> >>
> >> The problem is that the 'wc' function checks the 'failed' member of this
> >> struct and if it is <=0, it skips doing fstat on the file. The main loop
> >> doesn't reset this value between files, so only the first file has fstat
> >> done on it.
> >>
> >> This can result in the 'wc' function seeking past the end of
> >> subsequent files and then over-reporting their byte counts.
> >>
> >> See the attached patch, which resets the fstatus struct in between files
> >> when reading the file list from stdin.
> >
> > Ouch. This seems to be since v7.0-96-gc2e56e0
> > It would also mean there would be a lot of redundant reading
> > if the initial file was significantly smaller than any other file.
> >
> > $ truncate -s1G wc.big
> > $ touch wc.small
> > $ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
> > 1073741824 wc.big
> > 1073741760 wc.small
> > 2147483584 total
>
> Sorry for the delay.
> I didn't go far enough back in my TODO list so missed this.
> Proposed patch attached.
>
> thanks,
> Pádraig
>
>


-- 
Bill Fraser { http://www.codewise.org/wrf }
<http://www.cs.uic.edu/%7Ewfraser/>
[Message part 2 (text/html, inline)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#23073; Package coreutils. (Tue, 20 Dec 2016 01:51:02 GMT) Full text and rfc822 format available.

Message #23 received at 23073 <at> debbugs.gnu.org (full text, mbox):

From: Bernhard Voelker <mail <at> bernhard-voelker.de>
To: Pádraig Brady <P <at> draigBrady.com>, 23073 <at> debbugs.gnu.org
Subject: Re: bug#23073: wc reports wrong byte counts when using
 '--from-files0=-'
Date: Tue, 20 Dec 2016 02:50:20 +0100
On 12/19/2016 08:00 PM, Pádraig Brady wrote:
> +  [bug introduced in coreutils-7.1]

FWIW I think that the bug was not introduced in v7.0-96-gc2e56e0:
I had a working 8.23 on a system here, so I took the time to search deeper.
I found the reason to be the wrong value of the 'hi_pos' parameter passed
to lseek():

  open("wc.big", O_RDONLY)                = 3
  fstat(3, {st_mode=S_IFREG|0644, st_size=1073741824, ...}) = 0
  lseek(3, 1073741824, SEEK_CUR)          = 1073741824
  read(3, "", 16384)                      = 0
  fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 8), ...}) = 0
  write(1, "1073741824 wc.big\n", 181073741824 wc.big)     = 18
  close(3)                                = 0
  open("wc.small", O_RDONLY)              = 3
  lseek(3, 1073741824, SEEK_CUR)          = 1073741824
  read(3, "", 16384)                      = 0
  write(1, "1073741824 wc.small\n", 201073741824 wc.small

This took me directly to v8.23-47-g2662702 (which a quick test
against v8.23-47-g2662702^ confirmed).

Therefore, I think it's worth to do the following amendment:

-  [bug introduced in coreutils-7.1]
+  [bug introduced in coreutils-8.24]

Thanks & have a nice day,
Berny




Information forwarded to bug-coreutils <at> gnu.org:
bug#23073; Package coreutils. (Tue, 20 Dec 2016 13:13:02 GMT) Full text and rfc822 format available.

Message #26 received at 23073 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Bernhard Voelker <mail <at> bernhard-voelker.de>, 23073 <at> debbugs.gnu.org
Subject: Re: bug#23073: wc reports wrong byte counts when using
 '--from-files0=-'
Date: Tue, 20 Dec 2016 13:12:02 +0000
On 20/12/16 01:50, Bernhard Voelker wrote:
> On 12/19/2016 08:00 PM, Pádraig Brady wrote:
>> +  [bug introduced in coreutils-7.1]
> 
> FWIW I think that the bug was not introduced in v7.0-96-gc2e56e0:
> I had a working 8.23 on a system here, so I took the time to search deeper.
> I found the reason to be the wrong value of the 'hi_pos' parameter passed
> to lseek():
> 
>   open("wc.big", O_RDONLY)                = 3
>   fstat(3, {st_mode=S_IFREG|0644, st_size=1073741824, ...}) = 0
>   lseek(3, 1073741824, SEEK_CUR)          = 1073741824
>   read(3, "", 16384)                      = 0
>   fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 8), ...}) = 0
>   write(1, "1073741824 wc.big\n", 181073741824 wc.big)     = 18
>   close(3)                                = 0
>   open("wc.small", O_RDONLY)              = 3
>   lseek(3, 1073741824, SEEK_CUR)          = 1073741824
>   read(3, "", 16384)                      = 0
>   write(1, "1073741824 wc.small\n", 201073741824 wc.small
> 
> This took me directly to v8.23-47-g2662702 (which a quick test
> against v8.23-47-g2662702^ confirmed).
> 
> Therefore, I think it's worth to do the following amendment:
> 
> -  [bug introduced in coreutils-7.1]
> +  [bug introduced in coreutils-8.24]

Right!

While st_size would have been incorrect for subsequent
files since v7.1, it was only used since v8.24.

Fixed with:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=94d2c68

thanks,
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#23073; Package coreutils. (Tue, 20 Dec 2016 13:35:02 GMT) Full text and rfc822 format available.

Message #29 received at 23073 <at> debbugs.gnu.org (full text, mbox):

From: Bernhard Voelker <mail <at> bernhard-voelker.de>
To: Pádraig Brady <P <at> draigBrady.com>, 23073 <at> debbugs.gnu.org
Subject: Re: bug#23073: wc reports wrong byte counts when using
 '--from-files0=-'
Date: Tue, 20 Dec 2016 14:34:33 +0100
On 12/20/2016 02:12 PM, Pádraig Brady wrote:
> Right!
> 
> While st_size would have been incorrect for subsequent
> files since v7.1, it was only used since v8.24.
> 
> Fixed with:
> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=94d2c68

Thanks!

Have a nice day,
Berny




Added tag(s) fixed. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Thu, 25 Oct 2018 16:12:02 GMT) Full text and rfc822 format available.

bug closed, send any further explanations to 23073 <at> debbugs.gnu.org and "William R. Fraser" <wfraser <at> codewise.org> Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Thu, 25 Oct 2018 16:12:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 23 Nov 2018 12:24:06 GMT) Full text and rfc822 format available.

This bug report was last modified 5 years and 155 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.