GNU bug report logs -
#23073
wc reports wrong byte counts when using '--from-files0=-'
Previous Next
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 23073 in the body.
You can then email your comments to 23073 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-coreutils <at> gnu.org
:
bug#23073
; Package
coreutils
.
(Mon, 21 Mar 2016 01:29:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
"William R. Fraser" <wfraser <at> codewise.org>
:
New bug report received and forwarded. Copy sent to
bug-coreutils <at> gnu.org
.
(Mon, 21 Mar 2016 01:29:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
When wc gets its list of files by reading from stdin, using the argument
'--from-files0=-', it reuses the same fstatus struct for each file.
The problem is that the 'wc' function checks the 'failed' member of this
struct and if it is <=0, it skips doing fstat on the file. The main loop
doesn't reset this value between files, so only the first file has fstat
done on it.
This can result in the 'wc' function seeking past the end of
subsequent files and then over-reporting their byte counts.
See the attached patch, which resets the fstatus struct in between files
when reading the file list from stdin.
Thanks,
- Bill Fraser
[0001-wc-fix-wrong-byte-counts-when-using-files-from0.patch (text/plain, attachment)]
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#23073
; Package
coreutils
.
(Mon, 21 Mar 2016 15:17:01 GMT)
Full text and
rfc822 format available.
Message #8 received at 23073 <at> debbugs.gnu.org (full text, mbox):
On 21/03/16 00:59, William R. Fraser wrote:
> When wc gets its list of files by reading from stdin, using the argument
> '--from-files0=-', it reuses the same fstatus struct for each file.
>
> The problem is that the 'wc' function checks the 'failed' member of this
> struct and if it is <=0, it skips doing fstat on the file. The main loop
> doesn't reset this value between files, so only the first file has fstat
> done on it.
>
> This can result in the 'wc' function seeking past the end of
> subsequent files and then over-reporting their byte counts.
>
> See the attached patch, which resets the fstatus struct in between files
> when reading the file list from stdin.
Ouch. This seems to be since v7.0-96-gc2e56e0
It would also mean there would be a lot of redundant reading
if the initial file was significantly smaller than any other file.
$ truncate -s1G wc.big
$ touch wc.small
$ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
1073741824 wc.big
1073741760 wc.small
2147483584 total
We'll submit a full patch in your name.
thanks!
Pádraig.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#23073
; Package
coreutils
.
(Mon, 21 Mar 2016 17:21:02 GMT)
Full text and
rfc822 format available.
Message #11 received at 23073 <at> debbugs.gnu.org (full text, mbox):
On 03/21/2016 04:16 PM, Pádraig Brady wrote:
> On 21/03/16 00:59, William R. Fraser wrote:
>> When wc gets its list of files by reading from stdin, using the argument
>> '--from-files0=-', it reuses the same fstatus struct for each file.
>>
>> The problem is that the 'wc' function checks the 'failed' member of this
>> struct and if it is <=0, it skips doing fstat on the file. The main loop
>> doesn't reset this value between files, so only the first file has fstat
>> done on it.
>>
>> This can result in the 'wc' function seeking past the end of
>> subsequent files and then over-reporting their byte counts.
>>
>> See the attached patch, which resets the fstatus struct in between files
>> when reading the file list from stdin.
>
> Ouch. This seems to be since v7.0-96-gc2e56e0
> It would also mean there would be a lot of redundant reading
> if the initial file was significantly smaller than any other file.
>
> $ truncate -s1G wc.big
> $ touch wc.small
> $ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
> 1073741824 wc.big
> 1073741760 wc.small
> 2147483584 total
>
> We'll submit a full patch in your name.
Interesting enough, there seems to be a threshold to trigger the bug:
$ touch wc.small
$ seq 1000 > wc.big
$ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
3893 wc.big
0 wc.small
3893 total
$ seq 10000 > wc.big
$ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
48894 wc.big
45067 wc.small
93961 total
That's why I couldn't reproduce it this morning.
Have a nice day,
Berny
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#23073
; Package
coreutils
.
(Mon, 21 Mar 2016 18:45:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 23073 <at> debbugs.gnu.org (full text, mbox):
On Sun, Mar 20, 2016 at 5:59 PM, William R. Fraser <wfraser <at> codewise.org> wrote:
> When wc gets its list of files by reading from stdin, using the argument
> '--from-files0=-', it reuses the same fstatus struct for each file.
>
> The problem is that the 'wc' function checks the 'failed' member of this
> struct and if it is <=0, it skips doing fstat on the file. The main loop
> doesn't reset this value between files, so only the first file has fstat
> done on it.
>
> This can result in the 'wc' function seeking past the end of subsequent
> files and then over-reporting their byte counts.
>
> See the attached patch, which resets the fstatus struct in between files
> when reading the file list from stdin.
Thank you for the patch and report for what looks like
a bug in code I wrote.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#23073
; Package
coreutils
.
(Mon, 19 Dec 2016 19:01:01 GMT)
Full text and
rfc822 format available.
Message #17 received at 23073 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 21/03/16 15:16, Pádraig Brady wrote:
> On 21/03/16 00:59, William R. Fraser wrote:
>> When wc gets its list of files by reading from stdin, using the argument
>> '--from-files0=-', it reuses the same fstatus struct for each file.
>>
>> The problem is that the 'wc' function checks the 'failed' member of this
>> struct and if it is <=0, it skips doing fstat on the file. The main loop
>> doesn't reset this value between files, so only the first file has fstat
>> done on it.
>>
>> This can result in the 'wc' function seeking past the end of
>> subsequent files and then over-reporting their byte counts.
>>
>> See the attached patch, which resets the fstatus struct in between files
>> when reading the file list from stdin.
>
> Ouch. This seems to be since v7.0-96-gc2e56e0
> It would also mean there would be a lot of redundant reading
> if the initial file was significantly smaller than any other file.
>
> $ truncate -s1G wc.big
> $ touch wc.small
> $ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
> 1073741824 wc.big
> 1073741760 wc.small
> 2147483584 total
Sorry for the delay.
I didn't go far enough back in my TODO list so missed this.
Proposed patch attached.
thanks,
Pádraig
[wc-c--files0-from-fix.patch (text/x-patch, attachment)]
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#23073
; Package
coreutils
.
(Mon, 19 Dec 2016 19:53:01 GMT)
Full text and
rfc822 format available.
Message #20 received at 23073 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Looks good :)
On Mon, Dec 19, 2016 at 11:00 AM, Pádraig Brady <P <at> draigbrady.com> wrote:
> On 21/03/16 15:16, Pádraig Brady wrote:
> > On 21/03/16 00:59, William R. Fraser wrote:
> >> When wc gets its list of files by reading from stdin, using the argument
> >> '--from-files0=-', it reuses the same fstatus struct for each file.
> >>
> >> The problem is that the 'wc' function checks the 'failed' member of this
> >> struct and if it is <=0, it skips doing fstat on the file. The main loop
> >> doesn't reset this value between files, so only the first file has fstat
> >> done on it.
> >>
> >> This can result in the 'wc' function seeking past the end of
> >> subsequent files and then over-reporting their byte counts.
> >>
> >> See the attached patch, which resets the fstatus struct in between files
> >> when reading the file list from stdin.
> >
> > Ouch. This seems to be since v7.0-96-gc2e56e0
> > It would also mean there would be a lot of redundant reading
> > if the initial file was significantly smaller than any other file.
> >
> > $ truncate -s1G wc.big
> > $ touch wc.small
> > $ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
> > 1073741824 wc.big
> > 1073741760 wc.small
> > 2147483584 total
>
> Sorry for the delay.
> I didn't go far enough back in my TODO list so missed this.
> Proposed patch attached.
>
> thanks,
> Pádraig
>
>
--
Bill Fraser { http://www.codewise.org/wrf }
<http://www.cs.uic.edu/%7Ewfraser/>
[Message part 2 (text/html, inline)]
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#23073
; Package
coreutils
.
(Tue, 20 Dec 2016 01:51:02 GMT)
Full text and
rfc822 format available.
Message #23 received at 23073 <at> debbugs.gnu.org (full text, mbox):
On 12/19/2016 08:00 PM, Pádraig Brady wrote:
> + [bug introduced in coreutils-7.1]
FWIW I think that the bug was not introduced in v7.0-96-gc2e56e0:
I had a working 8.23 on a system here, so I took the time to search deeper.
I found the reason to be the wrong value of the 'hi_pos' parameter passed
to lseek():
open("wc.big", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=1073741824, ...}) = 0
lseek(3, 1073741824, SEEK_CUR) = 1073741824
read(3, "", 16384) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 8), ...}) = 0
write(1, "1073741824 wc.big\n", 181073741824 wc.big) = 18
close(3) = 0
open("wc.small", O_RDONLY) = 3
lseek(3, 1073741824, SEEK_CUR) = 1073741824
read(3, "", 16384) = 0
write(1, "1073741824 wc.small\n", 201073741824 wc.small
This took me directly to v8.23-47-g2662702 (which a quick test
against v8.23-47-g2662702^ confirmed).
Therefore, I think it's worth to do the following amendment:
- [bug introduced in coreutils-7.1]
+ [bug introduced in coreutils-8.24]
Thanks & have a nice day,
Berny
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#23073
; Package
coreutils
.
(Tue, 20 Dec 2016 13:13:02 GMT)
Full text and
rfc822 format available.
Message #26 received at 23073 <at> debbugs.gnu.org (full text, mbox):
On 20/12/16 01:50, Bernhard Voelker wrote:
> On 12/19/2016 08:00 PM, Pádraig Brady wrote:
>> + [bug introduced in coreutils-7.1]
>
> FWIW I think that the bug was not introduced in v7.0-96-gc2e56e0:
> I had a working 8.23 on a system here, so I took the time to search deeper.
> I found the reason to be the wrong value of the 'hi_pos' parameter passed
> to lseek():
>
> open("wc.big", O_RDONLY) = 3
> fstat(3, {st_mode=S_IFREG|0644, st_size=1073741824, ...}) = 0
> lseek(3, 1073741824, SEEK_CUR) = 1073741824
> read(3, "", 16384) = 0
> fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 8), ...}) = 0
> write(1, "1073741824 wc.big\n", 181073741824 wc.big) = 18
> close(3) = 0
> open("wc.small", O_RDONLY) = 3
> lseek(3, 1073741824, SEEK_CUR) = 1073741824
> read(3, "", 16384) = 0
> write(1, "1073741824 wc.small\n", 201073741824 wc.small
>
> This took me directly to v8.23-47-g2662702 (which a quick test
> against v8.23-47-g2662702^ confirmed).
>
> Therefore, I think it's worth to do the following amendment:
>
> - [bug introduced in coreutils-7.1]
> + [bug introduced in coreutils-8.24]
Right!
While st_size would have been incorrect for subsequent
files since v7.1, it was only used since v8.24.
Fixed with:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=94d2c68
thanks,
Pádraig
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#23073
; Package
coreutils
.
(Tue, 20 Dec 2016 13:35:02 GMT)
Full text and
rfc822 format available.
Message #29 received at 23073 <at> debbugs.gnu.org (full text, mbox):
On 12/20/2016 02:12 PM, Pádraig Brady wrote:
> Right!
>
> While st_size would have been incorrect for subsequent
> files since v7.1, it was only used since v8.24.
>
> Fixed with:
> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=94d2c68
Thanks!
Have a nice day,
Berny
Added tag(s) fixed.
Request was from
Assaf Gordon <assafgordon <at> gmail.com>
to
control <at> debbugs.gnu.org
.
(Thu, 25 Oct 2018 16:12:02 GMT)
Full text and
rfc822 format available.
bug closed, send any further explanations to
23073 <at> debbugs.gnu.org and "William R. Fraser" <wfraser <at> codewise.org>
Request was from
Assaf Gordon <assafgordon <at> gmail.com>
to
control <at> debbugs.gnu.org
.
(Thu, 25 Oct 2018 16:12:02 GMT)
Full text and
rfc822 format available.
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Fri, 23 Nov 2018 12:24:06 GMT)
Full text and
rfc822 format available.
This bug report was last modified 5 years and 155 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.