GNU bug report logs - #18402
Wrong output for single character files without newline

Package: diffutils

Reported by: Eric Blake <eblake <at> redhat.com>

Date: Wed, 3 Sep 2014 21:05:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 18402 in the body.
You can then email your comments to 18402 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-diffutils <at> gnu.org:
bug#18402; Package diffutils. (Wed, 03 Sep 2014 21:05:02 GMT)

Acknowledgement sent to Eric Blake <eblake <at> redhat.com>:
New bug report received and forwarded. Copy sent to bug-diffutils <at> gnu.org. (Wed, 03 Sep 2014 21:05:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: Navin Kabra <navin <at> smriti.com>, bug-gnu-utils <at> gnu.org,
 bug-diffutils <at> gnu.org
Subject: Re: Wrong output for single character files without newline
Date: Wed, 03 Sep 2014 15:03:44 -0600
[adding bug-diffutils, as requested by diff --help]

On 09/03/2014 04:17 AM, Navin Kabra wrote:
> Consider this:
> 
>     echo -n a > /tmp/a
>     echo -n b > /tmp/b
>     diff -B /tmp/a /tmp/b

'echo -n' is non-portable.  Please get used to using 'printf' instead.
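
A portable version of the reproduction might look like the sketch below. The scratch directory and the variable names are assumptions made for illustration (the original report used /tmp/a and /tmp/b), and 'mktemp -d' is widely available but not required by POSIX. Comparing against 'cmp' gives a reference answer that does not depend on the diff version under test.

```shell
# Hypothetical scratch directory; the original report wrote to /tmp/a and /tmp/b.
dir=$(mktemp -d)

# printf writes exactly the bytes given, so each file is one byte with no newline.
printf a > "$dir/a"
printf b > "$dir/b"

# Sanity checks that do not involve diff at all.
a_size=$(wc -c < "$dir/a" | tr -d ' ')
cmp -s "$dir/a" "$dir/b"; cmp_status=$?

# The command from the report: status 1 means "files differ", 0 means "identical".
diff -B "$dir/a" "$dir/b"; diff_status=$?

echo "size=$a_size cmp=$cmp_status diff=$diff_status"
rm -rf "$dir"
```

cmp should report status 1 here. A diff affected by this bug reports status 0 for the same pair of files, as in the -By run shown later in this message.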

> 
> Clearly, the two files are different, yet, diff seems to think that the
> files are identical. I've managed to reproduce this problem on Ubuntu
> 14.04 with diffutils 3.3, on CloudLinux 5.10 with diffutils 2.8.1, and
> also Ubuntu 10.04 with diffutils 2.8.1.
> 
> If I don't use the -B option, the problem goes away. If the files do end
> with a newline, the problem goes away. If the files contain more than 1
> character, the problem goes away. If combined with *some* of the other
> options (e.g. -e or -y) the problem goes away.

Actually, I couldn't reproduce -y making the problem go away:

$ ./src/diff -By <(printf a) <(printf b)
a								b
$ echo $?
0

Thanks for the extensive analysis; I can confirm that this bug is still
present in the latest diffutils.git sources, although I have not
personally hunted for the culprit line of code yet.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Wed, 03 Sep 2014 23:07:02 GMT)

Notification sent to Eric Blake <eblake <at> redhat.com>:
bug acknowledged by developer. (Wed, 03 Sep 2014 23:07:02 GMT)

Message #10 received at 18402-done <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eric Blake <eblake <at> redhat.com>, navin <at> smriti.com, 
 Matt Johnson <mj1856 <at> hotmail.com>, 18402-done <at> debbugs.gnu.org
Subject: Re: [bug-diffutils] bug#18402: Wrong output for single character
 files without newline
Date: Wed, 03 Sep 2014 16:05:55 -0700
Thanks for reporting that.  I installed the attached 3 patches; patch #2 
should fix the bug.
[0001-diff-fix-performance-bug-with-prefix-computation.patch (text/plain, attachment)]
[0002-diff-fix-bug-with-diff-B-and-incomplete-lines.patch (text/plain, attachment)]
[0003-doc-mention-diff-B-fix-in-NEWS.patch (text/plain, attachment)]

Information forwarded to bug-diffutils <at> gnu.org:
bug#18402; Package diffutils. (Wed, 03 Sep 2014 23:21:01 GMT)

Message #13 received at 18402 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: 18402 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>, 
 Eric Blake <eblake <at> redhat.com>
Cc: 18402-done <at> debbugs.gnu.org, Matt Johnson <mj1856 <at> hotmail.com>,
 navin <at> smriti.com
Subject: Re: [bug-diffutils] bug#18402: bug#18402: Wrong output for single
 character files without newline
Date: Wed, 3 Sep 2014 16:20:06 -0700
On Wed, Sep 3, 2014 at 4:05 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Thanks for reporting that.  I installed the attached 3 patches; patch #2
> should fix the bug.

Thanks for all the patches.
Regarding the performance fix, can you give performance deltas on moderate
or pathologically affected inputs?

It'd be great to include actual inputs (or a recipe for creating them)
so we have a hope of avoiding such regressions in the future.




Information forwarded to bug-diffutils <at> gnu.org:
bug#18402; Package diffutils. (Wed, 03 Sep 2014 23:21:02 GMT)

Information forwarded to bug-diffutils <at> gnu.org:
bug#18402; Package diffutils. (Thu, 04 Sep 2014 00:21:02 GMT)

Message #19 received at 18402 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>, 18402 <at> debbugs.gnu.org
Subject: Re: [bug-diffutils] bug#18402: bug#18402: Wrong output for single
 character files without newline
Date: Wed, 03 Sep 2014 17:20:08 -0700
Jim Meyering wrote:
> can you give performance deltas on moderate
> or pathologically affected inputs?

Maybe something like this:

diff --horizon-lines=100000000000000000000 gnulib/ChangeLog /tmp/ChangeLog

where the two files are copies.  The bug fix sped up performance about 
5x on my platform, which is Fedora 20 x86-64, AMD Phenom II X4 910e.




Information forwarded to bug-diffutils <at> gnu.org:
bug#18402; Package diffutils. (Thu, 04 Sep 2014 05:13:02 GMT)

Message #22 received at 18402 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 18402 <18402 <at> debbugs.gnu.org>
Subject: Re: [bug-diffutils] bug#18402: bug#18402: Wrong output for single
 character files without newline
Date: Wed, 3 Sep 2014 22:12:04 -0700
On Wed, Sep 3, 2014 at 5:20 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Jim Meyering wrote:
>>
>> can you give performance deltas on moderate
>> or pathologically affected inputs?
>
>
> Maybe something like this:
>
> diff --horizon-lines=100000000000000000000 gnulib/ChangeLog /tmp/ChangeLog
>
> where the two files are copies.  The bug fix sped up performance about 5x on
> my platform, which is Fedora 20 x86-64, AMD Phenom II X4 910e.

Thanks for the details.  I tried to reproduce using two copies of
gnulib/ChangeLog, but saw identical times for before/after runs.
I also tried with two copies of the output of "seq 9999999" on a
tmpfs file system, with the same result: no discernible difference.
I tried both on an AMD FX(tm)-4100 and an Intel(R) Core(TM) i7-4770S.
Here are the commands I ran:

seq 9999999 > /t/1 && cp /t/1 /t/2
env time src/diff --horizon-lines=100000000000000000000 /t/[12]

Then I took the best of five elapsed times and compared.
Here's the minimum time on the faster system, both with and without the patch:

$ env time src/diff --horizon-lines=100000000000000000000 /t/[12]
1.94user 0.34system 0:02.29elapsed 99%CPU (0avgtext+0avgdata 1112960maxresident)k
0inputs+0outputs (0major+404031minor)pagefaults 0swaps
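
The best-of-five procedure described above could be scripted roughly as follows. This is only a sketch: 'best_of_five' is a hypothetical helper, not something from diffutils or from this thread, and the fractional seconds rely on the %N extension of GNU date.

```shell
# best_of_five CMD [ARGS...]: run CMD five times, discarding its output,
# and print the smallest elapsed wall-clock time in seconds.
# Hypothetical helper; %N (nanoseconds) is a GNU date extension.
best_of_five() {
  for i in 1 2 3 4 5; do
    start=$(date +%s.%N)
    "$@" > /dev/null 2>&1
    end=$(date +%s.%N)
    # One elapsed time per run, to millisecond precision.
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
  done | sort -n | head -n 1
}

# Trivial demonstration; for the runs discussed here the command would be
# something like: best_of_five src/diff --horizon-lines=... /t/1 /t/2
best=$(best_of_five true)
echo "best elapsed: ${best}s"
```

Taking the minimum rather than the mean discards runs disturbed by other activity on the machine, which is presumably why the best time was the one compared.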




Information forwarded to bug-diffutils <at> gnu.org:
bug#18402; Package diffutils. (Fri, 05 Sep 2014 00:35:01 GMT)

Message #25 received at 18402 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>
Cc: 18402 <18402 <at> debbugs.gnu.org>
Subject: Re: [bug-diffutils] bug#18402: bug#18402: Wrong output for single
 character files without newline
Date: Thu, 04 Sep 2014 17:34:15 -0700
Jim Meyering wrote:
> I also tried with two copies of the output of "seq 9999999" on a
> tmpfs file system, with the same result: no discernible difference.

There is something weird going on, as I can't reproduce my earlier 
results.  Perhaps I built one version of 'diff' without optimization and 
the other one with it, by accident.  Sorry about sending you on a wild 
goose chase.

I'm still seeing a significant performance improvement due to the 
change, though not as dramatic as what I earlier reported.  Here's the 
benchmark:

$ seq 100000000 >0
$ cp 0 1
$ time ./diff-old 0 1

real    0m2.540s
user    0m1.055s
sys     0m1.464s
$ time ./diff-new 0 1

real    0m1.734s
user    0m0.256s
sys     0m1.463s

where 'diff-old' and 'diff-new' are the old 
(b6e691277288c4e8d53b1d2577137d265008d13e) and current 
(df3af29627a92495a740da13cb8bb0d4fcc1bf84) versions of diffutils, both 
compiled with plain 'configure; make' on the same Fedora 20 x86-64 
platform I mentioned earlier.  This is on an ext4 file system that is 
built atop a mirrored hard-disk subsystem, and the locale is en_US.utf8 
(dunno if any of this matters).

This benchmark is dominated by system CPU time, so the new version is 
only about 45% faster than the old if one looks at real time, but it's 
still clearly a win, as the user CPU time is about 4x faster.
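
For reference, the two ratios can be recomputed from the 'time' output above with a one-off awk calculation. The variable name 'ratios' is just for illustration.

```shell
# Old/new ratios taken from the timings quoted above:
#   real 2.540s vs 1.734s, user 1.055s vs 0.256s.
ratios=$(awk 'BEGIN {
  printf "real: %.2fx\n", 2.540 / 1.734
  printf "user: %.2fx\n", 1.055 / 0.256
}')
echo "$ratios"
```

This prints "real: 1.46x" and "user: 4.12x", in line with the "about 45%" and "about 4x" figures.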




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 03 Oct 2014 11:24:03 GMT)

This bug report was last modified 9 years and 200 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.