GNU bug report logs - #18168
Bug in "sort -V" ?


Package: coreutils;

Reported by: "Schleusener, Jens" <Jens.Schleusener <at> t-online.de>

Date: Fri, 1 Aug 2014 14:44:01 UTC

Severity: normal

Tags: notabug

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 18168 in the body.
You can then email your comments to 18168 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-coreutils <at> gnu.org:
bug#18168; Package coreutils. (Fri, 01 Aug 2014 14:44:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to "Schleusener, Jens" <Jens.Schleusener <at> t-online.de>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Fri, 01 Aug 2014 14:44:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "Schleusener, Jens" <Jens.Schleusener <at> t-online.de>
To: bug-coreutils <at> gnu.org
Subject: Bug in "sort -V" ?
Date: Fri, 1 Aug 2014 11:38:39 +0200 (CEST)
Hi,

I am not sure whether it's a bug, but in my use cases the "sort" command
with the very helpful option "-V" (natural sort of (version) numbers
within text) does not always deliver the output I expect.

Example input file (with four test cases):

1.0.5_src.tar.gz
1.0_src.tar.gz
2.0.5src.tar.gz
2.0src.tar.gz
3.0.5/
3.0/
4.0.5beta/
4.0beta/

Sorted ("sort -V") output file (with errors?):

1.0.5_src.tar.gz
1.0_src.tar.gz
2.0src.tar.gz
2.0.5src.tar.gz
3.0.5/
3.0/
4.0beta/
4.0.5beta/

Output file I expected:

1.0_src.tar.gz
1.0.5_src.tar.gz
2.0src.tar.gz
2.0.5src.tar.gz
3.0/
3.0.5/
4.0beta/
4.0.5beta/

You can see that the sort works as I expect when the [0-9\.]* part is
followed by an alphabetic character, but not when it is followed by a
non-alphabetic character such as a slash or an underscore.

Regards

Jens




Information forwarded to bug-coreutils <at> gnu.org:
bug#18168; Package coreutils. (Tue, 06 Nov 2018 18:49:02 GMT) Full text and rfc822 format available.

Message #8 received at 18168 <at> debbugs.gnu.org (full text, mbox):

From: Assaf Gordon <assafgordon <at> gmail.com>
To: "Schleusener, Jens" <Jens.Schleusener <at> t-online.de>, 18168 <at> debbugs.gnu.org
Subject: Re: bug#18168: Bug in "sort -V" ?
Date: Tue, 6 Nov 2018 11:48:07 -0700
tags 18168 notabug
close 18168
stop

(triaging old bugs)

Hello,

It seems your message was lost and not replied to in 4 years.
Sorry about that.

On 2014-08-01 3:38 a.m., Schleusener, Jens wrote:
> I am not sure whether it's a bug, but in my use cases the "sort" command
> with the very helpful option "-V" (natural sort of (version) numbers
> within text) does not always deliver the output I expect.

Note that "-V/--version-sort" is specifically sorting by Debian's *version*
sorting rules. It might seem like it's the same as "natural sort", but
it is not.

The exact rules are here:
https://www.debian.org/doc/debian-policy/ch-controlfields.html#version
https://readme.phys.ethz.ch/documentation/debian_version_numbers/

> 
> Example input file (with four test cases):
> 
> 1.0.5_src.tar.gz
> 1.0_src.tar.gz
> 2.0.5src.tar.gz
> 2.0src.tar.gz
> 3.0.5/
> 3.0/
> 4.0.5beta/
> 4.0beta/
> 
> Sorted ("sort -V") output file (with errors?):
> 
> 1.0.5_src.tar.gz
> 1.0_src.tar.gz
> 2.0src.tar.gz
> 2.0.5src.tar.gz
> 3.0.5/
> 3.0/
> 4.0beta/
> 4.0.5beta/
> 
> Output file I expected:
> 
> 1.0_src.tar.gz
> 1.0.5_src.tar.gz
> 2.0src.tar.gz
> 2.0.5src.tar.gz
> 3.0/
> 3.0.5/
> 4.0beta/
> 4.0.5beta/

The disagreement is about "1.0_src.tar.gz" vs "1.0.5_src.tar.gz"
and "3.0/" vs "3.0.5/" .

Note that the characters "_" and "/" are not strictly valid in Debian
version strings.
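
As a rough illustration of that point: Debian Policy allows only
alphanumerics and the characters . + - ~ in the upstream version, plus ":"
when an epoch is present. The sketch below checks the character set only,
not the full syntax (it does not check that the version starts with a digit
or where the epoch and revision sit). The function name is made up for this
example.

```shell
#!/bin/sh
# Sketch only: checks the characters of a Debian version string, not its
# full syntax. has_only_debian_version_chars is a hypothetical helper name.
LC_ALL=C; export LC_ALL   # keep the A-Z, a-z and 0-9 ranges ASCII-only

has_only_debian_version_chars() {
    case $1 in
        # Empty, or contains a character outside the allowed set.
        '' | *[!A-Za-z0-9.+~:-]*) return 1 ;;
        *) return 0 ;;
    esac
}

for v in 1.0-1 2.0~beta1 3.0/ 1.0_src.tar.gz; do
    if has_only_debian_version_chars "$v"; then
        echo "$v: allowed characters only"
    else
        echo "$v: contains a character not allowed in a version"
    fi
done
```

The first two strings pass; "3.0/" and "1.0_src.tar.gz" are rejected because
of the "/" and the "_".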

Let's try to compare them using Debian's own tools:

First, define a tiny shell function to help compare strings:

    compver() {
       dpkg --compare-versions "$1" lt "$2" \
            && printf "%s\n" "$1" "$2" \
            || printf "%s\n" "$2" "$1"
    }

Then, compare the values:

  $ compver 1.0.5_src.tar.gz 1.0_src.tar.gz
  dpkg: warning: version '1.0.5_src.tar.gz' has bad syntax: invalid character in version number
  dpkg: warning: version '1.0_src.tar.gz' has bad syntax: invalid character in version number
  1.0.5_src.tar.gz
  1.0_src.tar.gz

  $ compver 3.0/ 3.0.5/
  dpkg: warning: version '3.0/' has bad syntax: invalid character in version number
  dpkg: warning: version '3.0.5/' has bad syntax: invalid character in version number
  3.0.5/
  3.0/

So sort's order agrees with Debian's ordering rules.
It might not be what a "natural sort" algorithm would do, but version-sort
is not exactly natural-sort.
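
To make the rule more concrete, here is a minimal sketch of a Debian-style
comparison, assuming the scheme described in the Debian policy: the strings
are split into alternating non-digit and digit runs, digit runs compare
numerically, and in non-digit runs "~" sorts first, then end of string,
then letters, then every other character by byte value. It ignores epochs
and revisions, and GNU sort has further rules of its own (for example
around file name suffixes) that are not modelled here. The helper names
char_rank, starts_nondigit and ver_cmp are invented for this example.

```shell
#!/bin/sh
# Sketch only, not sort's or dpkg's actual code.
LC_ALL=C; export LC_ALL   # keep [A-Za-z] and [0-9] ASCII-only

# Weight of one character: '~' < end of string or digit < letters < the rest.
char_rank() {
    case $1 in
        '' | [0-9]) echo 0 ;;
        '~')        echo -1 ;;
        [A-Za-z])   printf '%d\n' "'$1" ;;
        *)          echo $(( $(printf '%d' "'$1") + 256 )) ;;
    esac
}

# True if $1 is non-empty and does not start with a digit.
starts_nondigit() {
    case $1 in '' | [0-9]*) return 1 ;; *) return 0 ;; esac
}

# Print -1, 0 or 1 depending on how $1 orders relative to $2.
ver_cmp() {
    local a b ra rb na nb
    a=$1; b=$2
    while [ -n "$a$b" ]; do
        # Non-digit runs: compare one character at a time by weight.
        while starts_nondigit "$a" || starts_nondigit "$b"; do
            ra=$(char_rank "${a%"${a#?}"}")   # weight of first char of a
            rb=$(char_rank "${b%"${b#?}"}")   # weight of first char of b
            if [ "$ra" -ne "$rb" ]; then
                if [ "$ra" -lt "$rb" ]; then echo -1; else echo 1; fi
                return
            fi
            a=${a#?}; b=${b#?}
        done
        # Digit runs: compare numerically (an empty run counts as 0).
        na=${a%%[!0-9]*}; nb=${b%%[!0-9]*}
        a=${a#"$na"}; b=${b#"$nb"}
        if [ "${na:-0}" -ne "${nb:-0}" ]; then
            if [ "${na:-0}" -lt "${nb:-0}" ]; then echo -1; else echo 1; fi
            return
        fi
    done
    echo 0
}

ver_cmp 3.0  3.0.5                           # -1: end of string ranks below "."
ver_cmp 3.0a 3.0.5a                          # -1: letters rank below "."
ver_cmp 3.0/ 3.0.5/                          #  1: "/" ranks above "."
ver_cmp 1.0_src.tar.gz 1.0.5_src.tar.gz      #  1: "_" ranks above "."
```

Under these assumptions the "/" and "_" cases come out the way sort -V
printed them, because both characters have a higher byte value than ".".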

Another detailed example of a version-sort is here:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=22275 



As such, I'm closing this bug.
Discussion can continue by replying to this thread.

-assaf




Added tag(s) notabug. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 06 Nov 2018 18:49:02 GMT) Full text and rfc822 format available.

bug closed, send any further explanations to 18168 <at> debbugs.gnu.org and "Schleusener, Jens" <Jens.Schleusener <at> t-online.de> Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 06 Nov 2018 18:49:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-coreutils <at> gnu.org:
bug#18168; Package coreutils. (Tue, 20 Nov 2018 23:34:01 GMT) Full text and rfc822 format available.

Message #15 received at 18168 <at> debbugs.gnu.org (full text, mbox):

From: L A Walsh <coreutils <at> tlinx.org>
To: Assaf Gordon <assafgordon <at> gmail.com>
Cc: "Schleusener, Jens" <Jens.Schleusener <at> t-online.de>, 18168 <at> debbugs.gnu.org
Subject: Re: bug#18168: Bug in "sort -V" ?
Date: Tue, 20 Nov 2018 15:33:51 -0800
On 11/6/2018 10:48 AM, Assaf Gordon wrote:
> On 2014-08-01 3:38 a.m., Schleusener, Jens wrote:
>   
>> I am not sure whether it's a bug, but in my use cases the "sort" command
>> with the very helpful option "-V" (natural sort of (version) numbers
>> within text) does not always deliver the output I expect.
>>     
>
> Note that "-V/--version-sort" is specifically sorting by Debian's *version*
> sorting rules. It might seem like it's the same as "natural sort", but
> it is not.
> The exact rules are here:
> https://www.debian.org/doc/debian-policy/ch-controlfields.html#version
> https://readme.phys.ethz.ch/documentation/debian_version_numbers/
>   
>> Example input file (with four test cases):
>> 1.0.5_src.tar.gz
>> 1.0_src.tar.gz
>> 2.0.5src.tar.gz
>> 2.0src.tar.gz
>> 3.0.5/
>> 3.0/
>> 4.0.5beta/
>> 4.0beta/
>>
>> Sorted ("sort -V") output file (with errors?):
>> 1.0.5_src.tar.gz
>> 1.0_src.tar.gz
>> 2.0src.tar.gz
>> 2.0.5src.tar.gz
>> 3.0.5/
>> 3.0/
>> 4.0beta/
>> 4.0.5beta/
>>
>> Output file I expected:
>> 1.0_src.tar.gz
>> 1.0.5_src.tar.gz
>> 2.0src.tar.gz
>> 2.0.5src.tar.gz
>> 3.0/
>> 3.0.5/
>> 4.0beta/
>> 4.0.5beta/
>>     
>
> The disagreement is about "1.0_src.tar.gz" vs "1.0.5_src.tar.gz"
> and "3.0/" vs "3.0.5/" .
>
> Note that the characters "_" and "/" are not strictly valid in Debian
> version strings.
>   
---
   I too would disagree with the above ordering.

   This bug had me go and look at two places where I compare version
strings (two different algorithms), using the above as input but removing
the '/', which really shouldn't be part of the version string since it looks
like output from ls (though I should probably add that case to my torture
tests). My second algorithm appears to be based on the rpm sources, which
were probably derived from some Debian ordering.

   My first algorithm, which I could justify as either right or wrong, gives
the original poster's expected order, but the second (likely deb) gives the
deb order, almost. Adding the '/' characters changes the sort order, and
that in itself supports the assertion that it shouldn't.

I.e., for
3.0 v. 3.0.5, the latter comes out 'greater' under the deb rules (and mine)

**BUT**

3.0/ v. 3.0.5/ and
3.0_ v. 3.0.5_  don't sort as might be expected, though these:

3.0- v. 3.0.5-
3.0() v 3.0.5()
3.0a v. 3.0.5a

show the 2nd expr as greater.  I am thinking such inconsistencies are a bit
odd in a Version-sort, especially for a deterministic tool?
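
One possible reading of the pattern in these examples, assuming the
Debian-style rule that letters sort before all other characters and that
the remaining characters compare by byte value: the outcome depends on
whether the character following "3.0" ranks below or above the "." that
follows it in "3.0.5". A small sketch to print the byte values involved
(byte_value is a name made up for this example):

```shell
#!/bin/sh
# Sketch only: print the byte value of each character that follows "3.0"
# in the examples. byte_value is a hypothetical helper name.
byte_value() {
    # A leading quote makes printf print the numeric value of the character.
    printf '%d\n' "'$1"
}

for c in '(' '-' '.' '/' '_' 'a'; do
    printf '%s %s\n' "$c" "$(byte_value "$c")"
done
```

This prints 40 for "(", 45 for "-", 46 for ".", 47 for "/", 95 for "_" and
97 for "a". Under the assumed rule, "(" and "-" rank below ".", "/" and "_"
rank above it, and "a" ranks below all of them because it is a letter.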


> Let's try to compare them using Debian's own tools:
>
> First, define a tiny shell function to help compare strings:
>
>      compver() {
>         dpkg --compare-versions "$1" lt "$2" \
>              && printf "%s\n" "$1" "$2" \
>              || printf "%s\n" "$2" "$1"
>      }
>
> Then, compare the values:
>
>    $ compver 1.0.5_src.tar.gz 1.0_src.tar.gz
>    dpkg: warning: version '1.0.5_src.tar.gz' has bad syntax: invalid character in version number
>    dpkg: warning: version '1.0_src.tar.gz' has bad syntax: invalid character in version number
>    1.0.5_src.tar.gz
>    1.0_src.tar.gz
>
>    $ compver 3.0/ 3.0.5/
>    dpkg: warning: version '3.0/' has bad syntax: invalid character in version number
>    dpkg: warning: version '3.0.5/' has bad syntax: invalid character in version number
>    3.0.5/
>    3.0/
>
> So sort's order agrees with Debian's ordering rules.
>   
---
   One might consider an error to mean "indeterminate".

   In particular, sort should document how its version sort works within
its own manpages. Hyperlinks to outside sources don't usually cut it for
this type of program (console based; _most_ consoles don't support
hyperlinks).










Information forwarded to bug-coreutils <at> gnu.org:
bug#18168; Package coreutils. (Wed, 21 Nov 2018 18:13:01 GMT) Full text and rfc822 format available.

Message #18 received at 18168 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: L A Walsh <coreutils <at> tlinx.org>, Assaf Gordon <assafgordon <at> gmail.com>
Cc: "Schleusener, Jens" <Jens.Schleusener <at> t-online.de>, 18168 <at> debbugs.gnu.org
Subject: Re: bug#18168: Bug in "sort -V" ?
Date: Wed, 21 Nov 2018 10:12:18 -0800
I can't see us disagreeing with Debian. Perhaps you can file a bug report with 
Debian and get them to switch to the algorithm you prefer.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 20 Dec 2018 12:24:06 GMT) Full text and rfc822 format available.

This bug report was last modified 5 years and 128 days ago.


