GNU bug report logs - #5832
Feature request: uniq -k


Package: coreutils;

Reported by: Raphael Clifford <drraph <at> gmail.com>

Date: Sat, 3 Apr 2010 18:50:03 UTC

Severity: wishlist

To reply to this bug, email your comments to 5832 AT debbugs.gnu.org.



Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#5832; Package coreutils. (Sat, 03 Apr 2010 18:50:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Raphael Clifford <drraph <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sat, 03 Apr 2010 18:50:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Raphael Clifford <drraph <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: Feature request: uniq -k
Date: Sat, 3 Apr 2010 19:39:14 +0100
Please excuse the cross-post, but I have been told this is the
appropriate place to submit a feature request.

Is it possible to make a feature request for uniq to add a "-k"
option to specify fields?  Interestingly uniq already has such things as

-f, --skip-fields=N
             avoid comparing the first N fields

and

-s, --skip-chars=N
             avoid comparing the first N characters

but no explicit option to specify which fields should be considered
when doing the comparison.  This would be very useful, for example,
when removing duplicates from time series data (where you are only
worried about consecutive duplicates on certain fields).  The awk
equivalent would be something like

awk '$2$3$4$5 !=  p; {p=$2$3$4$5}'

for using fields 2 to 5 as comparators.

Raphael

P.S. http://www.opengroup.org/onlinepubs/9699919799/utilities/uniq.html
is the POSIX specification for uniq, if that is of any interest.
Curiously it says nothing about which duplicate line to keep when you
don't consider all fields in the comparison.
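
A note on the awk one-liner above: concatenating fields without a
separator can make different field values compare equal (fields "ab" "c"
and "a" "bc" both give "abc"). The following is a minimal sketch of the
same idea with a separator between the fields. The function name
uniq_by_fields is hypothetical and not part of coreutils, and the sketch
assumes blank-separated fields as awk splits them by default.

```shell
# Sketch only: drop a line when the selected fields all match those of
# the previous line, keeping the first line of each run (as uniq does).
# uniq_by_fields is a hypothetical helper, not a coreutils command.
uniq_by_fields() {
    # Arguments are 1-based field numbers, e.g.: uniq_by_fields 2 3 4 5
    awk -v fields="$*" '
        BEGIN { n = split(fields, f, " ") }
        {
            key = ""
            # Join the chosen fields with SUBSEP so that "ab" "c" and
            # "a" "bc" do not collapse into the same key.
            for (i = 1; i <= n; i++) key = key $(f[i]) SUBSEP
            if (NR == 1 || key != prev) print
            prev = key
        }'
}
```

Reading from standard input, `uniq_by_fields 2 3 4 5` compares fields 2
to 5 only, so a leading timestamp in field 1 is ignored.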





Severity set to 'wishlist' from 'normal' Request was from bob <at> proulx.com (Bob Proulx) to control <at> debbugs.gnu.org. (Sat, 03 Apr 2010 21:43:02 GMT) Full text and rfc822 format available.

Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#5832; Package coreutils. (Sun, 04 Apr 2010 14:31:03 GMT) Full text and rfc822 format available.

Message #10 received at 5832 <at> debbugs.gnu.org (full text, mbox):

From: Steve Ward <planet36 <at> gmail.com>
To: Raphael Clifford <drraph <at> gmail.com>
Cc: 5832 <at> debbugs.gnu.org
Subject: Re: bug#5832: Feature request: uniq -k
Date: Sun, 4 Apr 2010 00:22:31 -0400
This might be relevant:

uniq: missing option -W / --check-fields=N
http://lists.gnu.org/archive/html/bug-coreutils/2006-06/msg00168.html



Steve



Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#5832; Package coreutils. (Mon, 05 Apr 2010 09:39:02 GMT) Full text and rfc822 format available.

Message #13 received at 5832 <at> debbugs.gnu.org (full text, mbox):

From: Raphael Clifford <drraph <at> gmail.com>
To: Steve Ward <planet36 <at> gmail.com>
Cc: 5832 <at> debbugs.gnu.org
Subject: Re: bug#5832: Feature request: uniq -k
Date: Mon, 5 Apr 2010 10:10:46 +0100
Yes http://lists.gnu.org/archive/html/bug-coreutils/2006-06/msg00211.html
in particular is pretty much exactly the same feature request.

What is the current thinking on this?

Raphael





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#5832; Package coreutils. (Fri, 09 Apr 2010 06:43:02 GMT) Full text and rfc822 format available.

Message #16 received at 5832 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Raphael Clifford <drraph <at> gmail.com>
Cc: Steve Ward <planet36 <at> gmail.com>, 5832 <at> debbugs.gnu.org
Subject: Re: bug#5832: Feature request: uniq -k
Date: Fri, 09 Apr 2010 08:42:39 +0200
Raphael Clifford wrote:
> Yes http://lists.gnu.org/archive/html/bug-coreutils/2006-06/msg00211.html
> in particular is pretty much exactly the same feature request.
>
> What is the current thinking on this?

uniq's -k is still something we'd like.

>> uniq: missing option -W / --check-fields=N
>> http://lists.gnu.org/archive/html/bug-coreutils/2006-06/msg00168.html

I glanced through most of that thread, and the guidance is still valid.
If you are interested, be sure to start the copyright
assignment paperwork:

    http://git.savannah.gnu.org/cgit/coreutils.git/tree/HACKING#n327 copyright

and to read/follow the other guidelines in HACKING.

2nd most important: to save yourself the pain of reworking big chunks
of code, and to keep review request size manageable, I suggest
you keep the mailing list in the loop on what you're doing/planning.




Information forwarded to bug-coreutils <at> gnu.org:
bug#5832; Package coreutils. (Mon, 26 Dec 2011 17:43:04 GMT) Full text and rfc822 format available.

Message #19 received at 5832 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Adrien Kunysz <adrien <at> kunysz.be>, 10365 <at> debbugs.gnu.org,
	5832 <at> debbugs.gnu.org
Subject: Re: bug#10365: [PATCH] uniq: add ability to skip last N chars or
	fields
Date: Mon, 26 Dec 2011 09:39:26 -0800
On 12/26/11 08:35, Pádraig Brady wrote:
> supporting --key would not provide this functionality.

It would support it in the most common cases, no?
That is, if every line has (say) 10 fields, then
the proposed 'uniq -F3' would be equivalent to
the proposed 'uniq -k1,7'.

I can't offhand think of good use cases for uniq -F
that would not be subsumed by uniq -k.
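
The equivalence described above can be checked with a small sketch. Both
options are only proposed in this thread, so they are emulated here with
awk. The helper names uniq_key_range and uniq_skip_last are made up for
illustration, and the reading of -F N as "ignore the last N fields" is
taken from the subject line of bug#10365.

```shell
# Hypothetical emulation of the proposed 'uniq -k FIRST,LAST':
# compare only fields FIRST..LAST of adjacent lines.
uniq_key_range() {
    awk -v first="$1" -v last="$2" '
        {
            key = ""
            for (i = first; i <= last; i++) key = key $i SUBSEP
            if (NR == 1 || key != prev) print
            prev = key
        }'
}

# Hypothetical emulation of the proposed 'uniq -F N':
# ignore the last N fields of each line when comparing.
uniq_skip_last() {
    awk -v n="$1" '
        {
            key = ""
            for (i = 1; i <= NF - n; i++) key = key $i SUBSEP
            if (NR == 1 || key != prev) print
            prev = key
        }'
}
```

On input where every line has exactly 10 fields, `uniq_skip_last 3` and
`uniq_key_range 1 7` select the same lines. They can differ once the
number of fields varies from line to line.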




Information forwarded to bug-coreutils <at> gnu.org:
bug#5832; Package coreutils. (Mon, 26 Dec 2011 18:07:02 GMT) Full text and rfc822 format available.

Message #22 received at 5832 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Adrien Kunysz <adrien <at> kunysz.be>, 10365 <at> debbugs.gnu.org,
	5832 <at> debbugs.gnu.org
Subject: Re: bug#10365: [PATCH] uniq: add ability to skip last N chars or
	fields
Date: Mon, 26 Dec 2011 18:03:40 +0000
On 12/26/2011 05:39 PM, Paul Eggert wrote:
> On 12/26/11 08:35, Pádraig Brady wrote:
>> supporting --key would not provide this functionality.
> 
> It would support it in the most common cases, no?
> That is, if every line has (say) 10 fields, then
> the proposed 'uniq -F3' would be equivalent to
> the proposed 'uniq -k1,7'.

That's what I thought at first too,
but then why didn't Adrien propose the
more normal --check-fields=7 rather than
the unusual -F3?

> I can't offhand think of good use cases for uniq -F
> that would not be subsumed by uniq -k.

Me neither. Having a variable number of fields per line,
but ignoring the last constant N fields, is very unusual,
which is why I asked for a concrete example.

Personally I'm leaning towards suggesting that `rev | uniq -f | rev`
is fine for this edge case.

cheers,
Pádraig.
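
The rev-based workaround mentioned above can be written out as follows.
It uses only existing tools: rev reverses the characters of each line, so
the last N fields become the first N, which uniq -f N then skips. The
wrapper name uniq_ignore_last is hypothetical, and the sketch assumes
blank-separated fields with no trailing whitespace.

```shell
# Sketch only: ignore the last N blank-separated fields when comparing
# adjacent lines. uniq_ignore_last is a hypothetical wrapper name.
uniq_ignore_last() {
    # 1. rev: reverse each line so the trailing fields come first
    # 2. uniq -f N: skip the first N fields when comparing
    # 3. rev: restore the original character order
    rev | uniq -f "$1" | rev
}
```

Because the fields are counted from the end of each reversed line, this
also works when lines have different numbers of fields.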




This bug report was last modified 12 years and 128 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.