GNU bug report logs - #70231: Performance issue on sort with zero-sized pseudo files
Report forwarded to bug-coreutils <at> gnu.org: bug#70231; Package coreutils. (Sat, 06 Apr 2024 06:39:02 GMT)
Acknowledgement sent to Takashi Kusumi <tkusumi <at> zlab.co.jp>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sat, 06 Apr 2024 06:39:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hi,

I have found a performance issue with the sort command when used on
pseudo files with zero size. For instance, sorting `/proc/kallsyms`, as
demonstrated below, takes significantly longer than executing with
`cat`, generating numerous temporary files. I confirmed this issue on
v8.32 as well as on commit 8f3989d in the master branch.

$ time cat /proc/kallsyms | sort > /dev/null
real 0m0.954s
user 0m0.873s
sys 0m0.096s

$ time sort /proc/kallsyms > /dev/null
real 0m8.555s
user 0m3.367s
sys 0m5.064s

$ strace -e trace=openat sort /proc/kallsyms 2>&1 > /dev/null \
| grep /tmp/sort | head -100
...
openat(AT_FDCWD, "/tmp/sortM6Y6Y1", ...
openat(AT_FDCWD, "/tmp/sortPrHKMG", ...

$ strace -e trace=openat -c sort /proc/kallsyms > /dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    6.419777          19    333258         8 openat
------ ----------- ----------- --------- --------- ----------------
100.00    6.419777          19    333258         8 total

It appears that the buffer size allocated for pseudo files with zero
size is insufficient, likely because it is based on their file size,
which is zero. As seen in the attached patch, I think using
`INPUT_FILE_SIZE_GUESS` to calculate the buffer size when the file size
is zero would resolve this issue.

Best regards,
Takashi Kusumi
[0001-sort-fix-performance-issue-on-zero-sized-pseudo-file.patch (text/plain, attachment)]
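The diagnosis in the report above can be illustrated with a minimal Python sketch. This is not the attached patch (sort itself is written in C), and it only shows the idea of treating a reported size of zero as "unknown" rather than "empty". The helper name `estimated_input_size` is hypothetical, and the value assigned to `INPUT_FILE_SIZE_GUESS` is an assumed placeholder; only the constant's name comes from the report.

```python
import os
import stat
import tempfile

# Assumed placeholder value: the report names this constant but not its value.
INPUT_FILE_SIZE_GUESS = 128 * 1024


def estimated_input_size(path):
    """Return a size estimate to base a sort buffer on (hypothetical helper).

    A reported size of 0 is treated as "unknown" rather than "empty",
    because pseudo files such as /proc/kallsyms report st_size == 0
    while still producing data when read.
    """
    st = os.stat(path)
    if stat.S_ISREG(st.st_mode) and st.st_size > 0:
        return st.st_size
    return INPUT_FILE_SIZE_GUESS


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        # Zero-sized file: falls back to the guess.
        print(estimated_input_size(f.name))
        f.write(b"b\na\n")
        f.flush()
        # Non-empty regular file: the real size is used.
        print(estimated_input_size(f.name))
```

With a fallback of this kind, a zero-sized pseudo file would be sized like a pipe of unknown length rather than like an empty file, which is presumably why the `cat ... | sort` pipeline in the report does not show the slowdown.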
Information forwarded to bug-coreutils <at> gnu.org: bug#70231; Package coreutils. (Sat, 06 Apr 2024 10:10:01 GMT)
Message #8 received at 70231 <at> debbugs.gnu.org (full text, mbox):
On 06/04/2024 03:52, Takashi Kusumi wrote:
> Hi,
>
> I have found a performance issue with the sort command when used on
> pseudo files with zero size. For instance, sorting `/proc/kallsyms`, as
> demonstrated below, takes significantly longer than executing with
> `cat`, generating numerous temporary files. I confirmed this issue on
> v8.32 as well as on commit 8f3989d in the master branch.
>
> $ time cat /proc/kallsyms | sort > /dev/null
> real 0m0.954s
> user 0m0.873s
> sys 0m0.096s
>
> $ time sort /proc/kallsyms > /dev/null
> real 0m8.555s
> user 0m3.367s
> sys 0m5.064s
>
> $ strace -e trace=openat sort /proc/kallsyms 2>&1 > /dev/null \
> | grep /tmp/sort | head -100
> ...
> openat(AT_FDCWD, "/tmp/sortM6Y6Y1", ...
> openat(AT_FDCWD, "/tmp/sortPrHKMG", ...
>
> $ strace -e trace=openat -c sort /proc/kallsyms > /dev/null
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    6.419777          19    333258         8 openat
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    6.419777          19    333258         8 total
>
> It appears that the buffer size allocated for pseudo files with zero
> size is insufficient, likely because it is based on their file size,
> which is zero. As seen in the attached patch, I think using
> `INPUT_FILE_SIZE_GUESS` to calculate the buffer size when the file size
> is zero would resolve this issue.
I'll apply this.
BTW we should improve sort buffer handling in general. From my TODO...
0. Have sort --debug output memory buffer sizes and space avail at $TMPDIR(s)
1. Auto increase buffer when reading from pipe or zero sized files.
   This will be more efficient and more importantly enable parallel operation.
   See http://superuser.com/questions/938558/sort-parallel-isnt-parallelizing/
   At least use more appropriate default buffer sizes in this case.
   I.e. bigger mins and probably smaller maxs, as half avail mem is too aggressive.
2. check() should not need full buffer size?
   Only merge buffer size or something small at least.
3. Look at minimizing the amount of mem used by default.
   Hmm, sort auto adjusts down to avail mem in initbuf() (Test with ulimit -v)
4. Careful with too small buffers as that may initiate
   an extra merge step (see section above).
If anyone wants to look at the above give me a heads up,
or I'll get to it sometime in the next release cycle.
thanks!
Pádraig.
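Item 1 of the TODO list above (growing the buffer when the input size is unknown) can be sketched in Python as follows. This is only an illustration of the idea, not a description of how sort works or of any planned change. The names `sorted_runs` and `external_sort`, the doubling policy, and measuring the buffer in lines rather than bytes are all assumptions made for brevity. Each sorted run stands in for one temporary file that an external sort would have to write and later merge.

```python
import heapq
import itertools


def sorted_runs(lines, buffer_lines, grow=False, max_lines=1 << 16):
    """Yield sorted runs of at most buffer_lines lines each.

    With grow=True the buffer doubles after every run (up to max_lines),
    so an input of unknown length produces far fewer runs.
    """
    it = iter(lines)
    while True:
        run = list(itertools.islice(it, buffer_lines))
        if not run:
            return
        yield sorted(run)
        if grow:
            buffer_lines = min(buffer_lines * 2, max_lines)


def external_sort(lines, buffer_lines, grow=False):
    """Return (sorted lines, number of runs that had to be merged)."""
    runs = list(sorted_runs(lines, buffer_lines, grow))
    return list(heapq.merge(*runs)), len(runs)


if __name__ == "__main__":
    # 1000 distinct keys in a scrambled order (7919 is coprime with 1000).
    data = [f"{(i * 7919) % 1000:04d}" for i in range(1000)]
    _, fixed = external_sort(data, buffer_lines=10)
    _, grown = external_sort(data, buffer_lines=10, grow=True)
    print(fixed, grown)  # 100 7
```

In this toy setup a fixed 10-line buffer yields 100 runs, while a doubling buffer yields 7, which mirrors in miniature the difference between a handful of temporary files and the hundreds of thousands of `openat` calls shown in the report.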
Information forwarded to bug-coreutils <at> gnu.org: bug#70231; Package coreutils. (Sat, 06 Apr 2024 22:23:01 GMT)
Message #11 received at 70231 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 2024-04-06 03:09, Pádraig Brady wrote:
> I'll apply this.
Heh, I beat you to it by looking for similar errors elsewhere and
applying the attached patches to fix the issues I found. None of them
look like serious bugs.
> BTW we should improve sort buffer handling in general
Oh yes.
PS. My current little task is to get i18n to work better with 'sort'.
Among other things I want Unicode-style full case folding.
[0001-cat-don-t-trust-st_size-on-proc-files.patch (text/x-patch, attachment)]
[0002-dd-don-t-trust-st_size-on-proc-files.patch (text/x-patch, attachment)]
[0003-sort-don-t-trust-st_size-on-proc-files.patch (text/x-patch, attachment)]
[0004-split-don-t-trust-st_size-on-proc-files.patch (text/x-patch, attachment)]
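The patch titles above share one theme: do not trust st_size on /proc files. A minimal Python sketch of that general idea follows. It does not reproduce the contents of the attached patches, which are C changes to cat, dd, sort and split. The function name `copy_until_eof` and the buffer size are assumptions for illustration. The point is that the copy loop stops only when a read returns no data, and never consults the reported file size.

```python
import os
import tempfile


def copy_until_eof(src_fd, dst_fd, bufsize=64 * 1024):
    """Copy src_fd to dst_fd until read() reports EOF; return bytes copied.

    The loop never consults st_size, so it behaves the same for regular
    files, pipes, and pseudo files that report a size of zero.
    """
    total = 0
    while True:
        chunk = os.read(src_fd, bufsize)
        if not chunk:  # an empty read is the only EOF signal used here
            return total
        view = memoryview(chunk)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(dst_fd, view)
            view = view[written:]
        total += len(chunk)


if __name__ == "__main__":
    # A pipe stands in for a pseudo file: its length is not known up front.
    read_end, write_end = os.pipe()
    os.write(write_end, b"b\na\n")
    os.close(write_end)
    with tempfile.TemporaryFile() as out:
        print(copy_until_eof(read_end, out.fileno()))  # 4
    os.close(read_end)
```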
Reply sent to Pádraig Brady <P <at> draigBrady.com>: You have taken responsibility. (Sun, 07 Apr 2024 12:56:01 GMT)
Notification sent to Takashi Kusumi <tkusumi <at> zlab.co.jp>: bug acknowledged by developer. (Sun, 07 Apr 2024 12:56:02 GMT)
Message #16 received at 70231-done <at> debbugs.gnu.org (full text, mbox):
On 06/04/2024 23:22, Paul Eggert wrote:
> On 2024-04-06 03:09, Pádraig Brady wrote:
>> I'll apply this.
>
> Heh, I beat you to it by looking for similar errors elsewhere and
> applying the attached patches to fix the issues I found. None of them
> look like serious bugs.
Cool. I thought the sort(1) change worthy of a NEWS entry so pushed one.
Marking this as done.
>> BTW we should improve sort buffer handling in general
>
> Oh yes.
>
> PS. My current little task is to get i18n to work better with 'sort'.
> Among other things I want Unicode-style full case folding.
Excellent, that will help keep the related uniq(1) and join(1)
commands more aligned in their ordering.
cheers,
Pádraig
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 06 May 2024 11:24:08 GMT)