GNU bug report logs -
#29921
O(n^2) performance of rm -r
Reported by: Niklas Hambüchen <niklas <at> nh2.me>
Date: Mon, 1 Jan 2018 00:06:02 UTC
Severity: normal
Tags: notabug
Done: Pádraig Brady <P <at> draigBrady.com>
Bug is archived. No further changes may be made.
Report forwarded to bug-coreutils <at> gnu.org: bug#29921; Package coreutils. (Mon, 01 Jan 2018 00:06:02 GMT)
Acknowledgement sent to Niklas Hambüchen <niklas <at> nh2.me>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 01 Jan 2018 00:06:03 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hello,
in commit
rm -r: avoid O(n^2) performance for a directory with very many entries
http://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=24412edeaf556a
it says that `rm -r`
"now displays linear performance, even when operating on million-entry
directories on ext3 and ext4"
I found this not to be the case on my systems running ext4 on Linux 4.9
on a WD 4TB spinning disk, coreutils rm 8.28.
As reported on https://bugs.python.org/issue32453#msg309303, I got:
nfiles      real     user   sys
 100000     0.51s   0.07s   0.43s
 200000     2.46s   0.15s   0.89s
 400000    10.78s   0.26s   2.21s
 800000    44.72s   0.58s   6.03s
1600000   180.37s   1.06s  10.70s
Each 2x increase in the number of files results in a 4x increase in
deletion time, making for a clean O(n^2) quadratic curve.
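The doubling ratios can be turned into a growth exponent directly (a quick sketch; the times below are transcribed from the `real` column of the table above):

```shell
# Estimate k in t ~ n^k from successive doublings of n:
# k = log2(t_2n / t_n); k near 1 is linear, k near 2 is quadratic.
printf '%s\n' 0.51 2.46 10.78 44.72 180.37 |
  awk 'NR > 1 { printf "k = %.2f\n", log($1 / prev) / log(2) } { prev = $1 }'
# -> k = 2.27, 2.13, 2.05, 2.01: consistently near 2, i.e. quadratic.
```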
I'm testing this with:
set -e
rm -rf dirtest/
echo 100000 && (mkdir dirtest && cd dirtest && seq 1 100000 | xargs touch) && time rm -r dirtest/
echo 200000 && (mkdir dirtest && cd dirtest && seq 1 200000 | xargs touch) && time rm -r dirtest/
echo 400000 && (mkdir dirtest && cd dirtest && seq 1 400000 | xargs touch) && time rm -r dirtest/
echo 800000 && (mkdir dirtest && cd dirtest && seq 1 800000 | xargs touch) && time rm -r dirtest/
echo 1600000 && (mkdir dirtest && cd dirtest && seq 1 1600000 | xargs touch) && time rm -r dirtest/
On another system, Ubuntu 16.04, coreutils rm 8.25, with Linux 4.10 on
ext4, I get:
nfiles      real     user   sys
 100000     0.94s   0.06s   0.516s
 200000     2.94s   0.10s   1.060s
 400000    10.88s   0.30s   2.508s
 800000    34.60s   0.48s   4.912s
1600000   203.87s   0.99s  11.708s
Also quadratic.
Same machine on XFS:
nfiles      real     user   sys
 100000     3.37s   0.04s   0.98s
 200000     7.20s   0.06s   2.03s
 400000    22.52s   0.16s   5.11s
 800000    53.31s   0.37s  11.46s
1600000   200.83s   0.76s  22.41s
Quadratic.
What is the complexity of `rm -r` supposed to be?
Was it really linear in the past as the patch describes, and this is a
regression?
Can we make it work linearly again?
Thanks!
Niklas
Information forwarded to bug-coreutils <at> gnu.org: bug#29921; Package coreutils. (Mon, 01 Jan 2018 16:41:02 GMT)
Message #8 received at 29921 <at> debbugs.gnu.org (full text, mbox):
tag 29921 notabug
close 29921
stop
On 31/12/17 21:07, Niklas Hambüchen wrote:
> [...]
> nfiles real user sys
>
> 100000 0.51s 0.07s 0.43s
> 200000 2.46s 0.15s 0.89s
> 400000 10.78s 0.26s 2.21s
> 800000 44.72s 0.58s 6.03s
> 1600000 180.37s 1.06s 10.70s
>
> Each 2x increase of number of files results in 4x increased deletion
> time, making for a clean O(n^2) quadratic curve.
> [...]
> What is the complexity of `rm -r` supposed to be?
> Was it really linear in the past as the patch describes, and this is a
> regression?
> Can we make it work linearly again?
Note that the patch mentioned above sorts by inode order,
which is usually a win, avoiding random access across the device.
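For illustration, the effect of that heuristic can be mimicked from the shell (a sketch only; coreutils does this sorting inside lib/fts.c, not via ls, and this pipeline assumes seq-style filenames without whitespace):

```shell
# Create a directory of files, then unlink them in ascending inode order,
# the order the fts heuristic produces to reduce seeking on disk.
mkdir demo && (cd demo && seq 1 1000 | xargs touch)
ls -i demo | sort -n | awk '{ print "demo/" $2 }' | xargs rm
rmdir demo
```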
The user and sys components of the above are largely linear,
so the increasing delay is waiting on the device.
I presume that's coming from increased latency effects
as the cached operations are sync'd more to the device
with increasing numbers of operations.
There are many Linux kernel knobs for adjusting
caching behaviour in the io scheduler.
To confirm I did a quick check on a ramdisk (/dev/shm)
to minimize the latency effects, and that showed largely
linear behaviour in the real column also.
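That ramdisk check is easy to reproduce (a sketch; the entry count is scaled down here, and /dev/shm is assumed to be a mounted tmpfs, as on most Linux systems):

```shell
# Repeat one measurement on tmpfs to take device latency out of the picture.
if [ -d /dev/shm ]; then
  mkdir /dev/shm/dirtest && (cd /dev/shm/dirtest && seq 1 10000 | xargs touch)
  time rm -r /dev/shm/dirtest
fi
```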
Note if you want to blow away a whole tree quite often,
it may be more efficient to recreate the file system
on a separate (loopback) device.
cheers,
Pádraig
Added tag(s) notabug. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Mon, 01 Jan 2018 16:41:03 GMT)
bug closed, send any further explanations to 29921 <at> debbugs.gnu.org and Niklas Hambüchen <niklas <at> nh2.me>. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Mon, 01 Jan 2018 16:41:03 GMT)
Information forwarded to bug-coreutils <at> gnu.org: bug#29921; Package coreutils. (Mon, 01 Jan 2018 18:12:02 GMT)
Message #15 received at 29921 <at> debbugs.gnu.org (full text, mbox):
On 01/01/2018 17.40, Pádraig Brady wrote:
> The user and sys components of the above are largely linear,
> so the increasing delay is waiting on the device.
My understanding of commit 24412edeaf556a was that "waiting for the
device" is precisely what the patch claims to have made linear -- it
explicitly mentions that it fixes behaviour that is only present on
seeking devices, so that would be real time, not user/sys, right?
Also, the test added at the end of the patch measures real time, not
user/sys.
So I'm relatively sure that what Jim Meyering was measuring/fixing when
he wrote the patch and commit message was real time.
But it would be nice if he could confirm that.
> To confirm I did a quick check on a ramdisk (/dev/shm)
> to minimize the latency effects, and that showed largely
> linear behaviour in the real column also.
In the patch I linked, it says specifically
"RAM-backed file systems (tmpfs) are not affected,
since there is no seek penalty"
So I am not sure what checking on a ramdisk gains us here, unless you
think that this statement in the patch was incorrect.
> Note if you want to blow away a whole tree quite often,
> it may be more efficient to recreate the file system
> on a separate (loopback) device.
While a nice idea, this is not very practical advice, as when an
application goes crazy and writes millions of files onto the disk, I
can't just wipe the entire file system.
An O(n^2) wall-time `rm -r` is a big problem.
Of course it may not be coreutils's fault, but since coreutils was
clearly aware of the issue in the past and claimed it was fixed, I think
we should investigate to be really sure this isn't a regression anywhere
in the ecosystem.
> I presume that's coming from increased latency effects
> as the cached operations are sync'd more to the device
> with increasing numbers of operations.
> There are many Linux kernel knobs for adjusting
> caching behaviour in the io scheduler.
I don't follow this reasoning.
Should not any form of caching eventually follow linear behaviour?
Also, if it were as you said, shouldn't the `touch` also eventually
become quadratic?
It seems not: creating the directory entries appears perfectly linear.
Information forwarded to bug-coreutils <at> gnu.org: bug#29921; Package coreutils. (Mon, 01 Jan 2018 19:13:02 GMT)
Message #18 received at 29921 <at> debbugs.gnu.org (full text, mbox):
Just to be clear:
What I'm reporting here is a suspected regression.
The patch I linked, merged into coreutils in 2008, claims to fix the
exact problem I'm having.
The author measured time with `date` (see patch contents) which measures
wall time. So I believe that the statement of the patch is that it
improved `real` time from O(n²) to O(n).
I observe that in the current version of rm, `real` time is O(n²).
This is why I think this is a regression: The intended improvement of
the patch is no longer in effect.
Information forwarded to bug-coreutils <at> gnu.org: bug#29921; Package coreutils. (Mon, 01 Jan 2018 21:09:01 GMT)
Message #21 received at 29921 <at> debbugs.gnu.org (full text, mbox):
On Mon, Jan 1, 2018 at 8:40 AM, Pádraig Brady <P <at> draigbrady.com> wrote:
> tag 29921 notabug
> close 29921
> stop
>
> On 31/12/17 21:07, Niklas Hambüchen wrote:
>> [...]
>> What is the complexity of `rm -r` supposed to be?
>> Was it really linear in the past as the patch describes, and this is a
>> regression?
>> Can we make it work linearly again?
>
> Note the patch mentioned above, sorts by inode order,
> which is usually a win, avoiding random access across the device.
> [...]
> Note if you want to blow away a whole tree quite often,
> it may be more efficient to recreate the file system
> on a separate (loopback) device.
Thanks for the report.
As Pádraig suggests, that looks like it may well represent an
opportunity to improve cache tuning. All the same, I wanted to confirm
that the heuristic of sorting by inode number is still valuable and
ran this test:
I rebuilt "rm" with lib/fts.c changed to effectively disable that
sorting (i.e., never sort, instead of "sort when there are at least
10,000 entries"):
-# define FTS_INODE_SORT_DIR_ENTRIES_THRESHOLD 10000
+# define FTS_INODE_SORT_DIR_ENTRIES_THRESHOLD 100000000
Then I ran this test on a Fedora 27, 4.14.6-300.fc27.x86_64 system
using an ext4-formatted spinning disk (samsung HD204UI): (btw, "time"
here is GNU time, and %e prints elapsed):
$ i=50000; while :; do i=$[i*2]; case $i in 16*) break;; esac; mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" /x/cu/src/rm -rf x; done
0.49 100000
16.14 200000
170.98 400000
523.55 800000
As you can see, removing that heuristic incurs a very large penalty.
Rerunning those examples with stock rm shows the quadratic behavior you noted.
$ i=100000; while :; do mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" env rm -rf x; case $i in 8*) break;; esac; i=$[i*2]; done
0.48 100000
2.59 200000
12.69 400000
35.61 800000
$ i=100000; while :; do mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" env rm -rf x; case $i in 8*) break;; esac; i=$[i*2]; done
0.48 100000
2.52 200000
11.53 400000
35.86 800000
$ i=100000; while :; do mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" env rm -rf x; case $i in 8*) break;; esac; i=$[i*2]; done
0.48 100000
2.55 200000
11.49 400000
38.86 800000
Even on an SSD (same system as above, with a 3-yr-old SSD), I see a
super-linear trend of O(N^1.5):
$ i=100000; while :; do mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" env rm -rf x; case $i in 32*) break;; esac; i=$[i*2]; done
0.48 100000
1.26 200000
3.90 400000
10.56 800000
28.13 1600000
58.39 3200000
Note also that there is a test (tests/rm/ext3-perf.sh) that is
intended to cover this situation. However, back then, I made the error
of using an absolute threshold, and with subsequent general system
performance increases, it has become next to useless. We should
probably invest in updating that "expensive" test to use relative
comparisons, as done in grep's tests/long-pattern-perf
(https://git.savannah.gnu.org/cgit/grep.git/tree/tests/long-pattern-perf)
and tests/mb-non-UTF8-performance.
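A relative comparison in that spirit might look like this (a sketch only, with illustrative sizes and thresholds, not the actual coreutils test; GNU date with %N nanosecond output is assumed):

```shell
# Fail if doubling the entry count much more than doubles rm -r's elapsed
# time, instead of comparing against an absolute duration.
make_tree() { mkdir x && (cd x && seq "$1" | xargs touch); }
elapsed_ms() { s=$(date +%s%N); rm -rf x; e=$(date +%s%N); echo $(( (e - s) / 1000000 )); }
make_tree 10000; t1=$(elapsed_ms)
make_tree 20000; t2=$(elapsed_ms)
# Allow generous slack for noise on small trees.
if [ "$t2" -gt $(( t1 * 3 + 200 )) ]; then
  echo "rm -r looks super-linear: ${t1}ms -> ${t2}ms"
fi
```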
Yes, I was measuring elapsed wall-clock time back when I made those
changes to rm and fts. So I do agree that this is a problem, but I
doubt we can do anything in the coreutils to fix it.
Information forwarded to bug-coreutils <at> gnu.org: bug#29921; Package coreutils. (Mon, 01 Jan 2018 22:51:01 GMT)
Message #24 received at 29921 <at> debbugs.gnu.org (full text, mbox):
Jim Meyering wrote:
> $ i=100000; while :; do mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" env rm -rf x; case $i in 8*) break;; esac; i=$[i*2]; done
> 0.48 100000
> 2.59 200000
> 12.69 400000
> 35.61 800000
I'm getting similarly bad numbers with recent Fedora 27 x86-64. As this is ext4
with reasonably-vanilla options (rw,noatime,errors=remount-ro,data=ordered),
perhaps we should file an ext4 bug report? We could point the ext4 hackers to
https://bugs.gnu.org/29921.
Here's what I got:
$ i=100000; while :; do mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" /usr/bin/rm -rf x; case $i in 8*) break;; esac; i=$[i*2]; done
0.95 100000
2.93 200000
17.04 400000
74.31 800000
This is more like O(N**2.5) than O(N**2). Ouch.
For what it's worth, smartctl says my file system uses a Western Digital Caviar
Green 2 TB hard disk (WDC WD20EARS-00MVWB0).
Information forwarded to bug-coreutils <at> gnu.org: bug#29921; Package coreutils. (Mon, 01 Jan 2018 23:49:02 GMT)
Message #27 received at 29921 <at> debbugs.gnu.org (full text, mbox):
On Mon, Jan 1, 2018 at 2:50 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Jim Meyering wrote:
> [...]
>
> I'm getting similarly bad numbers with recent Fedora 27 x86-64. As this is
> ext4 with reasonably-vanilla options
> (rw,noatime,errors=remount-ro,data=ordered), perhaps we should file an ext4
> bug report? We could point the ext4 hackers to https://bugs.gnu.org/29921.
> [...]
> This is more like O(N**2.5) than O(N**2). Ouch.
Here's another data point: it's nicely linear with XFS on a >1TB NVMe
card mounted with "allocsize=64k,noatime,nodiratime,rw":
$ i=100000; while :; do mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" env rm -rf x; case $i in 16*) break;; esac; i=$[i*2]; done
2.73 200000
5.90 400000
12.57 800000
25.06 1600000
Information forwarded to bug-coreutils <at> gnu.org: bug#29921; Package coreutils. (Mon, 01 Jan 2018 23:55:02 GMT)
Message #30 received at 29921 <at> debbugs.gnu.org (full text, mbox):
Hello Jim and Paul,
thank you very much for verifying my observations.
Also, thanks Jim for taking the time to make the nature of the issue
more precise:
Your patch still helps a lot with the big constant factors, but it
doesn't relieve us of the overall O(n²) problem.
I can also confirm the super-linear performance on SSDs you mentioned.
On my systems, I get roughly:
O(n^2 ) on spinning disks
O(n^1.7) on SSDs
I see this on ext4 on SSDs and spinning disks; also on xfs and zfs on
spinning disks but I haven't tried those two on SATA SSDs or NVMe yet.
@Jim, if you have your NVMe device available for that, could you check
all three of those filesystems on it? If it happens to be linear on NVMe
for all filesystem types, that might help narrow down the issue.
> I doubt we can do anything in the coreutils to fix it.
I guess that depends on what we will find the root cause of the issue to be:
If there is a favourable deletion order that makes the underlying
systems behave linearly, then `rm` could try to produce that order in
the same spirit as your previous patch does.
To go for a simple example: if the filesystem implemented the directory
lists as a balanced binary tree, then deleting leaves evenly spread
across the width of the tree should avoid any rebalancing and result in
O(log n) for a single unlink().
> perhaps we should file an ext4 bug report?
OK, I will do that right now.
Naively, I would expect that a single unlink() can always be implemented
in amortised O(1) or at least O(log n), so that we should end up with
O(n) or O(n * log n) overall.
Another possibility (though I find it unlikely) is that deletion is in
fact O(n), but that the linear behaviour only appears once we get to
much larger n. It seems worthwhile running a deletion loop with
ever-increasing n = n*2 over night.
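Such a doubling loop could be left running with a bound (a sketch; the bound N_MAX and the millisecond timing via GNU date's %N are illustrative assumptions, and the bound would be raised to many millions for an overnight run):

```shell
# Delete trees of doubling size and print "<n> <elapsed ms>" per round.
i=1000
N_MAX=4000   # raise for an overnight run
while [ "$i" -le "$N_MAX" ]; do
  mkdir x && (cd x && seq "$i" | xargs touch)
  start=$(date +%s%N)
  rm -rf x
  printf '%d %d\n' "$i" $(( ($(date +%s%N) - start) / 1000000 ))
  i=$(( i * 2 ))
done
```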
Thanks!
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 30 Jan 2018 12:24:04 GMT)
This bug report was last modified 6 years and 59 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.