GNU bug report logs - #29921
O(n^2) performance of rm -r


Package: coreutils

Reported by: Niklas Hambüchen <niklas <at> nh2.me>

Date: Mon, 1 Jan 2018 00:06:02 UTC

Severity: normal

Tags: notabug

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.



Report forwarded to bug-coreutils <at> gnu.org:
bug#29921; Package coreutils. (Mon, 01 Jan 2018 00:06:02 GMT)

Acknowledgement sent to Niklas Hambüchen <niklas <at> nh2.me>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 01 Jan 2018 00:06:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Niklas Hambüchen <niklas <at> nh2.me>
To: bug-coreutils <at> gnu.org
Cc: meyering <at> redhat.com
Subject: O(n^2) performance of rm -r
Date: Sun, 31 Dec 2017 22:07:31 +0100
Hello,

in commit

  rm -r: avoid O(n^2) performance for a directory with very many entries
  http://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=24412edeaf556a

it says that `rm -r`

  "now displays linear performance, even when operating on million-entry
directories on ext3 and ext4"

I found this not to be the case on my systems running ext4 on Linux 4.9
on a WD 4TB spinning disk, coreutils rm 8.28.
As reported on https://bugs.python.org/issue32453#msg309303, I got:

 nfiles     real   user     sys

 100000    0.51s  0.07s   0.43s
 200000    2.46s  0.15s   0.89s
 400000   10.78s  0.26s   2.21s
 800000   44.72s  0.58s   6.03s
1600000  180.37s  1.06s  10.70s

Each 2x increase in the number of files results in a 4x increase in
deletion time, making for a clean O(n^2) quadratic curve.
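
(One can read the exponent off such a table by comparing successive
doublings; a quick awk sketch, assuming times.txt holds just the
nfiles and real columns as two plain numbers, with the trailing "s"
stripped:)

  # print the empirical scaling exponent between successive rows
  awk 'prev_n { printf "%7d -> %7d: exponent %.2f\n", prev_n, $1, log($2/prev_t)/log($1/prev_n) }
       { prev_n = $1; prev_t = $2 }' times.txt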

I'm testing this with:

  set -e
  rm -rf dirtest/
  echo  100000 && (mkdir dirtest && cd dirtest && seq 1  100000 | xargs touch) && time rm -r dirtest/
  echo  200000 && (mkdir dirtest && cd dirtest && seq 1  200000 | xargs touch) && time rm -r dirtest/
  echo  400000 && (mkdir dirtest && cd dirtest && seq 1  400000 | xargs touch) && time rm -r dirtest/
  echo  800000 && (mkdir dirtest && cd dirtest && seq 1  800000 | xargs touch) && time rm -r dirtest/
  echo 1600000 && (mkdir dirtest && cd dirtest && seq 1 1600000 | xargs touch) && time rm -r dirtest/


On another system, Ubuntu 16.04, coreutils rm 8.25, with Linux 4.10 on
ext4, I get:

 nfiles     real   user      sys

 100000    0.94s  0.06s   0.516s
 200000    2.94s  0.10s   1.060s
 400000   10.88s  0.30s   2.508s
 800000   34.60s  0.48s   4.912s
1600000  203.87s  0.99s  11.708s

Also quadratic.

Same machine on XFS:

 nfiles     real   user     sys

 100000    3.37s  0.04s   0.98s
 200000    7.20s  0.06s   2.03s
 400000   22.52s  0.16s   5.11s
 800000   53.31s  0.37s  11.46s
1600000  200.83s  0.76s  22.41s

Quadratic.

What is the complexity of `rm -r` supposed to be?
Was it really linear in the past as the patch describes, and this is a
regression?
Can we make it work linearly again?

Thanks!
Niklas




Information forwarded to bug-coreutils <at> gnu.org:
bug#29921; Package coreutils. (Mon, 01 Jan 2018 16:41:02 GMT)

Message #8 received at 29921 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Niklas Hambüchen <niklas <at> nh2.me>, 29921 <at> debbugs.gnu.org
Subject: Re: bug#29921: O(n^2) performance of rm -r
Date: Mon, 1 Jan 2018 16:40:26 +0000
tag 29921 notabug
close 29921
stop

On 31/12/17 21:07, Niklas Hambüchen wrote:
> [...]
> What is the complexity of `rm -r` supposed to be?
> Was it really linear in the past as the patch describes, and this is a
> regression?
> Can we make it work linearly again?

Note that the patch mentioned above sorts by inode order,
which is usually a win, avoiding random access across the device.
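
(For illustration, the effect of that heuristic can be approximated
from the shell; a sketch, not the actual fts.c code, and assuming
file names without whitespace:)

  # list entries unsorted with their inode numbers (-U -i),
  # order by inode, then unlink in that order
  (cd dirtest && ls -Ui | sort -n | awk '{ print $2 }' | xargs rm --)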

The user and sys components of the above are largely linear,
so the increasing delay is waiting on the device.
I presume that's coming from increased latency effects
as the cached operations are sync'd more to the device
with increasing numbers of operations.
There are many Linux kernel knobs for adjusting
caching behaviour in the I/O scheduler.

To confirm I did a quick check on a ramdisk (/dev/shm)
to minimize the latency effects, and that showed largely
linear behaviour in the real column also.
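
(A run of roughly this shape reproduces that check; the file count is
illustrative, and /dev/shm is tmpfs on most Linux systems:)

  mkdir /dev/shm/dirtest && (cd /dev/shm/dirtest && seq 1 1600000 | xargs touch)
  time rm -r /dev/shm/dirtest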

Note that if you want to blow away a whole tree quite often,
it may be more efficient to recreate the file system
on a separate (loopback) device, as sketched below.
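
(A sketch of that approach; paths and sizes are illustrative, the -N
inode count is sized for millions of small files, and the mkfs/mount
steps need root:)

  # keep the scratch tree on its own loopback file system;
  # blowing it away then reduces to unmount + re-mkfs
  dd if=/dev/zero of=/var/tmp/scratch.img bs=1M count=4096
  mkfs.ext4 -F -N 2000000 /var/tmp/scratch.img
  mount -o loop /var/tmp/scratch.img /mnt/scratch
  # ... let the application write its files under /mnt/scratch ...
  umount /mnt/scratch
  mkfs.ext4 -F -N 2000000 /var/tmp/scratch.img   # empty again, no rm -r needed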

cheers,
Pádraig




Added tag(s) notabug. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Mon, 01 Jan 2018 16:41:03 GMT)

bug closed; send any further explanations to 29921 <at> debbugs.gnu.org and Niklas Hambüchen <niklas <at> nh2.me>. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Mon, 01 Jan 2018 16:41:03 GMT)

Information forwarded to bug-coreutils <at> gnu.org:
bug#29921; Package coreutils. (Mon, 01 Jan 2018 18:12:02 GMT)

Message #15 received at 29921 <at> debbugs.gnu.org:

From: Niklas Hambüchen <niklas <at> nh2.me>
To: Pádraig Brady <P <at> draigBrady.com>, 29921 <at> debbugs.gnu.org
Cc: meyering <at> redhat.com
Subject: Re: bug#29921: O(n^2) performance of rm -r
Date: Mon, 1 Jan 2018 18:13:21 +0100
On 01/01/2018 17.40, Pádraig Brady wrote:
> The user and sys components of the above are largely linear,
> so the increasing delay is waiting on the device.

My understanding of commit 24412edeaf556a was that "waiting for the
device" is precisely what the patch claims to have made linear -- it
explicitly mentions fixing behaviour that is only present on seeking
devices, so that would be real time, not user/sys, right?

Also, the test added at the end of the patch measures real time, not
user/sys.

So I'm relatively sure that what Jim Meyering was measuring/fixing when
he wrote the patch and commit message was real time.
But it would be nice if he could confirm that.

> To confirm I did a quick check on a ramdisk (/dev/shm)
> to minimize the latency effects, and that showed largely
> linear behaviour in the real column also.

In the patch I linked, it says specifically

  "RAM-backed file systems (tmpfs) are not affected,
  since there is no seek penalty"

So I am not sure what checking on a ramdisk gains us here, unless you
think that this statement in the patch was incorrect.

> Note if you want to blow away a whole tree quite often,
> it may be more efficient to recreate the file system
> on a separate (loopback) device.

While a nice idea, this is not very practical advice, as when an
application goes crazy and writes millions of files onto the disk, I
can't just wipe the entire file system.

An O(n^2) wall-time `rm -r` is a big problem.

Of course it may not be coreutils's fault, but since coreutils was
clearly aware of the issue in the past and claimed it was fixed, I think
we should investigate to be really sure this isn't a regression anywhere
in the ecosystem.

> I presume that's coming from increased latency effects
> as the cached operations are sync'd more to the device
> with increasing numbers of operations.
> There are many Linux kernel knobs for adjusting
> caching behaviour in the io scheduler.

I don't follow this reasoning.
Shouldn't any form of caching eventually yield linear behaviour?

Also, if it were as you say, shouldn't the `touch` also eventually
become quadratic?
It seems not: creating the directory entries appears perfectly linear.
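
(The creation side can be timed in isolation with e.g.:)

  mkdir dirtest && time (cd dirtest && seq 1 1600000 | xargs touch)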




Information forwarded to bug-coreutils <at> gnu.org:
bug#29921; Package coreutils. (Mon, 01 Jan 2018 19:13:02 GMT)

Message #18 received at 29921 <at> debbugs.gnu.org:

From: Niklas Hambüchen <niklas <at> nh2.me>
To: Pádraig Brady <P <at> draigBrady.com>, 29921 <at> debbugs.gnu.org
Cc: meyering <at> redhat.com
Subject: Re: bug#29921: O(n^2) performance of rm -r
Date: Mon, 1 Jan 2018 20:12:56 +0100
Just to be clear:

What I'm reporting here is a suspected regression.

The patch I linked, merged into coreutils in 2008, claims to fix the
exact problem I'm having.

The author measured time with `date` (see the patch contents), which
measures wall time. So I believe the patch's claim is that it improved
`real` time from O(n²) to O(n).

I observe that in the current version of rm, `real` time is O(n²).

This is why I think this is a regression: The intended improvement of
the patch is no longer in effect.




Information forwarded to bug-coreutils <at> gnu.org:
bug#29921; Package coreutils. (Mon, 01 Jan 2018 21:09:01 GMT)

Message #21 received at 29921 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Pádraig Brady <P <at> draigbrady.com>
Cc: 29921 <at> debbugs.gnu.org, Niklas Hambüchen <niklas <at> nh2.me>
Subject: Re: bug#29921: O(n^2) performance of rm -r
Date: Mon, 1 Jan 2018 13:07:49 -0800
On Mon, Jan 1, 2018 at 8:40 AM, Pádraig Brady <P <at> draigbrady.com> wrote:
> [...]
> Note that the patch mentioned above sorts by inode order,
> which is usually a win, avoiding random access across the device.
>
> The user and sys components of the above are largely linear,
> so the increasing delay is waiting on the device.
> I presume that's coming from increased latency effects
> as the cached operations are sync'd more to the device
> with increasing numbers of operations.
> There are many Linux kernel knobs for adjusting
> caching behaviour in the I/O scheduler.
>
> To confirm I did a quick check on a ramdisk (/dev/shm)
> to minimize the latency effects, and that showed largely
> linear behaviour in the real column also.
>
> Note if you want to blow away a whole tree quite often,
> it may be more efficient to recreate the file system
> on a separate (loopback) device.

Thanks for the report.
As Pádraig suggests, that looks like it may well represent an
opportunity to improve cache tuning. All the same, I wanted to confirm
that the heuristic of sorting by inode number is still valuable and
ran this test:

I rebuilt "rm" with lib/fts.c changed to effectively disable that
sorting (i.e., never sort, instead of "sort when there are at least
10,000 entries"):

-# define FTS_INODE_SORT_DIR_ENTRIES_THRESHOLD 10000
+# define FTS_INODE_SORT_DIR_ENTRIES_THRESHOLD 100000000

Then I ran this test on a Fedora 27, 4.14.6-300.fc27.x86_64 system
using an ext4-formatted spinning disk (Samsung HD204UI). Note that
"time" here is GNU time, and %e prints elapsed wall-clock seconds:

$ i=50000; while :; do i=$[i*2]; case $i in 16*) break;; esac; mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" /x/cu/src/rm -rf x; done
0.49 100000
16.14 200000
170.98 400000
523.55 800000

As you can see, removing that heuristic incurs a very large penalty.

Rerunning those examples with stock rm shows the quadratic behavior you noted.

$ i=100000; while :; do mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" env rm -rf x; case $i in 8*) break;; esac; i=$[i*2]; done
0.48 100000
2.59 200000
12.69 400000
35.61 800000
$ i=100000; while :; do mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" env rm -rf x; case $i in 8*) break;; esac; i=$[i*2]; done
0.48 100000
2.52 200000
11.53 400000
35.86 800000
$ i=100000; while :; do mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" env rm -rf x; case $i in 8*) break;; esac; i=$[i*2]; done
0.48 100000
2.55 200000
11.49 400000
38.86 800000

Even on an SSD (same system as above, with a 3-yr-old SSD), I see a
super-linear trend of O(N^1.5):

$ i=100000; while :; do mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" env rm -rf x; case $i in 32*) break;; esac; i=$[i*2]; done
0.48 100000
1.26 200000
3.90 400000
10.56 800000
28.13 1600000
58.39 3200000

Note also that there is a test (tests/rm/ext3-perf.sh) that is
intended to cover this situation. However, back then, I made the error
of using an absolute threshold, and with subsequent general system
performance increases, it has become next to useless. We should
probably invest in updating that "expensive" test to use relative
comparisons, as done in grep's tests/long-pattern-perf
(https://git.savannah.gnu.org/cgit/grep.git/tree/tests/long-pattern-perf)
and tests/mb-non-UTF8-performance.
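
A rough sketch of such a relative check (the 3x threshold is
illustrative; a doubling of entries should cost only about 2x if
rm is linear):

  t1=$(mkdir x && (cd x && seq 100000 | xargs touch) && env time -f %e rm -rf x 2>&1)
  t2=$(mkdir x && (cd x && seq 200000 | xargs touch) && env time -f %e rm -rf x 2>&1)
  awk -v a="$t1" -v b="$t2" 'BEGIN { exit !(b/a < 3) }' || echo "rm -r looks super-linear"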

Yes, I was measuring elapsed wall-clock time back when I made those
changes to rm and fts. So I do agree that this is a problem, but I
doubt we can do anything in the coreutils to fix it.




Information forwarded to bug-coreutils <at> gnu.org:
bug#29921; Package coreutils. (Mon, 01 Jan 2018 22:51:01 GMT)

Message #24 received at 29921 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>, Pádraig Brady
 <P <at> draigbrady.com>
Cc: 29921 <at> debbugs.gnu.org, Niklas Hambüchen <niklas <at> nh2.me>
Subject: Re: bug#29921: O(n^2) performance of rm -r
Date: Mon, 1 Jan 2018 14:50:05 -0800
Jim Meyering wrote:

> $ i=100000; while :; do mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" env rm -rf x; case $i in 8*) break;; esac; i=$[i*2]; done
> 0.48 100000
> 2.59 200000
> 12.69 400000
> 35.61 800000

I'm getting similarly bad numbers with recent Fedora 27 x86-64. As this is ext4 
with reasonably-vanilla options (rw,noatime,errors=remount-ro,data=ordered), 
perhaps we should file an ext4 bug report? We could point the ext4 hackers to 
https://bugs.gnu.org/29921.

Here's what I got:

$ i=100000; while :; do mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" /usr/bin/rm -rf x; case $i in 8*) break;; esac; i=$[i*2]; done
0.95 100000
2.93 200000
17.04 400000
74.31 800000

This is more like O(N**2.5) than O(N**2). Ouch.

For what it's worth, smartctl says my file system uses a Western Digital Caviar 
Green 2 TB hard disk (WDC WD20EARS-00MVWB0).




Information forwarded to bug-coreutils <at> gnu.org:
bug#29921; Package coreutils. (Mon, 01 Jan 2018 23:49:02 GMT)

Message #27 received at 29921 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 29921 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigbrady.com>,
 Niklas Hambüchen <niklas <at> nh2.me>
Subject: Re: bug#29921: O(n^2) performance of rm -r
Date: Mon, 1 Jan 2018 15:48:11 -0800
On Mon, Jan 1, 2018 at 2:50 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Jim Meyering wrote:
>
>> [...]
>
> I'm getting similarly bad numbers with recent Fedora 27 x86-64. As this is
> ext4 with reasonably-vanilla options
> (rw,noatime,errors=remount-ro,data=ordered), perhaps we should file an ext4
> bug report? We could point the ext4 hackers to https://bugs.gnu.org/29921.
>
> Here's what I got:
>
> $ i=100000; while :; do mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" /usr/bin/rm -rf x; case $i in 8*) break;; esac; i=$[i*2]; done
> 0.95 100000
> 2.93 200000
> 17.04 400000
> 74.31 800000
>
> This is more like O(N**2.5) than O(N**2). Ouch.
>
> For what it's worth, smartctl says my file system uses a Western Digital
> Caviar Green 2 TB hard disk (WDC WD20EARS-00MVWB0).

Here's another data point: it's nicely linear with XFS on a >1TB NVMe
card mounted with "allocsize=64k,noatime,nodiratime,rw":

$ i=100000; while :; do mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" env rm -rf x; case $i in 16*) break;; esac; i=$[i*2]; done
2.73 200000
5.90 400000
12.57 800000
25.06 1600000




Information forwarded to bug-coreutils <at> gnu.org:
bug#29921; Package coreutils. (Mon, 01 Jan 2018 23:55:02 GMT)

Message #30 received at 29921 <at> debbugs.gnu.org:

From: Niklas Hambüchen <niklas <at> nh2.me>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Jim Meyering <jim <at> meyering.net>
Cc: 29921 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigbrady.com>
Subject: Re: bug#29921: O(n^2) performance of rm -r
Date: Tue, 2 Jan 2018 00:54:26 +0100
Hello Jim and Paul,

thank you very much for verifying my observations.

Also, thanks Jim for taking the time to make the nature of the issue
more precise: your patch still helps a lot with the big constant
factors, but doesn't relieve us of the overall O(n²) problem.

I can also confirm the super-linear performance on SSDs you mentioned.
On my systems, I get roughly:

O(n^2  ) on spinning disks
O(n^1.7) on SSDs

I see this on ext4 on SSDs and spinning disks; also on xfs and zfs on
spinning disks but I haven't tried those two on SATA SSDs or NVMe yet.

@Jim, if you have your NVMe device available for that, could you check
all three of those filesystems on it? If it happens to be linear on
NVMe for all filesystem types, that might help narrow down the issue.

> I doubt we can do anything in the coreutils to fix it.

I guess that depends on what the root cause of the issue turns out to
be: if there is a favourable deletion order that makes the underlying
systems behave linearly, then `rm` could try to produce that order,
in the same spirit as your previous patch.

To take a simple example: if the filesystem implemented its directory
lists as a balanced binary tree, then deleting leaves evenly spread
across the width of the tree should avoid any rebalancing and result in
O(log n) for a single unlink().
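
(Whether deletion order matters at all is cheap to probe from the
shell; a sketch, assuming file names without whitespace: time a
deliberately randomized unlink order against plain `rm -r` on an
identical directory:)

  mkdir dirtest && (cd dirtest && seq 1 1600000 | xargs touch)
  time sh -c 'cd dirtest && ls -U | shuf | xargs rm --'
  rmdir dirtest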

> perhaps we should file an ext4 bug report?

OK, I will do that right now.

Naively, I would expect that a single unlink() can always be implemented
in amortised O(1) or at least O(log n), so that we should end up with
O(n) or O(n * log n) overall.

Another possibility (though I find it unlikely) is that deletion is in
fact O(n), but that the linear behaviour only appears once we get to
much larger n. It seems worthwhile running a deletion loop with
ever-increasing n = n*2 overnight.
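
A sketch of such a sweep (rm-times.log is an arbitrary log file):

  # double n each round and log "elapsed-seconds n" for each deletion
  n=100000
  while :; do
    mkdir dirtest && (cd dirtest && seq "$n" | xargs touch)
    env time -f "%e $n" rm -rf dirtest 2>>rm-times.log
    n=$((n * 2))
  done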

Thanks!




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 30 Jan 2018 12:24:04 GMT)
