GNU bug report logs - #30520
sed: silently fail substitution for 2GB+ single-line file

Previous Next

Package: sed;

Reported by: YushiOMOTE <yushiomote <at> gmail.com>

Date: Mon, 19 Feb 2018 07:26:02 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 30520 in the body.
You can then email your comments to 30520 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-sed <at> gnu.org:
bug#30520; Package sed. (Mon, 19 Feb 2018 07:26:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to YushiOMOTE <yushiomote <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-sed <at> gnu.org. (Mon, 19 Feb 2018 07:26:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: YushiOMOTE <yushiomote <at> gmail.com>
To: "bug-sed <at> gnu.org" <bug-sed <at> gnu.org>
Subject: sed: silently fail substitution for 2GB+ single-line file
Date: Mon, 19 Feb 2018 06:51:04 +0000
[Message part 1 (text/plain, inline)]
The following behavior is observed with sed version 4.2.2 (x86_64 / CentOS
7.2).

(It's also observed with the current head of the git repo:
41e506e12b363628af29b089f55585b17e1389e6)

When I pass a 2GB+-single-line to sed for substitution, sed exits with
success status code but it silently fails substitution.

```
$ sed `s/aaa/bbb/g` 20GB+-single-line-file.txt > out.txt && echo OK
OK
(...but nothing gets substituted even though the target string exists)
```

In summary, the cause seems that sed isn't checking error return values
from `re_search <at> glibc`.

What is happening in detail seems as follows.

At the beginning of `do_subst` function (sed/execute.c),

```
  /* The first part of the loop optimizes s/xxx// when xxx is at the
     start, and s/xxx$// */
  if (!match_regex(sub->regx, line.active, line.length, start,
                   &regs, sub->max_id + 1))
    return;
```

`line.length` (unsigned long), is set to the value larger than INT_MAX. And
when `match_regex` (sed/regexp.c) calls `re_search`,

```
    ret = re_search (&regex->pattern, buf, buflen, buf_start_offset,
                     buflen - buf_start_offset,
                     regsize ? regarray : NULL);
```

`buflen` here is equal to `line.length` (the value larger than INT_MAX) but
the corresponding argument of `re_search` is `int`. The value is converted
to a negative number. Then, `re_search` returns -1 due to its error
checking. But the return value is not treated by sed and no errors are
raised.

A possible fix could be adding an error checking for `re_search` (like this
patch).
Or, we can check if the value of line length exceeds INT_MAX before
reaching `re_search`. (`grep` checks it and returns with error status code)

Best regards,

Yushi
[Message part 2 (text/html, inline)]
[0001-sed-check-errors-of-re_search.patch (application/x-patch, attachment)]

Information forwarded to bug-sed <at> gnu.org:
bug#30520; Package sed. (Mon, 19 Feb 2018 20:33:02 GMT) Full text and rfc822 format available.

Message #8 received at 30520 <at> debbugs.gnu.org (full text, mbox):

From: Assaf Gordon <assafgordon <at> gmail.com>
To: YushiOMOTE <yushiomote <at> gmail.com>
Cc: 30520 <at> debbugs.gnu.org
Subject: Re: bug#30520: sed: silently fail substitution for 2GB+ single-line
 file
Date: Mon, 19 Feb 2018 13:32:02 -0700
Hello,

On Mon, Feb 19, 2018 at 06:51:04AM +0000, YushiOMOTE wrote:
> When I pass a 2GB+-single-line to sed for substitution, sed exits with
> success status code but it silently fails substitution.
> 
> ```
> $ sed `s/aaa/bbb/g` 20GB+-single-line-file.txt > out.txt && echo OK
> OK
> (...but nothing gets substituted even though the target string exists)

Thank you for reporting this issue, and providing such detailed
analysis. I am able to reproduce this issue.

Few minor comments:

> In summary, the cause seems that sed isn't checking error return values
> from `re_search <at> glibc`.

Technically, it's from gnulib, not glibc.
The code is in https://opengrok.housegordon.com/source/xref/gnulib/lib/regexec.c#369
the function 're_search_stub'.

> What is happening in detail seems as follows.
> 
> At the beginning of `do_subst` function (sed/execute.c),
> [...] 
> `line.length` (unsigned long), is set to the value larger than INT_MAX. And
> when `match_regex` (sed/regexp.c) calls `re_search`,
> 
> ```
>     ret = re_search (&regex->pattern, buf, buflen, buf_start_offset,
>                      buflen - buf_start_offset,
>                      regsize ? regarray : NULL);
> ```
> 
> `buflen` here is equal to `line.length` (the value larger than INT_MAX) but
> the corresponding argument of `re_search` is `int`. The value is converted
> to a negative number. Then, `re_search` returns -1 due to its error
> checking. But the return value is not treated by sed and no errors are
> raised.

From a cursory look, it seems "re_search_stub" does not have error
checking for this specific case (length<0), and it continues with its
string search, but finds nothing - thus returning -1.
It returns "-1" both when the pattern is not found, and when there
are such value errors.

Later code in 'regexp.c:match_regex()' does check the returned value
like so:
   return (ret > -1);

Therefore, "match_regex" returns FALSE in this case,
and "do_subst" simply concludes that the string was not found.
Sed then terminates without performing any replacement,
and without reporting any errors.


> A possible fix could be adding an error checking for `re_search` (like this
> patch).

I'm not sure this would work.
The code already checks for -1, which is used to indicate "pattern not found".

The gnulib implementation of "re_search" is described here:
https://www.gnu.org/software/gnulib/manual/html_node/GNU-Searching.html
It says:
   "If no match is found, re_search returns -1.
    If a match is found, it returns the index where the match began.
    If an internal error happens, it returns -2."

However it seems that "internal errors" only relates to failed memory allocations.
Out of range values are treated by "re_search" as simple "pattern not found",
as evident by this check:

   /* Check for out-of-range.  */
   if (BE (start < 0 || start > length, 0))
     return -1;

In https://opengrok.housegordon.com/source/xref/gnulib/lib/regexec.c#381


> Or, we can check if the value of line length exceeds INT_MAX before
> reaching `re_search`. (`grep` checks it and returns with error status code)

This is likely the simplest solution for now, without changing gnulib itself.


regards,
 - assaf




Information forwarded to bug-sed <at> gnu.org:
bug#30520; Package sed. (Fri, 23 Mar 2018 00:13:01 GMT) Full text and rfc822 format available.

Message #11 received at 30520 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Assaf Gordon <assafgordon <at> gmail.com>
Cc: YushiOMOTE <yushiomote <at> gmail.com>, 30520 <at> debbugs.gnu.org
Subject: Re: bug#30520: sed: silently fail substitution for 2GB+ single-line
 file
Date: Thu, 22 Mar 2018 17:12:11 -0700
On Mon, Feb 19, 2018 at 12:32 PM, Assaf Gordon <assafgordon <at> gmail.com> wrote:
> Hello,
>
> On Mon, Feb 19, 2018 at 06:51:04AM +0000, YushiOMOTE wrote:
>> When I pass a 2GB+-single-line to sed for substitution, sed exits with
>> success status code but it silently fails substitution.
...
>> Or, we can check if the value of line length exceeds INT_MAX before
>> reaching `re_search`. (`grep` checks it and returns with error status code)
>
> This is likely the simplest solution for now, without changing gnulib itself.

Thank you both. I too prefer this approach. Would you like to do that, Assaf?




Information forwarded to bug-sed <at> gnu.org:
bug#30520; Package sed. (Fri, 23 Mar 2018 01:43:01 GMT) Full text and rfc822 format available.

Message #14 received at 30520 <at> debbugs.gnu.org (full text, mbox):

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: YushiOMOTE <yushiomote <at> gmail.com>, 30520 <at> debbugs.gnu.org
Subject: Re: bug#30520: sed: silently fail substitution for 2GB+ single-line
 file
Date: Thu, 22 Mar 2018 19:41:38 -0600
[Message part 1 (text/plain, inline)]
On Thu, Mar 22, 2018 at 05:12:11PM -0700, Jim Meyering wrote:
> On Mon, Feb 19, 2018 at 12:32 PM, Assaf Gordon <assafgordon <at> gmail.com> wrote:
> > On Mon, Feb 19, 2018 at 06:51:04AM +0000, YushiOMOTE wrote:
> >> When I pass a 2GB+-single-line to sed for substitution, sed exits with
> >> success status code but it silently fails substitution.
> ...
> >> Or, we can check if the value of line length exceeds INT_MAX before
> >> reaching `re_search`. (`grep` checks it and returns with error status code)
> >
> > This is likely the simplest solution for now, without changing gnulib itself.
> 
> Thank you both. I too prefer this approach. Would you like to do that, Assaf?

Yes, attached a suggested patch.

Comments welcomed,
 - assaf
[0001-sed-reject-RE-searches-on-buffers-larger-than-INT_MA.patch (text/x-diff, attachment)]

Information forwarded to bug-sed <at> gnu.org:
bug#30520; Package sed. (Fri, 23 Mar 2018 05:18:02 GMT) Full text and rfc822 format available.

Message #17 received at 30520 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Assaf Gordon <assafgordon <at> gmail.com>
Cc: YushiOMOTE <yushiomote <at> gmail.com>, 30520 <at> debbugs.gnu.org
Subject: Re: bug#30520: sed: silently fail substitution for 2GB+ single-line
 file
Date: Thu, 22 Mar 2018 22:17:15 -0700
[Message part 1 (text/plain, inline)]
On Thu, Mar 22, 2018 at 6:41 PM, Assaf Gordon <assafgordon <at> gmail.com> wrote:
> On Thu, Mar 22, 2018 at 05:12:11PM -0700, Jim Meyering wrote:
>> On Mon, Feb 19, 2018 at 12:32 PM, Assaf Gordon <assafgordon <at> gmail.com> wrote:
>> > On Mon, Feb 19, 2018 at 06:51:04AM +0000, YushiOMOTE wrote:
>> >> When I pass a 2GB+-single-line to sed for substitution, sed exits with
>> >> success status code but it silently fails substitution.
>> ...
>> >> Or, we can check if the value of line length exceeds INT_MAX before
>> >> reaching `re_search`. (`grep` checks it and returns with error status code)
>> >
>> > This is likely the simplest solution for now, without changing gnulib itself.
>>
>> Thank you both. I too prefer this approach. Would you like to do that, Assaf?
>
> Yes, attached a suggested patch.

Great! Thank you for the quick work!
Here are some small suggested adjustments:
 - add targets so the advice on running the very expensive tests works
 - minor wording changes
 - remove apparently unnecessary "< /dev/null" from new test
[sed-2G-input-nits.diff (application/octet-stream, attachment)]

Information forwarded to bug-sed <at> gnu.org:
bug#30520; Package sed. (Fri, 23 Mar 2018 05:49:02 GMT) Full text and rfc822 format available.

Message #20 received at 30520 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Assaf Gordon <assafgordon <at> gmail.com>
Cc: YushiOMOTE <yushiomote <at> gmail.com>, 30520 <at> debbugs.gnu.org
Subject: Re: bug#30520: sed: silently fail substitution for 2GB+ single-line
 file
Date: Thu, 22 Mar 2018 22:48:20 -0700
On Thu, Mar 22, 2018 at 10:17 PM, Jim Meyering <jim <at> meyering.net> wrote:
> On Thu, Mar 22, 2018 at 6:41 PM, Assaf Gordon <assafgordon <at> gmail.com> wrote:
>> On Thu, Mar 22, 2018 at 05:12:11PM -0700, Jim Meyering wrote:
>>> On Mon, Feb 19, 2018 at 12:32 PM, Assaf Gordon <assafgordon <at> gmail.com> wrote:
>>> > On Mon, Feb 19, 2018 at 06:51:04AM +0000, YushiOMOTE wrote:
>>> >> When I pass a 2GB+-single-line to sed for substitution, sed exits with
>>> >> success status code but it silently fails substitution.
>>> ...
>>> >> Or, we can check if the value of line length exceeds INT_MAX before
>>> >> reaching `re_search`. (`grep` checks it and returns with error status code)
>>> >
>>> > This is likely the simplest solution for now, without changing gnulib itself.
>>>
>>> Thank you both. I too prefer this approach. Would you like to do that, Assaf?
>>
>> Yes, attached a suggested patch.
>
> Great! Thank you for the quick work!
> Here are some small suggested adjustments:
>  - add targets so the advice on running the very expensive tests works
>  - minor wording changes
>  - remove apparently unnecessary "< /dev/null" from new test

I'll defer the first snapshot until after you push this fix.
Thanks again.




Information forwarded to bug-sed <at> gnu.org:
bug#30520; Package sed. (Fri, 23 Mar 2018 15:05:02 GMT) Full text and rfc822 format available.

Message #23 received at 30520 <at> debbugs.gnu.org (full text, mbox):

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: YushiOMOTE <yushiomote <at> gmail.com>, 30520 <at> debbugs.gnu.org
Subject: Re: bug#30520: sed: silently fail substitution for 2GB+ single-line
 file
Date: Fri, 23 Mar 2018 09:03:58 -0600
On Thu, Mar 22, 2018 at 10:48:20PM -0700, Jim Meyering wrote:
> I'll defer the first snapshot until after you push this fix.

Thanks for the fixes, pushed with your edits here:
https://git.savannah.gnu.org/cgit/sed.git/commit/?id=5433dc245b222f6c98ab1436e170fd5e3e6e3907




Reply sent to Jim Meyering <jim <at> meyering.net>:
You have taken responsibility. (Mon, 26 Mar 2018 15:21:02 GMT) Full text and rfc822 format available.

Notification sent to YushiOMOTE <yushiomote <at> gmail.com>:
bug acknowledged by developer. (Mon, 26 Mar 2018 15:21:02 GMT) Full text and rfc822 format available.

Message #28 received at 30520-done <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Assaf Gordon <assafgordon <at> gmail.com>
Cc: YushiOMOTE <yushiomote <at> gmail.com>, 30520-done <at> debbugs.gnu.org
Subject: Re: bug#30520: sed: silently fail substitution for 2GB+ single-line
 file
Date: Mon, 26 Mar 2018 08:19:33 -0700
marking as "done"




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 24 Apr 2018 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 6 years and 8 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.