GNU bug report logs - #41558
Regexp Bug

Package: sed;

Reported by: "anton.paras" <anton <at> paras.nu>

Date: Wed, 27 May 2020 04:16:02 UTC

Severity: normal

To reply to this bug, email your comments to 41558 AT debbugs.gnu.org.

Report forwarded to bug-sed <at> gnu.org:
bug#41558; Package sed. (Wed, 27 May 2020 04:16:02 GMT)

Acknowledgement sent to "anton.paras" <anton <at> paras.nu>:
New bug report received and forwarded. Copy sent to bug-sed <at> gnu.org. (Wed, 27 May 2020 04:16:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "anton.paras" <anton <at> paras.nu>
To: "bug-sed" <bug-sed <at> gnu.org>
Subject: Regexp Bug
Date: Tue, 26 May 2020 21:14:12 -0700
I posted to Stack Exchange, and they recommended that I file a bug. I'd rather not copy and paste it all, so here's the link:

https://unix.stackexchange.com/questions/579889/why-doesnt-this-sed-command-replace-the-3rd-to-last-and

Here's an example:

> echo 'dog and foo and bar and baz land good' | sed -E 's/(.*)\band\b((.*\band\b){2})/\1XYZ\2/'

Expected output: dog XYZ foo and bar and baz land good

Actual output: dog and foo XYZ bar and baz land good

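The expected output rests on counting whole-word occurrences of "and": "land" contains the same letters but has no word boundary before its "a", so the line holds exactly three whole-word "and"s, and the third-to-last one is the first. A quick cross-check that does not use \b at all, assuming GNU grep's -o (print each match on its own line) and -w (whole-word matches only) options:

```shell
#!/bin/sh
# Print each whole-word "and" on its own line (GNU grep -o and -w).
# The "and" inside "land" is not a whole word, so it is not printed.
echo 'dog and foo and bar and baz land good' | grep -ow 'and'
# prints "and" three times
```
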

Here's my sed --version output: sed (GNU sed) 4.2.2

I hope this is helpful, cheers!

Information forwarded to bug-sed <at> gnu.org:
bug#41558; Package sed. (Thu, 28 May 2020 06:30:02 GMT)

Message #8 received at 41558 <at> debbugs.gnu.org:

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: <41558 <at> debbugs.gnu.org>
Cc: "anton.paras" <anton <at> paras.nu>, bug-gnulib <at> gnu.org
Subject: Re: bug#41558: Regexp Bug
Date: Thu, 28 May 2020 15:29:39 +0900
On Tue, 26 May 2020 21:14:12 -0700
"anton.paras" <anton <at> paras.nu> wrote:

> > echo 'dog and foo and bar and baz land good' | sed -E 's/(.*)\band\b((.*\band\b){2})/\1XYZ\2/'
>
> expected output: dog XYZ foo and bar and baz land good
>
> actual output: dog and foo XYZ bar and baz land good

$ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/(.*\band){2}/p'
foo and bar land
$ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/.*\band.*\band/p'
$

It seems that there is a bug in regex.

Expected (neither command should print anything):
$ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/(.*\band){2}/p'
$ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/.*\band.*\band/p'
$

It also reproduces with grep:

$ echo 'foo and bar land' | env LC_ALL=en_US.utf8 grep -E '(.*\band){2}'
foo and bar land
$ echo 'foo and bar land' | env LC_ALL=en_US.utf8 grep -E '.*\band.*\band'
$

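The second command above suggests that the hand-unrolled form .*\band.*\band is matched correctly even in the UTF-8 locale, which points at the repeated group {2} as the trigger. Under that assumption, one possible workaround for the original substitution is to write the repetition out by hand. This is a sketch, not a tested fix for sed 4.2.2; replace_third_to_last_and is a hypothetical helper name, and \b requires GNU sed.

```shell
#!/bin/sh
# Workaround sketch (assumption: only the repeated group triggers the bug).
# The {2} repetition is written out by hand, so \band\b never sits
# inside a repeated subexpression.
# replace_third_to_last_and is a hypothetical helper name.
replace_third_to_last_and() {
  # Group 1 is as long as possible while still leaving two more
  # whole-word "and"s after it, so it ends just before the
  # third-to-last one; \2 keeps the two later occurrences.
  sed -E 's/(.*)\band\b(.*\band\b.*\band\b)/\1XYZ\2/'
}

echo 'dog and foo and bar and baz land good' | replace_third_to_last_and
# expected: dog XYZ foo and bar and baz land good
```
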

Information forwarded to bug-sed <at> gnu.org:
bug#41558; Package sed. (Tue, 22 Sep 2020 23:41:02 GMT)

Message #11 received at 41558 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: "anton.paras" <anton <at> paras.nu>,
 "bug-gnulib <at> gnu.org List" <bug-gnulib <at> gnu.org>, 41558 <at> debbugs.gnu.org
Subject: Re: bug#41558: Regexp Bug
Date: Tue, 22 Sep 2020 16:40:02 -0700
On Wed, May 27, 2020 at 11:30 PM Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/(.*\band){2}/p'
> foo and bar land
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/.*\band.*\band/p'
> $

I agree that this looks like a regex bug. This should print nothing:
  echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/(.*\band){2}/p'
just as this already does:
  echo 'foo and bar land' | env LC_ALL=C sed -nE '/(.*\band){2}/p'

Does anyone know if there's a glibc bug number for it?

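The locale comparison above can be turned into a small check of the installed sed. This sketch runs the reduced test from this thread under two locales and reports whether (.*\band){2} matches a line that contains only one word-initial "and". check_locale is a hypothetical helper name, GNU sed is assumed for \b, and en_US.utf8 is only an example locale name that must be installed for the second run to exercise the multibyte code path.

```shell
#!/bin/sh
# Sketch: report whether sed matches (.*\band){2} against a line with
# only one word-initial "and". "land" has no word boundary before its
# "a", so a correct matcher prints nothing for this input.
# check_locale is a hypothetical helper name; GNU sed is assumed for \b.
check_locale() {
  out=$(echo 'foo and bar land' | LC_ALL="$1" sed -nE '/(.*\band){2}/p')
  if [ -n "$out" ]; then
    echo "$1: matched (shows the bug)"
  else
    echo "$1: no match (correct)"
  fi
}

check_locale C
# Only meaningful if this locale is installed (see "locale -a");
# otherwise sed typically runs in the C locale instead.
check_locale en_US.utf8
```
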

Information forwarded to bug-sed <at> gnu.org:
bug#41558; Package sed. (Wed, 23 Sep 2020 00:41:01 GMT)

Message #14 received at 41558 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>
Cc: "anton.paras" <anton <at> paras.nu>, Gnulib bugs <bug-gnulib <at> gnu.org>,
 Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 41558 <at> debbugs.gnu.org
Subject: Re: bug#41558: Regexp Bug
Date: Tue, 22 Sep 2020 17:40:50 -0700
On 9/22/20 4:40 PM, Jim Meyering wrote:
> Does anyone know if there's a glibc bug number for it?

I looked for one and didn't find it, so I created glibc bug 26653 for it. See:

https://sourceware.org/bugzilla/show_bug.cgi?id=26653

Information forwarded to bug-sed <at> gnu.org:
bug#41558; Package sed. (Wed, 23 Sep 2020 02:05:02 GMT)

Message #17 received at 41558 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: "anton.paras" <anton <at> paras.nu>, Gnulib bugs <bug-gnulib <at> gnu.org>,
 Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 41558 <at> debbugs.gnu.org
Subject: Re: bug#41558: Regexp Bug
Date: Tue, 22 Sep 2020 19:04:17 -0700
On Tue, Sep 22, 2020 at 5:40 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 9/22/20 4:40 PM, Jim Meyering wrote:
> > Does anyone know if there's a glibc bug number for it?
>
> I looked for one and didn't find it, so I created glibc bug 26653 for it. See:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=26653

Nice! Thank you!

This bug report was last modified 3 years and 220 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.