GNU bug report logs - #36148
inconsistent behaviour with anchored regex containing back-references


Package: grep

Reported by: g1pi <at> libero.it

Date: Sun, 9 Jun 2019 15:30:02 UTC

Severity: normal

Merged with 26864

To reply to this bug, email your comments to 36148 AT debbugs.gnu.org.




Report forwarded to bug-grep <at> gnu.org:
bug#36148; Package grep. (Sun, 09 Jun 2019 15:30:02 GMT)

Acknowledgement sent to g1pi <at> libero.it:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Sun, 09 Jun 2019 15:30:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: g1pi <at> libero.it
To: bug-grep <at> gnu.org
Subject: inconsistent behaviour with anchored regex containing back-references
Date: Sun, 9 Jun 2019 10:00:24 +0200
There seems to be a problem with beginning/end-of-line anchors in regex
containing back-references:

$ grep -V | head -1
grep (GNU grep) 3.1

$ cat words
ana
deed
ill
stats

Using -x to match whole line works:

$ egrep -x '(.?)(.?).?\2\1' words 
ana
deed
stats

Using explicit anchors emits false positives:

$ egrep   '^(.?)(.?).?\2\1$' words 
ana
deed
ill	<<<
stats

On the other hand, colouring the output suggests that grep is partly aware of
its mistake:

$ egrep --color '^(.?)(.?).?\2\1$' words 
ana	(coloured)
deed	(coloured)
ill
stats	(coloured)
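
As an independent cross-check of which lines should match, the same
expression can be run through a different regex engine. The sketch below
assumes Python's re module, whose back-reference syntax happens to accept
this pattern unchanged. The name PALINDROME and the use of fullmatch() in
place of -x or ^...$ are choices made for this illustration, not part of
the report.

```python
import re

# The ERE from the report, unchanged. fullmatch() stands in for grep -x,
# or equivalently for the explicit ^...$ anchors.
PALINDROME = re.compile(r"(.?)(.?).?\2\1")

# Contents of the "words" file from the report.
words = ["ana", "deed", "ill", "stats"]

# Keep only the words that the whole pattern matches end to end.
matches = [w for w in words if PALINDROME.fullmatch(w)]
print(matches)
```

Under those assumptions the result agrees with the -x run above: "ill" is
not selected.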





Merged 26864 36148. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Thu, 02 Jan 2020 09:35:01 GMT)

Information forwarded to bug-grep <at> gnu.org:
bug#36148; Package grep. (Thu, 02 Jan 2020 09:38:01 GMT)

Message #10 received at 36148 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: g1pi <at> libero.it
Cc: 36148 <at> debbugs.gnu.org
Subject: Re: inconsistent behaviour with anchored regex containing
 back-references
Date: Thu, 2 Jan 2020 01:37:20 -0800
Yes, back-references don't work very well. This looks to be the same bug as
Bug#26864 <https://bugs.gnu.org/26864> so I have merged the two bug reports.




Information forwarded to bug-grep <at> gnu.org:
bug#36148; Package grep. (Fri, 02 Dec 2022 01:23:02 GMT)

Message #13 received at 36148 <at> debbugs.gnu.org (full text, mbox):

From: Thorsten Glaser <tg <at> mirbsd.de>
To: 36148 <at> debbugs.gnu.org
Subject: Debian Bug#930247: grep: does not handle backreferences correctly,
 violating POSIX
Date: Fri, 2 Dec 2022 01:21:01 +0000 (UTC)
Please fix this bug, it’s really bad and embarrassing.

It looks like instead of matching “the same[…]string of characters as
was matched by a subexpression[…]preceding”, it matches with the same
as the previous subexpression used?

---------- Forwarded message ----------
Message-ID: <166994354000.10956.15575266799036445295.reportbug <at> x61w.mirbsd.org>
Date: Fri, 02 Dec 2022 02:12:20 +0100
Subject: Bug#930247: grep: inconsistent behaviour with anchored regex containing
     back-references

Package: grep
Version: 3.6-1
Followup-For: Bug #930247
X-Debbugs-Cc: tg <at> mirbsd.de
Control: found 930247 3.8-3
Control: severity 930247 serious
Control: retitle 930247 grep: does not handle backreferences correctly, violating POSIX

I’m running into this, in stable and unstable both:

(sid-amd64)tglase <at> tglase:/tmp $ cat x
Total failed: 0
Total failed: 1 (1 ignored)
Total failed: 2 (1 ignored)
Total failed: 1 (2 ignored)
Total failed: 1
Total failed: 111
(sid-amd64)tglase <at> tglase:/tmp $ grep -e '^Total failed: 0$' -e '^Total failed: \([0-9]*\) (\1 ignored)$' x
Total failed: 0
Total failed: 1 (1 ignored)
Total failed: 2 (1 ignored)
Total failed: 1 (2 ignored)

By contrast, BSD handles it correctly:

tg <at> tglase-bsd:/tmp $ grep -e '^Total failed: 0$' -e '^Total failed: \([0-9]*\) (\1 ignored)$' x
Total failed: 0
Total failed: 1 (1 ignored)

POSIX:

    3. The back-reference expression '\n' shall match the same (possibly
       empty) string of characters as was matched by a subexpression
       enclosed between "\(" and "\)" preceding the '\n'. The character

via https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03
from https://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html
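
To make the quoted POSIX rule concrete, here is a minimal sketch, assuming
Python's re module as a stand-in engine (the BRE \( \) groups become ( ),
and the literal parentheses are escaped). It contrasts a true
back-reference with a hypothetical "loose" reading in which \1 is replaced
by a second copy of the subexpression. The names strict, loose and select
are illustrative only, and nothing here claims to describe what GNU grep
does internally.

```python
import re

# Contents of the file "x" from the report.
lines = [
    "Total failed: 0",
    "Total failed: 1 (1 ignored)",
    "Total failed: 2 (1 ignored)",
    "Total failed: 1 (2 ignored)",
    "Total failed: 1",
    "Total failed: 111",
]

# First -e pattern, identical in both readings.
zero = re.compile(r"^Total failed: 0$")

# Second -e pattern with a real back-reference: \1 must repeat the exact
# string that group 1 matched.
strict = re.compile(r"^Total failed: ([0-9]*) \(\1 ignored\)$")

# Hypothetical loose reading: the subexpression is applied a second time,
# so the two numbers may differ.
loose = re.compile(r"^Total failed: ([0-9]*) \([0-9]* ignored\)$")


def select(second):
    """Return the lines matched by either pattern, like grep -e A -e B."""
    return [l for l in lines if zero.search(l) or second.search(l)]


strict_hits = select(strict)
loose_hits = select(loose)
print(strict_hits)
print(loose_hits)
```

On this input the strict reading selects the two lines that BSD grep
printed, while the loose reading selects the same four lines as the GNU
grep run shown above. That is consistent with the symptom but is not a
diagnosis.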

Please fix this clear standards violation; it makes grep
virtually unusable.



-- System Information:
Debian Release: 11.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-19-amd64 (SMP w/2 CPU threads)
Locale: LANG=C, LC_CTYPE=C (charmap=UTF-8) (ignored: LC_ALL set to C.UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /bin/lksh
Init: sysvinit (via /sbin/init)

Versions of packages grep depends on:
ii  dpkg          1.20.12
ii  install-info  6.7.0.dfsg.2-6
ii  libc6         2.31-13+deb11u5
ii  libpcre3      2:8.39-13

grep recommends no packages.

Versions of packages grep suggests:
ii  libpcre3  2:8.39-13

-- no debconf information




Information forwarded to bug-grep <at> gnu.org:
bug#36148; Package grep. (Mon, 05 Dec 2022 22:26:02 GMT)

Message #16 received at 36148 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Thorsten Glaser <tg <at> mirbsd.de>
Cc: 930247 <at> bugs.debian.org, 36148 <at> debbugs.gnu.org
Subject: Re: bug#36148: Debian Bug#930247: grep: does not handle
 backreferences correctly, violating POSIX
Date: Mon, 5 Dec 2022 14:25:00 -0800
On 12/1/22 17:21, Thorsten Glaser wrote:
> Please fix this bug, it’s really bad and embarrassing.

Thanks for reporting it; I wasn't aware of it.

Although you sent your email to 36148 <at> debbugs.gnu.org / 
930247 <at> bugs.debian.org, your email is reporting a separate bug, and I
fixed it in the development version of GNU grep by installing the 
attached patch. This patch should appear in the next GNU grep release.

I suggest not closing the original bug reports, since the original bug 
remains. Of course fixes are welcome but they are lower priority.
[0001-grep-bug-backref-in-last-of-multiple-patterns.patch (text/x-patch, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#36148; Package grep. (Mon, 05 Dec 2022 23:46:02 GMT)

Message #19 received at 36148 <at> debbugs.gnu.org (full text, mbox):

From: Thorsten Glaser <tg <at> mirbsd.de>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 930247 <at> bugs.debian.org, 36148 <at> debbugs.gnu.org
Subject: Re: bug#36148: Debian Bug#930247: grep: does not handle backreferences
 correctly, violating POSIX
Date: Mon, 5 Dec 2022 23:36:23 +0000 (UTC)
Paul Eggert dixit:

> Although you sent your email to 36148 <at> debbugs.gnu.org /
> 930247 <at> bugs.debian.org, your email is reporting a separate bug

Oh OK, I wasn’t aware, it sounded similar enough.

> I fixed it in the development version of GNU grep by installing the
> attached patch. This patch should appear in the next GNU grep release.

Thank you!

bye,
//mirabilos
-- 
  "Using Lynx is like wearing a really good pair of shades: cuts out
   the glare and harmful UV (ultra-vanity), and you feel so-o-o COOL."
                                         -- Henry Nelson, March 1999




Information forwarded to bug-grep <at> gnu.org:
bug#36148; Package grep. (Fri, 20 Jan 2023 09:53:01 GMT)

Message #22 received at 36148 <at> debbugs.gnu.org (full text, mbox):

From: Santiago Ruano Rincón <santiagorr <at> riseup.net>
To: Thorsten Glaser <tg <at> mirbsd.de>, 930247 <at> bugs.debian.org
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 36148 <at> debbugs.gnu.org
Subject: Re: Bug#930247: bug#36148: Debian Bug#930247: grep: does not handle
 backreferences correctly, violating POSIX
Date: Fri, 20 Jan 2023 10:51:55 +0100
El 05/12/22 a las 23:36, Thorsten Glaser escribió:
> Paul Eggert dixit:
> 
> > Although you sent your email to 36148 <at> debbugs.gnu.org /
> > 930247 <at> bugs.debian.org, your email is reporting a separate bug
> 
> Oh OK, I wasn’t aware, it sounded similar enough.

I'll clone the bug in Debian (and adjust severities), to make it easier
to follow/differentiate both bugs.

Paul, do you want me to do the same in debbugs.gnu.org?

> 
> > I fixed it in the development version of GNU grep by installing the
> > attached patch. This patch should appear in the next GNU grep release.

grep is now frozen in Debian bookworm, and I'll have to contact the
release team about fixing this (in bullseye too).

Cheers,

 -- Santiago

Information forwarded to bug-grep <at> gnu.org:
bug#36148; Package grep. (Fri, 20 Jan 2023 14:39:02 GMT)

Message #25 received at 36148 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Santiago Ruano Rincón <santiagorr <at> riseup.net>,
 Thorsten Glaser <tg <at> mirbsd.de>, 930247 <at> bugs.debian.org
Cc: 36148 <at> debbugs.gnu.org
Subject: Re: bug#36148: Debian Bug#930247: grep: does not handle
 backreferences correctly, violating POSIX
Date: Fri, 20 Jan 2023 06:37:57 -0800
On 2023-01-20 01:51, Santiago Ruano Rincón wrote:
> I'll clone the bug in Debian (and adjust severities), to make it easier
> to follow/differentiate both bugs.
> 
> Paul, do you want me to do the same in debbugs.gnu.org?

Please don't bother, since the bug is already fixed upstream.




This bug report was last modified 1 year and 105 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.