GNU bug report logs - #16481
dfa.c and Rational Range Interpretation


Package: grep;

Reported by: Aharon Robbins <arnold <at> skeeve.com>

Date: Fri, 17 Jan 2014 13:41:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 16481 in the body.
You can then email your comments to 16481 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Fri, 17 Jan 2014 13:41:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Aharon Robbins <arnold <at> skeeve.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Fri, 17 Jan 2014 13:41:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Aharon Robbins <arnold <at> skeeve.com>
To: bug-grep <at> gnu.org
Subject: dfa.c and Rational Range Interpretation
Date: Fri, 17 Jan 2014 15:39:48 +0200
Hello All.

I believe that the code in dfa.c that deals with character ranges
is incorrect with respect to Rational Range Interpretation.
This shows up in the following test case:

	$ echo \\ | src/grep -Xawk '[\[-\]]'
	$ 

Whereas with gawk:

	$ echo \\ | gawk '/[\[-\]]/'
	\

From ascii(7):

      ...       133   91    5B    [
      ...       134   92    5C    \  '\\'
      ...       135   93    5D    ]

So gawk is correct here.  (This is on a GLIBC system; in private email
Jim reported different behavior on Mac OS X.)

In the grep master, the code in question is in dfa.c:parse_bracket_exp,
lines 1110 - 1135:

            {
              /* Defer to the system regex library about the meaning
                 of range expressions.  */
              regex_t re;
              char pattern[6] = { '[', 0, '-', 0, ']', 0 };
              char subject[2] = { 0, 0 };
              c1 = c;
              if (case_fold)
                {
                  c1 = tolower (c1);
                  c2 = tolower (c2);
                }

              pattern[1] = c1;
              pattern[3] = c2;
              regcomp (&re, pattern, REG_NOSUB);
              for (c = 0; c < NOTCHAR; ++c)
                {
                  if ((case_fold && isupper (c)))
                    continue;
                  subject[0] = c;
                  if (regexec (&re, subject, 0, NULL, 0) != REG_NOMATCH)
                    setbit_case_fold_c (c, ccl);
                }
              regfree (&re);
            }

This code lets the regex routines decide what characters match
a particular range expression. If the regex routines are not
obeying RRI, then dfa.c will not either.  Yet, grep now supports RRI.

(To me this argues that grep's configure should be checking the
system regex routines for correct RRI support, and automatically
using the included routines if the system routines are not good. Gawk
goes further and simply always uses the included regex routines,
*guaranteeing* consistent behavior across systems. But that's a
parenthetical issue.)

In addition, the call to regcomp could fail, but this isn't being
checked. When I add an error report to the call, I get the following
on one of the gawk test cases:

  "[.c.]" ~ /[a-[.e.]]/ --> 1
  dfa.c:1176: regcomp(/[a-[]/) failed: Invalid range end

Since this relates to [. and .] which dfa and regex don't really
support, there's a gap somewhere, but the point is that if regcomp
fails, nobody notices. What does regexec do if regcomp fails?
Beats me...
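
[Editorial sketch, not part of the original message.] The unchecked call described above could be guarded along these lines. This is a minimal illustration only: `compile_checked` is a hypothetical helper name, and the message format merely imitates the diagnostic quoted above. It uses only the POSIX `regcomp` and `regerror` calls.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper (not in dfa.c): compile PATTERN and report any
   failure on stderr instead of ignoring it.  Returns 0 on success,
   otherwise the nonzero error code from regcomp.  */
static int
compile_checked (regex_t *re, char const *pattern)
{
  assert (re != NULL && pattern != NULL);
  int err = regcomp (re, pattern, REG_NOSUB);
  if (err != 0)
    {
      char msg[256];
      regerror (err, re, msg, sizeof msg);
      fprintf (stderr, "regcomp(/%s/) failed: %s\n", pattern, msg);
    }
  return err;
}
```

A caller would run regexec and regfree only when the helper returns 0. POSIX leaves the contents of the regex_t undefined after a failed regcomp, so it should not be passed to regexec.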

Next, let's take a harder look at this:

              for (c = 0; c < NOTCHAR; ++c)
                {
                  if ((case_fold && isupper (c)))
                    continue;
                  subject[0] = c;
                  if (regexec (&re, subject, 0, NULL, 0) != REG_NOMATCH)
                    setbit_case_fold_c (c, ccl);
                }

Since c is 0 on the first iteration, regexec is called with subject
equal to [ '\0' '\0' ].  The first thing regexec does is

	length = strlen(string);

which in this case will be zero. We really want a length of 1 where
the first byte is zero (no arbitrary limits, eh?).  Bug in the regexec
interface, methinks, but in any case, testing 0 is fruitless.
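
[Editorial sketch, not part of the original message.] The point about the zero byte can be seen without calling regexec at all. The helper name `visible_length` is an assumption made for illustration; it simply reports what any strlen-based interface would see for a one-byte subject.

```c
#include <assert.h>   /* for the checks mentioned in the note below */
#include <string.h>

/* Hypothetical helper: length that a strlen-based interface, such as
   the regexec behavior described above, sees for a one-byte subject
   holding C.  */
static size_t
visible_length (unsigned char c)
{
  char subject[2] = { 0, 0 };
  subject[0] = (char) c;
  return strlen (subject);
}
```

For example, `assert (visible_length (0) == 0)` and `assert (visible_length ('a') == 1)` both hold, so a probe with c == 0 actually tests the empty string rather than a NUL byte.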

However, this code begs a deeper question. If we're doing RRI, then by
definition only the values between the low member of the range and
the high member of the range can match the range expression.  So why
loop over everything from 0 to 255?

Thus, gawk replaces the above code with the following:

              c1 = c;
              if (case_fold)
                {
                  c1 = tolower (c1);
                  c2 = tolower (c2);
                }
              for (c = c1; c <= c2; c++)
                setbit_case_fold_c (c, ccl);

This sets the bits for exactly those characters in the range. No more,
no less. And it doesn't rely on the system regex routines, which makes
compiling the dfa go faster.
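
[Editorial sketch, not part of the original message.] The effect of that loop can be shown with a stand-alone bitset. The names `charset_sketch`, `set_range` and `in_set` are hypothetical stand-ins for dfa.c's charclass machinery, case folding is left out, and the example assumes an ASCII-compatible character set.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

enum { NOTCHAR_SKETCH = 1 << CHAR_BIT };   /* 256 when CHAR_BIT is 8 */

/* Hypothetical stand-in for a dfa.c character class: one bit per byte.  */
typedef struct
{
  unsigned char bits[NOTCHAR_SKETCH / CHAR_BIT];
} charset_sketch;

/* Rational range interpretation: set exactly the bytes LO..HI.  */
static void
set_range (charset_sketch *s, int lo, int hi)
{
  assert (0 <= lo && hi < NOTCHAR_SKETCH);
  for (int c = lo; c <= hi; c++)
    s->bits[c / CHAR_BIT] |= (unsigned char) (1u << (c % CHAR_BIT));
}

static bool
in_set (charset_sketch const *s, int c)
{
  return (s->bits[c / CHAR_BIT] >> (c % CHAR_BIT)) & 1;
}
```

After zeroing a charset_sketch with memset and calling `set_range (&s, '[', ']')`, the set holds '[', '\\' and ']' and nothing else, which is the answer gawk gives for the test case at the top of this report.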

Grep only compiles its dfa once, but gawk can compile arbitrarily many
dfa's, since it can match expressions that are computed dynamically.

I'm not sure if this analysis covers all the problems with the current
code.  But I do think that gawk's code is the correct thing to be
doing for RRI.

Additionally, I recommend that grep's configure check for good RRI
support in the system regex routines and switch to the included ones
if the system ones don't support it.

Finally, the following diff lets grep check the other awk syntax
variants.  Feel free to apply it. For the above test case, all three
give the same results.

I hope all this is of interest.

Thanks!

Arnold
-----------------------------------------------------
diff --git a/src/grep.c b/src/grep.c
index 1b2198f..12644a2 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -19,10 +19,24 @@ Acompile (char const *pattern, size_t size)
   GEAcompile (pattern, size, RE_SYNTAX_AWK);
 }
 
+static void
+GAcompile (char const *pattern, size_t size)
+{
+  GEAcompile (pattern, size, RE_SYNTAX_GNU_AWK);
+}
+
+static void
+PAcompile (char const *pattern, size_t size)
+{
+  GEAcompile (pattern, size, RE_SYNTAX_POSIX_AWK);
+}
+
 struct matcher const matchers[] = {
   { "grep",    Gcompile, EGexecute },
   { "egrep",   Ecompile, EGexecute },
   { "awk",     Acompile, EGexecute },
+  { "gawk",    GAcompile, EGexecute },
+  { "posixawk", PAcompile, EGexecute },
   { "fgrep",   Fcompile, Fexecute },
   { "perl",    Pcompile, Pexecute },
   { NULL, NULL, NULL },




Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Fri, 17 Jan 2014 22:44:01 GMT) Full text and rfc822 format available.

Message #8 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Aharon Robbins <arnold <at> skeeve.com>, 16481 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Fri, 17 Jan 2014 14:43:29 -0800
Thanks for continuing to bird-dog this.

On 01/17/2014 05:39 AM, Aharon Robbins wrote:

> the following diff lets grep check the other awk syntax
> variants.  Feel free to apply it.

I did that (the first patch enclosed below).
Thanks.

> I do think that gawk's code is the correct thing to be doing for RRI.

I agree, and installed the second patch enclosed below to
implement this.  This patch also includes some documentation
changes -- if you have a bit of time to review them I'd
appreciate it.

Also, I notice that there are a few "#ifdef GREP"s in dfa.c
Do you happen to know why they're needed?  It'd be nice if
we could simplify dfa.c to omit the need for the GREP macro.

> Additionally, I recommend that grep's configure check for good RRI
> support in the system regex routines and switch to the included ones
> if the system ones don't support it.

Unfortunately that'd break support for equivalence classes
and multibyte collation symbols on GNU/Linux platforms, so
it may be a bridge too far.  Until we get glibc fixed, I
think it's OK to live with the situation where [a-z]
ordinarily has the rational range interpretation, and this
breaks down only for complicated matches where the DFA
doesn't suffice; at least it'll work in the usual case.

From c862ced6f31f0ccdf2505ac46e354a1a011149cd Mon Sep 17 00:00:00 2001
From: Aharon Robbins <arnold <at> skeeve.com>
Date: Fri, 17 Jan 2014 12:42:49 -0800
Subject: [PATCH 1/2] grep: add undocumented '-X gawk' and '-X posixawk'
 options

See <http://bugs.gnu.org/16481>.
* src/grep.c (GAcompile, PAcompile): New functions.
(const): Use them.
---
 src/grep.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/src/grep.c b/src/grep.c
index 1b2198f..12644a2 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -19,10 +19,24 @@ Acompile (char const *pattern, size_t size)
   GEAcompile (pattern, size, RE_SYNTAX_AWK);
 }
 
+static void
+GAcompile (char const *pattern, size_t size)
+{
+  GEAcompile (pattern, size, RE_SYNTAX_GNU_AWK);
+}
+
+static void
+PAcompile (char const *pattern, size_t size)
+{
+  GEAcompile (pattern, size, RE_SYNTAX_POSIX_AWK);
+}
+
 struct matcher const matchers[] = {
   { "grep",    Gcompile, EGexecute },
   { "egrep",   Ecompile, EGexecute },
   { "awk",     Acompile, EGexecute },
+  { "gawk",    GAcompile, EGexecute },
+  { "posixawk", PAcompile, EGexecute },
   { "fgrep",   Fcompile, Fexecute },
   { "perl",    Pcompile, Pexecute },
   { NULL, NULL, NULL },
-- 
1.8.4.2


From aba2c718908d6c8fcfd75d55a43a4c9b1e3405a3 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert <at> cs.ucla.edu>
Date: Fri, 17 Jan 2014 14:32:10 -0800
Subject: [PATCH 2/2] grep: DFA now uses rational ranges in unibyte locales

Problem reported by Aharon Robbins in <http://bugs.gnu.org/16481>.
* NEWS:
* doc/grep.texi (Environment Variables)
(Character Classes and Bracket Expressions):
Document this.
* src/dfa.c (parse_bracket_exp): Treat unibyte locales like multibyte.
---
 NEWS          |  8 ++++++++
 doc/grep.texi | 19 +++++++++----------
 src/dfa.c     | 20 ++------------------
 3 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/NEWS b/NEWS
index 6e46684..589b2ac 100644
--- a/NEWS
+++ b/NEWS
@@ -7,6 +7,14 @@ GNU grep NEWS                                    -*- outline -*-
   grep -i in a multibyte locale is now typically 10 times faster
   for patterns that do not contain \ or [.
 
+  Range expressions in unibyte locales now ordinarily use the rational
+  range interpretation, in which [a-z] matches only lower-case ASCII
+  letters regardless of locale, and similarly for other ranges.  (This
+  was already true for multibyte locales.)  Portable programs should
+  continue to specify the C locale when using range expressions, since
+  these expressions have unspecified behavior in non-GNU systems and
+  are not yet guaranteed to use the rational range interpretation even
+  in GNU systems.
 
 * Noteworthy changes in release 2.16 (2014-01-01) [stable]
 
diff --git a/doc/grep.texi b/doc/grep.texi
index 473a181..42fb9a2 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -960,8 +960,8 @@ They are omitted (i.e., false) by default and become true when specified.
 @cindex national language support
 @cindex NLS
 These variables specify the locale for the @code{LC_COLLATE} category,
-which determines the collating sequence
-used to interpret range expressions like @samp{[a-z]}.
+which might affect how range expressions like @samp{[a-z]} are
+interpreted.
 
 @item LC_ALL
 @itemx LC_CTYPE
@@ -1223,14 +1223,13 @@ For example, the regular expression
 Within a bracket expression, a @dfn{range expression} consists of two
 characters separated by a hyphen.
 It matches any single character that
-sorts between the two characters, inclusive, using the locale's
-collating sequence and character set.
-For example, in the default C
-locale, @samp{[a-d]} is equivalent to @samp{[abcd]}.
-Many locales sort
-characters in dictionary order, and in these locales @samp{[a-d]} is
-typically not equivalent to @samp{[abcd]};
-it might be equivalent to @samp{[aBbCcDd]}, for example.
+sorts between the two characters, inclusive.
+In the default C locale, the sorting sequence is the native character
+order; for example, @samp{[a-d]} is equivalent to @samp{[abcd]}.
+In other locales, the sorting sequence is not specified, and
+@samp{[a-d]} might be equivalent to @samp{[abcd]} or to
+@samp{[aBbCcDd]}, or it might fail to match any character, or the set of
+characters that it matches might even be erratic.
 To obtain the traditional interpretation
 of bracket expressions, you can use the @samp{C} locale by setting the
 @env{LC_ALL} environment variable to the value @samp{C}.
diff --git a/src/dfa.c b/src/dfa.c
index 6ab4e05..5e3140d 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -1108,30 +1108,14 @@ parse_bracket_exp (void)
             }
           else
             {
-              /* Defer to the system regex library about the meaning
-                 of range expressions.  */
-              regex_t re;
-              char pattern[6] = { '[', 0, '-', 0, ']', 0 };
-              char subject[2] = { 0, 0 };
               c1 = c;
               if (case_fold)
                 {
                   c1 = tolower (c1);
                   c2 = tolower (c2);
                 }
-
-              pattern[1] = c1;
-              pattern[3] = c2;
-              regcomp (&re, pattern, REG_NOSUB);
-              for (c = 0; c < NOTCHAR; ++c)
-                {
-                  if ((case_fold && isupper (c)))
-                    continue;
-                  subject[0] = c;
-                  if (regexec (&re, subject, 0, NULL, 0) != REG_NOMATCH)
-                    setbit_case_fold_c (c, ccl);
-                }
-              regfree (&re);
+              for (c = c1; c <= c2; c++)
+                setbit_case_fold_c (c, ccl);
             }
 
           colon_warning_state |= 8;
-- 
1.8.4.2






Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Sat, 18 Jan 2014 19:40:03 GMT) Full text and rfc822 format available.

Message #11 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: Aharon Robbins <arnold <at> skeeve.com>
To: eggert <at> cs.ucla.edu, arnold <at> skeeve.com, 16481 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Sat, 18 Jan 2014 21:39:13 +0200
Hi Paul.

> Thanks for continuing to bird-dog this.

It's either "tenacity" or "stubbornness". :-)

> > I do think that gawk's code is the correct thing to be doing for RRI.
>
> I agree, and installed the second patch enclosed below to
> implement this.

Cool!  Hurray!  One more bit that comes into sync.

> This patch also includes some documentation
> changes -- if you have a bit of time to review them I'd
> appreciate it.

It looks ok, but it doesn't really say anything about RRI - grep
does RRI in all locales now, which falls under the umbrella
of POSIXy implementation-defined behavior, but is just fine.
That should be explained.

> Also, I notice that there are a few "#ifdef GREP"s in dfa.c
> Do you happen to know why they're needed?

No idea.  They all seem to be related to case_fold.  I had
not really noticed them, and they must be working fine for me
since I don't define GREP.

What happens if you compile them in and run the grep test suite?

> > Additionally, I recommend that grep's configure check for good RRI
> > support in the system regex routines and switch to the included ones
> > if the system ones don't support it.
>
> Unfortunately that'd break support for equivalence classes
> and multibyte collation symbols on GNU/Linux platforms, so
> it may be a bridge too far.

Gawk has lived without these so far. :-)

> Until we get glibc fixed, I
> think it's OK to live with the situation where [a-z]
> ordinarily has the rational range interpretation, and this
> breaks down only for complicated matches where the DFA
> doesn't suffice; at least it'll work in the usual case.

At least document it somewhere.

Thanks!

Arnold




Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Mon, 20 Jan 2014 17:36:02 GMT) Full text and rfc822 format available.

Message #14 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Aharon Robbins <arnold <at> skeeve.com>, 16481 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Mon, 20 Jan 2014 09:35:43 -0800
Aharon Robbins wrote:

> What happens if you compile them in and run the grep test suite?

The test suite passes, but grep is bigger and (I presume) slower.  The 
GREP-related changes are for performance, and shouldn't affect behavior.

How about if we apply the attached patch to dfa.c, in both gawk and 
grep?  I tried it just now, and gawk passed all its tests too.  Or, if 
there's some reason this patch would introduce a bug into gawk, I'd like 
to fix the grep test cases to detect the bug.
[gawk.diff (text/x-patch, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Tue, 21 Jan 2014 04:23:01 GMT) Full text and rfc822 format available.

Message #17 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: Aharon Robbins <arnold <at> skeeve.com>
To: eggert <at> cs.ucla.edu, arnold <at> skeeve.com, 16481 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Tue, 21 Jan 2014 06:21:59 +0200
Hi Paul.

> > What happens if you compile them in and run the grep test suite?
>
> The test suite passes, but grep is bigger and (I presume) slower.  The 
> GREP-related changes are for performance, and shouldn't affect behavior.
>
> How about if we apply the attached patch to dfa.c, in both gawk and 
> grep?  I tried it just now, and gawk passed all its tests too.  Or, if 
> there's some reason this patch would introduce a bug into gawk, I'd like 
> to fix the grep test cases to detect the bug.

Can you explain a bit more what the two different branches do?

In other words, I'm wondering why there are two different branches through
the code in the first place, and what are we throwing away by your patch?

(I have no preference either way, I just want to understand the
implications of the decision. :-)

Thanks,

Arnold




Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Tue, 21 Jan 2014 06:03:02 GMT) Full text and rfc822 format available.

Message #20 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Aharon Robbins <arnold <at> skeeve.com>, 16481 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Mon, 20 Jan 2014 22:02:12 -0800
Aharon Robbins wrote:
> Can you explain a bit more what the two different branches do?

Sorry, not easily; I'm not familiar with the code. I assume it has 
something to do with locales where there's not a one-to-one 
correspondence between lower-case and upper-case letters.




Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Tue, 21 Jan 2014 16:52:01 GMT) Full text and rfc822 format available.

Message #23 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: Aharon Robbins <arnold <at> skeeve.com>, 16481 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Tue, 21 Jan 2014 08:50:44 -0800
On Mon, Jan 20, 2014 at 10:02 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Aharon Robbins wrote:
>>
>> Can you explain a bit more what the two different branches do?
>
> Sorry, not easily; I'm not familiar with the code. I assume it has something
> to do with locales where there's not a one-to-one correspondence between
> lower-case and upper-case letters.

Hi Paul,

A week or so ago, Norihiro Tanaka posted the patch in bug 16421, which
removes GREP-oriented dfa.c code in favor of what gawk has been using,
as well as ensuring that some of grep's case-insensitive searches no
longer have to case-convert the data being searched.  I was expecting
to apply it, along with another small change and a test, but now, feel
like I'll have to justify it with some performance data as well.
Assuming I find an improvement, expect a complete patch in a day or
two.




Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Tue, 21 Jan 2014 21:51:02 GMT) Full text and rfc822 format available.

Message #26 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>, Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: Aharon Robbins <arnold <at> skeeve.com>, 16481 <at> debbugs.gnu.org,
 16421 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Tue, 21 Jan 2014 13:50:26 -0800
On 01/21/2014 08:50 AM, Jim Meyering wrote:
> I was expecting
> to apply it, along with another small change and a test, but now, feel
> like I'll have to justify it with some performance data as well.

Ouch, I wasn't intending to make work for you!  Even if the patch in 
<http://bugs.gnu.org/16481#14> didn't improve performance, it makes grep 
simpler and that should be a win.  Norihiro Tanaka's patch (which I'd 
forgotten about, but which is presumably better) also simplifies grep, 
so you shouldn't need to do a performance analysis to verify that it's a 
good idea.




Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Sat, 25 Jan 2014 18:28:01 GMT) Full text and rfc822 format available.

Message #29 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: Aharon Robbins <arnold <at> skeeve.com>
To: eggert <at> cs.ucla.edu, arnold <at> skeeve.com, 16481 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Sat, 25 Jan 2014 20:27:13 +0200
Hi Paul & Jim,

> > What happens if you compile them in and run the grep test suite?
>
> The test suite passes, but grep is bigger and (I presume) slower.  The 
> GREP-related changes are for performance, and shouldn't affect behavior.
>
> How about if we apply the attached patch to dfa.c, in both gawk and 
> grep?  I tried it just now, and gawk passed all its tests too.  Or, if 
> there's some reason this patch would introduce a bug into gawk, I'd like 
> to fix the grep test cases to detect the bug.

The code in question occurs in two functions, parse_bracket_exp() and atom().

The first instance is in parse_bracket_exp(), building a range expression,
where we may have multibyte characters.

      ....
      if (c1 == '-' && c2 != ']')
        {
          if (c2 == '\\' && (syntax_bits & RE_BACKSLASH_ESCAPE_IN_LISTS))
            FETCH_WC (c2, wc2, _("unbalanced ["));

          if (MB_CUR_MAX > 1)
            {
              /* When case folding map a range, say [m-z] (or even [M-z])
                 to the pair of ranges, [m-z] [M-Z].  */
              REALLOC_IF_NECESSARY (work_mbc->range_sts,
                                    range_sts_al, work_mbc->nranges + 1);
              REALLOC_IF_NECESSARY (work_mbc->range_ends,
                                    range_ends_al, work_mbc->nranges + 1);
              work_mbc->range_sts[work_mbc->nranges] =
                case_fold ? towlower (wc) : (wchar_t) wc;
              work_mbc->range_ends[work_mbc->nranges++] =
                case_fold ? towlower (wc2) : (wchar_t) wc2;

#ifndef GREP
              if (case_fold && (iswalpha (wc) || iswalpha (wc2)))
                {
                  REALLOC_IF_NECESSARY (work_mbc->range_sts,
                                        range_sts_al, work_mbc->nranges + 1);
                  work_mbc->range_sts[work_mbc->nranges] = towupper (wc);
                  REALLOC_IF_NECESSARY (work_mbc->range_ends,
                                        range_ends_al, work_mbc->nranges + 1);
                  work_mbc->range_ends[work_mbc->nranges++] = towupper (wc2);
                }
#endif
            }

To me this looks like when doing case folding (grep -i, IGNORECASE in gawk),
we turn the m.b. equivalent of [a-c] into [a-cA-C].  This would seem to be
necessary for correctness, and the question is why does grep not need it?
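
[Editorial sketch, not part of the original message.] The range doubling described above can be isolated as follows. `struct wrange` and `fold_range` are hypothetical names; the real code appends to work_mbc->range_sts and work_mbc->range_ends instead of filling a small array.

```c
#include <assert.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical pair type standing in for the range_sts/range_ends arrays.  */
struct wrange
{
  wchar_t lo, hi;
};

/* Store the range(s) for [LO-HI] in OUT, which has room for two
   entries, and return how many were stored.  When CASE_FOLD is
   nonzero the lower-case range comes first, followed by an
   upper-case one if either endpoint is alphabetic.  */
static int
fold_range (wchar_t lo, wchar_t hi, int case_fold, struct wrange out[2])
{
  assert (out != NULL);
  int n = 0;
  out[n].lo = case_fold ? (wchar_t) towlower ((wint_t) lo) : lo;
  out[n].hi = case_fold ? (wchar_t) towlower ((wint_t) hi) : hi;
  n++;
  if (case_fold && (iswalpha ((wint_t) lo) || iswalpha ((wint_t) hi)))
    {
      out[n].lo = (wchar_t) towupper ((wint_t) lo);
      out[n].hi = (wchar_t) towupper ((wint_t) hi);
      n++;
    }
  return n;
}
```

In the C locale, folding [m-z] this way yields the two ranges [m-z] and [M-Z], matching the comment in the quoted code. Without case folding only the original range is stored.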

The next such bit is later on in the same function:

      if (case_fold && iswalpha (wc))
        {
          wc = towlower (wc);
          if (!setbit_wc (wc, ccl))
            {
              REALLOC_IF_NECESSARY (work_mbc->chars, chars_al,
                                    work_mbc->nchars + 1);
              work_mbc->chars[work_mbc->nchars++] = wc;
            }
#ifdef GREP
          continue;
#else
          wc = towupper (wc);
#endif
        }
      if (!setbit_wc (wc, ccl))
        {
          REALLOC_IF_NECESSARY (work_mbc->chars, chars_al,
                                work_mbc->nchars + 1);
          work_mbc->chars[work_mbc->nchars++] = wc;
        }
    }
  while ((wc = wc1, (c = c1) != ']'));

This too looks related to case folding and ranges; if I read it
correctly, when case folding it added the lower case version and
now it has to add the uppercase version of the character.

Then, in atom():  (Why the bizarre leading `if (0)'?)

static void
atom (void)
{
  if (0)
    {
      /* empty */
    }
  else if (MBS_SUPPORT && tok == WCHAR)
    {
      addtok_wc (case_fold ? towlower (wctok) : wctok);
#ifndef GREP
      if (case_fold && iswalpha (wctok))
        {
          addtok_wc (towupper (wctok));
          addtok (OR);
        }
#endif

      tok = lex ();
    }

Here too, we're doing case folding, have added the lower case character
and need to add the upper case one.

I think to test out this code you'd need a character set where the lower
and upper case counterparts are multibyte characters and grep -i is
in effect.  But I suspect that grep has so much other code to special
case grep -i that this code in dfa.c is never reached.

In short, I don't think it's right to remove this code, but I don't
know how to test it to prove that, either.

HTH,

Arnold




Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Sat, 25 Jan 2014 18:57:02 GMT) Full text and rfc822 format available.

Message #32 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Aharon Robbins <arnold <at> skeeve.com>, 16481 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Sat, 25 Jan 2014 10:56:29 -0800
Aharon Robbins wrote:
> I don't think it's right to remove this code, but I don't know how to test it to prove that, either.

Perhaps Norihiro Tanaka's recent patch makes this question moot; see:

http://bugs.gnu.org/16421




Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Sat, 25 Jan 2014 19:26:01 GMT) Full text and rfc822 format available.

Message #35 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: Aharon Robbins <arnold <at> skeeve.com>
To: eggert <at> cs.ucla.edu, arnold <at> skeeve.com, 16481 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Sat, 25 Jan 2014 21:24:59 +0200
Hi.

> Date: Sat, 25 Jan 2014 10:56:29 -0800
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> To: Aharon Robbins <arnold <at> skeeve.com>, 16481 <at> debbugs.gnu.org
> Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
>
> Aharon Robbins wrote:
> > I don't think it's right to remove this code, but I don't know how
> > to test it to prove that, either.
>
> Perhaps Norihiro Tanaka's recent patch makes this question moot; see:
>
> http://bugs.gnu.org/16421

Yes, I think so.  It keeps the non-GREP code.  If y'all are going to apply
it then I'd be happy with that.

Thanks!

Arnold




Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Sun, 09 Feb 2014 23:19:02 GMT) Full text and rfc822 format available.

Message #38 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: Paolo Bonzini <bonzini <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Aharon Robbins <arnold <at> skeeve.com>, 
 16481 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Mon, 10 Feb 2014 00:18:40 +0100
Il 17/01/2014 23:43, Paul Eggert ha scritto:
>> > I do think that gawk's code is the correct thing to be doing for RRI.
> I agree, and installed the second patch enclosed below to
> implement this.  This patch also includes some documentation
> changes -- if you have a bit of time to review them I'd
> appreciate it.

Please revert commit 1078b64302bbf5c0a46635772808ff7f75171dbc.

The correct course of action for grep is to defer range interpretation 
to regex, because otherwise you can get mismatches between regexes with 
backreferences and those without.

For example, [A-Z]. will use RRI but ([A-Z])\1 won't, with the confusing 
result that the first regex won't match a superset of the language 
described by the second regex.
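
[Editorial sketch, not part of the original message.] One way to see whether the two matchers can disagree on a given system is to compare the system regex with the rational range interpretation one byte at a time. `count_rri_mismatches` is a hypothetical helper, restricted to alphanumeric endpoints so the pattern needs no escaping. It skips the NUL byte and checks the regcomp result, for the reasons given earlier in this report.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical probe: count single bytes on which the system regex
   and RRI disagree about the range [LO-HI] in the current locale.
   Returns -1 if the pattern does not compile.  */
static int
count_rri_mismatches (unsigned char lo, unsigned char hi)
{
  assert (lo <= hi && isalnum (lo) && isalnum (hi));
  char pattern[6] = { '[', (char) lo, '-', (char) hi, ']', 0 };
  regex_t re;
  if (regcomp (&re, pattern, REG_NOSUB) != 0)
    return -1;

  int mismatches = 0;
  for (int c = 1; c <= UCHAR_MAX; c++)   /* byte 0 cannot be probed */
    {
      char subject[2] = { (char) c, 0 };
      int regex_says = regexec (&re, subject, 0, NULL, 0) == 0;
      int rri_says = lo <= c && c <= hi;
      mismatches += regex_says != rri_says;
    }
  regfree (&re);
  return mismatches;
}
```

In the C locale the count for [a-z] should be zero. A nonzero count after a call to setlocale would point to a locale where a DFA using RRI and the regex fallback can give different answers, which is the mismatch described above.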

For this reason, if you want to have RRI, then you need to make sure 
that you compile --with-included-regex.

Paolo




Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Mon, 10 Feb 2014 02:36:02 GMT) Full text and rfc822 format available.

Message #41 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Paolo Bonzini <bonzini <at> gnu.org>, Aharon Robbins <arnold <at> skeeve.com>, 
 16481 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Sun, 09 Feb 2014 18:35:50 -0800
Paolo Bonzini wrote:
> The correct course of action for grep is to defer range interpretation
> to regex, because otherwise you can get mismatches between regexes with
> backreferences and those without.

It depends on what one means by "correct".  POSIX doesn't say what to do 
in this situation, so it's OK as far as POSIX is concerned for grep to 
use RRI in the typical case (i.e., without backreferences), and for grep 
to use some other interpretation in the rare cases when backreferences 
are used.

The documentation for 'grep' attempts to address this issue, perhaps not 
as clearly as it could.  Maybe the installation instructions should talk 
about it as well, and suggest --with-included-regex for people who care 
about this sort of thing.




Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Mon, 10 Feb 2014 03:15:03 GMT) Full text and rfc822 format available.

Message #44 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Paolo Bonzini <bonzini <at> gnu.org>, Aharon Robbins <arnold <at> skeeve.com>,
 16481 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Sun, 9 Feb 2014 19:13:40 -0800
On Sun, Feb 9, 2014 at 6:35 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Paolo Bonzini wrote:
>>
>> The correct course of action for grep is to defer range interpretation
>> to regex, because otherwise you can get mismatches between regexes with
>> backreferences and those without.
>
>
> It depends on what one means by "correct".  POSIX doesn't say what to do in
> this situation, so it's OK as far as POSIX is concerned for grep to use RRI
> in the typical case (i.e., without backreferences), and for grep to use some
> other interpretation in the rare cases when backreferences are used.
>
> The documentation for 'grep' attempts to address this issue, perhaps not as
> clearly as it could.  Maybe the installation instructions should talk about
> it as well, and suggest --with-included-regex for people who care about this
> sort of thing.

Has anyone looked at making glibc's regex use RRI?




Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Mon, 10 Feb 2014 08:12:02 GMT) Full text and rfc822 format available.

Message #47 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: Paolo Bonzini <bonzini <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Aharon Robbins <arnold <at> skeeve.com>, 
 16481 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Mon, 10 Feb 2014 09:11:21 +0100
Il 10/02/2014 03:35, Paul Eggert ha scritto:
> Paolo Bonzini wrote:
>> The correct course of action for grep is to defer range interpretation
>> to regex, because otherwise you can get mismatches between regexes with
>> backreferences and those without.
>
> It depends on what one means by "correct".  POSIX doesn't say what to do
> in this situation, so it's OK as far as POSIX is concerned for grep to
> use RRI in the typical case (i.e., without backreferences), and for grep
> to use some other interpretation in the rare cases when backreferences
> are used.
>
> The documentation for 'grep' attempts to address this issue, perhaps not
> as clearly as it could.  Maybe the installation instructions should talk
> about it as well, and suggest --with-included-regex for people who care
> about this sort of thing.

Yeah, that makes sense.  I will revert the commit.

Paolo





Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Mon, 10 Feb 2014 09:01:02 GMT) Full text and rfc822 format available.

Message #50 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: arnold <at> skeeve.com
To: eggert <at> cs.ucla.edu, bonzini <at> gnu.org, arnold <at> skeeve.com,
 16481 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Mon, 10 Feb 2014 02:00:07 -0700
Paolo Bonzini <bonzini <at> gnu.org> wrote:

> Il 10/02/2014 03:35, Paul Eggert ha scritto:
> > Paolo Bonzini wrote:
> >> The correct course of action for grep is to defer range interpretation
> >> to regex, because otherwise you can get mismatches between regexes with
> >> backreferences and those without.
> >
> > It depends on what one means by "correct".  POSIX doesn't say what to do
> > in this situation, so it's OK as far as POSIX is concerned for grep to
> > use RRI in the typical case (i.e., without backreferences), and for grep
> > to use some other interpretation in the rare cases when backreferences
> > are used.
> >
> > The documentation for 'grep' attempts to address this issue, perhaps not
> > as clearly as it could.  Maybe the installation instructions should talk
> > about it as well, and suggest --with-included-regex for people who care
> > about this sort of thing.
>
> Yeah, that makes sense.  I will revert the commit.

I think this is the wrong course of action. Paul suggested updating the
doc to be more clear, not reverting the code.

Personally, I think grep should always use the included regex so that
then the behavior is consistent across all platforms everywhere; this
is why gawk always uses its own regex.

If the only way to use collating sequences and equivalence classes is
with GLIBC, then I think it'd be better to pull the __LIBC bits out into
the standalone regex somehow.

In response to another question: Making GLIBC's regex support RRI isn't
hard - getting the GLIBC maintainers to accept the patch, is. :-(

My two cents: Jim & Paul will have to decide.

Thanks,

Arnold




Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Mon, 10 Feb 2014 09:19:02 GMT) Full text and rfc822 format available.

Message #53 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: Paolo Bonzini <bonzini <at> gnu.org>
To: arnold <at> skeeve.com, eggert <at> cs.ucla.edu, 16481 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Mon, 10 Feb 2014 10:18:16 +0100
Il 10/02/2014 10:00, arnold <at> skeeve.com ha scritto:
> > >
> > > The documentation for 'grep' attempts to address this issue, perhaps not
> > > as clearly as it could.  Maybe the installation instructions should talk
> > > about it as well, and suggest --with-included-regex for people who care
> > > about this sort of thing.
> >
> > Yeah, that makes sense.  I will revert the commit.
> I think this is the wrong course of action. Paul suggested updating the
> doc to be more clear, not reverting the code.

If you use --with-included-regex, the patch is a no-op.  Thus it can be 
reverted.

> Personally, I think grep should always use the included regex so that
> then the behavior is consistent across all platforms everywhere; this
> is why gawk always uses its own regex.

I wouldn't be surprised if GNU distros patch gawk's regex away to get 
consistency with grep, sed, etc.

Paolo




Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Mon, 10 Feb 2014 10:55:02 GMT) Full text and rfc822 format available.

Message #56 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: arnold <at> skeeve.com
To: eggert <at> cs.ucla.edu, bonzini <at> gnu.org, arnold <at> skeeve.com,
 16481 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Mon, 10 Feb 2014 03:53:58 -0700
> If you use --with-included-regex, the patch is a no-op.  Thus it can be 
> reverted.

Whether or not you use the included regex, there are still problems, if not
outright bugs, in that code that I pointed out in my initial mail several
weeks ago.

> > Personally, I think grep should always use the included regex so that
> > then the behavior is consistent across all platforms everywhere; this
> > is why gawk always uses its own regex.
>
> I wouldn't be surprised if GNU distros patch gawk's regex away to get 
> consistency with grep, sed, etc.

Their loss. To date I know of no distro that does this. And the world is
bigger than just GNU systems.

We've gone around on this before and we continue to disagree. I have
nothing else to add to this discussion.

Arnold




Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Mon, 10 Feb 2014 19:51:01 GMT) Full text and rfc822 format available.

Message #59 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Paolo Bonzini <bonzini <at> gnu.org>, arnold <at> skeeve.com, 16481 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Mon, 10 Feb 2014 11:50:07 -0800
On 02/10/2014 01:18 AM, Paolo Bonzini wrote:
>
> If you use --with-included-regex, the patch is a no-op.

Are we talking about the patch in git commit 
1078b64302bbf5c0a46635772808ff7f75171dbc 
<http://git.savannah.gnu.org/cgit/grep.git/commit/?id=1078b64302bbf5c0a46635772808ff7f75171dbc>?

If so, then the above comment doesn't sound right.  Without the patch, 
the DFA matcher mishandles expressions in some cases, as described in 
Bug#16481.  For example, "grep -Xawk '[\[-\]]'" will cause dfa.c to try 
to compile the regular expression [[-]], which won't work regardless of 
whether --with-included-regex is being used.
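
To illustrate how the escapes can get lost (a hypothetical sketch, not
the actual dfa.c code; strip_backslashes is an invented name): if each
escaped character is copied without its backslash, the awk pattern
[\[-\]] becomes [[-]], which no longer denotes the range from '[' to ']'.

```c
#include <stddef.h>

/* Hypothetical helper: copy SRC to DST, dropping each backslash and
   keeping the character it escapes.  A lone trailing backslash is
   copied as-is.  DST must have room for strlen (SRC) + 1 bytes.
   Returns the length of the result.  */
static size_t
strip_backslashes (const char *src, char *dst)
{
  size_t n = 0;
  while (*src)
    {
      if (*src == '\\' && src[1] != '\0')
        src++;                  /* skip the backslash itself */
      dst[n++] = *src++;
    }
  dst[n] = '\0';
  return n;
}
```

Calling strip_backslashes on the C string "[\\[-\\]]" (that is, the
pattern [\[-\]]) yields [[-]].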

More generally, we already had the problem of subtle differences between 
dfa.c and full-regexp matching on platforms that do not observe RRI, 
because dfa.c already uses RRI in multibyte locales, regardless of 
whether the full matcher uses RRI.  The change causes non-"C" unibyte 
locales to behave consistently with multibyte locales, which in some 
sense is an improvement (though obviously not ideal; it'd be better if 
it was RRI everywhere).

Non-"C" unibyte locales are dying out, so to some extent this is a minor 
issue.  In practice most users these days won't notice or care about 
this change.




Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Mon, 10 Feb 2014 22:14:02 GMT) Full text and rfc822 format available.

Message #62 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: Paolo Bonzini <bonzini <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>, arnold <at> skeeve.com, 16481 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Mon, 10 Feb 2014 23:13:42 +0100
Il 10/02/2014 20:50, Paul Eggert ha scritto:
>
> If so, then the above comment doesn't sound right.  Without the patch,
> the DFA matcher mishandles expressions in some cases, as described in
> Bug#16481.  For example, "grep -Xawk '[\[-\]]'" will cause dfa.c to try
> to compile the regular expression [[-]], which won't work regardless of
> whether --with-included-regex is being used.

Ok, so there is a real bug.  But it is not immediately obvious what the 
problem is, and the bug has (AFAICS) no test case and no mention in the 
commit message.  Without this, I am not sure that the fix should not be 
the one in this commit.

> More generally, we already had the problem of subtle differences between
> dfa.c and full-regexp matching on platforms that do not observe RRI,
> because dfa.c already uses RRI in multibyte locales, regardless of
> whether the full matcher uses RRI.

It only does so if the fallback to regex is not requested (dfaexec 
invoked with backref = NULL).  This is never the case for grep.  In 
fact, as far as I know it is never the case, and I've been tempted many 
times to completely remove the mostly dead code dealing with multibyte 
ranges if backref = NULL.

> The change causes non-"C" unibyte
> locales to behave consistently with multibyte locales, which in some
> sense is an improvement (though obviously not ideal; it'd be better if
> it was RRI everywhere).

It would be if glibc were fixed.  For me, consistency with other GNU 
utilities---especially sed---trumps anything else, and this was the main 
point in fixing multibyte matching in GNU grep 2.6 and newer.

> Non-"C" unibyte locales are dying out, so to some extent this is a minor
> issue.  In practice most users these days won't notice or care about
> this change.

That's true.

Paolo




Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Tue, 11 Feb 2014 21:43:01 GMT) Full text and rfc822 format available.

Message #65 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Paolo Bonzini <bonzini <at> gnu.org>, arnold <at> skeeve.com, 16481 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Tue, 11 Feb 2014 13:42:04 -0800
On 02/10/2014 02:13 PM, Paolo Bonzini wrote:
> Ok, so there is a real bug.  But it is not immediately obvious what 
> the problem is, and the bug has (AFAICS) no test case and no mention 
> in the commit message.  Without this, I am not sure that the fix 
> should not be the one in this commit.
You're right, it should have had a test case.  I'll add this to my to-do list.

> It only does so if the fallback to regex is not requested (dfaexec 
> invoked with backref = NULL).  This is never the case for grep. In 
> fact, as far as I know it is never the case, and I've been tempted 
> many times to completely remove the mostly dead code dealing with 
> multibyte ranges if backref = NULL.
>

Ouch, I wasn't aware of this.  Clearly the patch I put in was wrong -- 
at least for the documentation that got put into NEWS.

Perhaps you're right, and the best thing to do for now is to revert the 
patch while we can think about a better solution. This should be done 
soon, since Jim wants to do a grep release. Please let me think about it 
for a day or two.  I would like to fix the bug, anyway, even if that 
patch wasn't the right way to do it.  Longer term, it'd be better to 
simplify the code (perhaps along the lines that you suggested) as it's 
too full of gotchas now.




Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Tue, 11 Feb 2014 21:45:02 GMT) Full text and rfc822 format available.

Message #68 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: Paolo Bonzini <bonzini <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>, arnold <at> skeeve.com, 16481 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Tue, 11 Feb 2014 22:44:09 +0100
Il 11/02/2014 22:42, Paul Eggert ha scritto:
> Ouch, I wasn't aware of this.  Clearly the patch I put in was wrong --
> at least for the documentation that got put into NEWS.

Yeah, sorry for not spelling it out entirely.  I worked on grep in 
bursts, and as a result I tend to take too many things for granted.

> Perhaps you're right, and the best thing to do for now is to revert the
> patch while we can think about a better solution. This should be done
> soon, since Jim wants to do a grep release. Please let me think about it
> for a day or two.  I would like to fix the bug, anyway, even if that
> patch wasn't the right way to do it.  Longer term, it'd be better to
> simplify the code (perhaps along the lines that you suggested) as it's
> too full of gotchas now.

I 100% agree with this.  If I don't hear from you I'll revert the patch 
next Friday.

Paolo




Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Mon, 17 Feb 2014 04:46:02 GMT) Full text and rfc822 format available.

Message #71 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paolo Bonzini <bonzini <at> gnu.org>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, Aharon Robbins <arnold <at> skeeve.com>,
 16481 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Sun, 16 Feb 2014 20:44:53 -0800
On Tue, Feb 11, 2014 at 1:44 PM, Paolo Bonzini <bonzini <at> gnu.org> wrote:
> Il 11/02/2014 22:42, Paul Eggert ha scritto:
>
>> Ouch, I wasn't aware of this.  Clearly the patch I put in was wrong --
>> at least for the documentation that got put into NEWS.
>
>
> Yeah, sorry for not spelling it out entirely.  I worked on grep in bursts,
> and as a result I tend to take too many things for granted.
>
>
>> Perhaps you're right, and the best thing to do for now is to revert the
>> patch while we can think about a better solution. This should be done
>> soon, since Jim wants to do a grep release. Please let me think about it
>> for a day or two.  I would like to fix the bug, anyway, even if that
>> patch wasn't the right way to do it.  Longer term, it'd be better to
>> simplify the code (perhaps along the lines that you suggested) as it's
>> too full of gotchas now.
>
>
> I 100% agree with this.  If I don't hear from you I'll revert the patch next
> Friday.

Hi guys,

I confess that I do not feel strongly about this corner case, but
do want to make a release very soon.  Paolo, Paul, where do you stand?

I would like to make the release by Monday evening.




Information forwarded to bug-grep <at> gnu.org:
bug#16481; Package grep. (Mon, 17 Feb 2014 07:31:01 GMT) Full text and rfc822 format available.

Message #74 received at 16481 <at> debbugs.gnu.org (full text, mbox):

From: Paolo Bonzini <bonzini <at> gnu.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, Aharon Robbins <arnold <at> skeeve.com>,
 16481 <at> debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Date: Mon, 17 Feb 2014 08:29:47 +0100
Il 17/02/2014 05:44, Jim Meyering ha scritto:
> > I 100% agree with this.  If I don't hear from you I'll revert the patch next
> > Friday.
> Hi guys,
>
> I confess that I do not feel strongly about this corner case, but
> do want to make a release very soon.  Paolo, Paul, where do you stand?
>
> I would like to make the release by Monday evening.

I'll revert the patch today.

Paolo





Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Sun, 09 Mar 2014 20:08:01 GMT) Full text and rfc822 format available.

Notification sent to Aharon Robbins <arnold <at> skeeve.com>:
bug acknowledged by developer. (Sun, 09 Mar 2014 20:08:02 GMT) Full text and rfc822 format available.

Message #79 received at 16481-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: 16481-done <at> debbugs.gnu.org
Subject: Re:  dfa.c and Rational Range Interpretation
Date: Sun, 09 Mar 2014 13:07:12 -0700
It seems that the issues in this bug report are all done in the savannah 
git master for grep, so I'm marking this as done.

At some point I'd like to change the regex code to support RRI, at which 
point dfa.c should then adapt automatically without our having to 
change dfa.c further.  But that would be a matter for a gnulib and/or 
glibc bug report, not dfa and/or grep.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 07 Apr 2014 11:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 10 years and 47 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.