GNU bug report logs - #18817
\w is not synonym for [[:alnum:]] in UTF-8 locales

Package: grep

Reported by: Jaroslav Skarvada <jskarvad <at> redhat.com>

Date: Fri, 24 Oct 2014 14:21:02 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 18817 in the body.
You can then email your comments to 18817 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-grep <at> gnu.org:
bug#18817; Package grep. (Fri, 24 Oct 2014 14:21:02 GMT)

Acknowledgement sent to Jaroslav Skarvada <jskarvad <at> redhat.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Fri, 24 Oct 2014 14:21:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Jaroslav Skarvada <jskarvad <at> redhat.com>
To: bug-grep <at> gnu.org
Subject: \w is not synonym for [[:alnum:]] in UTF-8 locales
Date: Fri, 24 Oct 2014 10:19:49 -0400 (EDT)
Hi,

in the man page there is the following sentence:

"The symbol \w is a synonym for [_[:alnum:]] and \W is a synonym for [^_[:alnum:]]"

Leaving aside that the man pages for some other languages (e.g. Czech) say
that \w is a synonym for [[:alnum:]] and \W is a synonym for [^[:alnum:]],
neither form behaves as a synonym for \w or \W in UTF-8 locales:

$ export LANG=en_US.UTF-8

$ echo 'á' | grep '[[:alnum:]]'
á
$ echo 'á' | grep '[_[:alnum:]]'
á
$ echo 'á' | grep '\w'

$ echo 'á' | grep '[^[:alnum:]]'
$ echo 'á' | grep '[^_[:alnum:]]'
$ echo 'á' | grep '\W'
á

$ grep --version
grep (GNU grep) 2.20
...
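
The equivalence the man page promises can be checked mechanically. The
following is a minimal sketch, assuming a grep that accepts the GNU \w
extension; check_w is a hypothetical helper name, not part of grep or its
test suite. It reports whether \w and [_[:alnum:]] give the same answer for
one character in the current locale.

```shell
#!/bin/sh
# check_w CHAR
# Hypothetical helper: tests CHAR against \w and against [_[:alnum:]]
# in the current locale and reports whether the two patterns agree.
check_w() {
    # Does \w match the character?
    if printf '%s\n' "$1" | grep -q '\w'; then
        w=match
    else
        w=nomatch
    fi
    # Does the documented synonym match it?
    if printf '%s\n' "$1" | grep -q '[_[:alnum:]]'; then
        b=match
    else
        b=nomatch
    fi
    if [ "$w" = "$b" ]; then
        printf 'agree (%s)\n' "$w"
    else
        printf 'DIFFER: \\w=%s bracket=%s\n' "$w" "$b"
    fi
}
```

With LANG=en_US.UTF-8 exported as in the report, check_w 'á' should print a
DIFFER line on grep 2.20, per the transcript above, and an agree line once
\w handles multibyte characters.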




Information forwarded to bug-grep <at> gnu.org:
bug#18817; Package grep. (Fri, 24 Oct 2014 17:27:02 GMT)

Message #8 received at 18817 <at> debbugs.gnu.org:

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: 18817 <at> debbugs.gnu.org
Subject: Re: \w is not synonym for [[:alnum:]] in UTF-8 locales
Date: Sat, 25 Oct 2014 02:26:34 +0900
As \w and \W have been supported in single byte locales only, I added
support in multibyte locales.
[0001-dfa-support-for-w-and-W-in-multibyte-locale.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#18817; Package grep. (Fri, 24 Oct 2014 21:05:01 GMT)

Message #11 received at 18817 <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 18817 <at> debbugs.gnu.org
Subject: Re: bug#18817: \w is not synonym for [[:alnum:]] in UTF-8 locales
Date: Fri, 24 Oct 2014 15:04:25 -0600
On 10/24/2014 11:26 AM, Norihiro Tanaka wrote:
> As \w and \W have been supported in single byte locales only, I added
> support in multibyte locales.
> 

> +
> +          /* \w and \W are documented to be equivalent to [_[:album:]] and
> +             [^_[:alnum:]] respectively, so tell the lexer to process those
> +             strings, each minus its "already processed" '['.  */

s/album/alnum/

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


Information forwarded to bug-grep <at> gnu.org:
bug#18817; Package grep. (Fri, 24 Oct 2014 23:12:01 GMT)

Message #14 received at 18817 <at> debbugs.gnu.org:

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Eric Blake <eblake <at> redhat.com>
Cc: 18817 <at> debbugs.gnu.org
Subject: Re: bug#18817: \w is not synonym for [[:alnum:]] in UTF-8 locales
Date: Sat, 25 Oct 2014 08:11:02 +0900
Eric Blake <eblake <at> redhat.com> wrote:
> s/album/alnum/

Wow, typo!  Thanks, I fixed it and added reporter in commit log.
[0001-dfa-support-for-w-and-W-in-multibyte-locale.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#18817; Package grep. (Sat, 25 Oct 2014 18:31:02 GMT)

Message #17 received at 18817 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: Eric Blake <eblake <at> redhat.com>, 18817 <at> debbugs.gnu.org
Subject: Re: bug#18817: \w is not synonym for [[:alnum:]] in UTF-8 locales
Date: Sat, 25 Oct 2014 11:30:07 -0700
This looks like a fine change.  Thank you.  Please also add a NEWS entry.

On Fri, Oct 24, 2014 at 4:11 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Eric Blake <eblake <at> redhat.com> wrote:
>> s/album/alnum/
>
> Wow, typo!  Thanks, I fixed it and added reporter in commit log.




Information forwarded to bug-grep <at> gnu.org:
bug#18817; Package grep. (Sun, 26 Oct 2014 00:29:02 GMT)

Message #20 received at 18817 <at> debbugs.gnu.org:

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Jim Meyering <jim <at> meyering.net>
Cc: Eric Blake <eblake <at> redhat.com>, 18817 <at> debbugs.gnu.org
Subject: Re: bug#18817: \w is not synonym for [[:alnum:]] in UTF-8 locales
Date: Sun, 26 Oct 2014 09:27:52 +0900
Jim Meyering <jim <at> meyering.net> wrote:
> This looks like a fine change.  Thank you.  Please also add a NEWS entry.

Thanks for the review.  I added a NEWS entry to the patch.
[0001-dfa-support-for-w-and-W-in-multibyte-locale.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#18817; Package grep. (Wed, 29 Oct 2014 01:08:02 GMT)

Message #23 received at 18817 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: Eric Blake <eblake <at> redhat.com>, 18817 <18817 <at> debbugs.gnu.org>
Subject: Re: bug#18817: \w is not synonym for [[:alnum:]] in UTF-8 locales
Date: Tue, 28 Oct 2014 18:07:26 -0700
I've adjusted the commit subject and ChangeLog content, and will push
this today, then I'll make a pre-release snapshot.
[0001-dfa-make-w-and-W-work-in-multibyte-locales.patch (application/octet-stream, attachment)]

Reply sent to Jim Meyering <jim <at> meyering.net>:
You have taken responsibility. (Wed, 29 Oct 2014 03:56:02 GMT)

Notification sent to Jaroslav Skarvada <jskarvad <at> redhat.com>:
bug acknowledged by developer. (Wed, 29 Oct 2014 03:56:03 GMT)

Message #28 received at 18817-done <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: Eric Blake <eblake <at> redhat.com>, 18817-done <at> debbugs.gnu.org
Subject: Re: bug#18817: \w is not synonym for [[:alnum:]] in UTF-8 locales
Date: Tue, 28 Oct 2014 20:55:07 -0700
FYI, I noticed only after pushing that "make check" was
failing a test because that new script was not executable,
so I've just pushed a follow-up patch to fix that.

On Tue, Oct 28, 2014 at 6:07 PM, Jim Meyering <jim <at> meyering.net> wrote:
> I've adjusted the commit subject and ChangeLog content, and will push
> this today, then I'll make a pre-release snapshot.




Information forwarded to bug-grep <at> gnu.org:
bug#18817; Package grep. (Wed, 29 Oct 2014 14:23:02 GMT)

Message #31 received at 18817-done <at> debbugs.gnu.org:

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Jim Meyering <jim <at> meyering.net>
Cc: Eric Blake <eblake <at> redhat.com>, 18817-done <at> debbugs.gnu.org
Subject: Re: bug#18817: \w is not synonym for [[:alnum:]] in UTF-8 locales
Date: Wed, 29 Oct 2014 23:22:09 +0900
Jim Meyering <jim <at> meyering.net> wrote:
> FYI, I noticed only after pushing that "make check" was
> failing a test because that new script was not executable,
> so I've just pushed a follow-up patch to fix that.

Sorry, thanks for catching.





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 27 Nov 2014 12:24:03 GMT)

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.