GNU bug report logs - #15199
UTF-16 surrogate pair handling in grep -i option

Package: grep

Reported by: Paolo Bonzini <bonzini <at> gnu.org>

Date: Tue, 27 Aug 2013 15:54:01 UTC

Severity: normal

Tags: moreinfo

Merged with 15192

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 15199 in the body.
You can then email your comments to 15199 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-grep <at> gnu.org:
bug#15199; Package grep. (Tue, 27 Aug 2013 15:54:01 GMT)

Acknowledgement sent to Paolo Bonzini <bonzini <at> gnu.org>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Tue, 27 Aug 2013 15:54:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Paolo Bonzini <bonzini <at> gnu.org>
To: Corinna Vinschen <vinschen <at> redhat.com>
Cc: bug-grep <at> gnu.org
Subject: Re: UTF-16 surrogate pair handling in grep -i option
Date: Tue, 27 Aug 2013 17:53:25 +0200
On 20/08/2013 17:11, Corinna Vinschen wrote:
> That's what I did when I started to write this patch, but then I
> decided against it for the following reason:
>
> The implementation of mbrtowc, wcrtomb and towlower using UTF-16
> wchar_t works *only* in the Cygwin/Newlib-provided functions in
> exactly the way used in this patch.  I'm not aware that any other
> platform provides an equivalent implementation, even if wchar_t is
> 2 bytes.  Thus, the assumption that the code works in all cases in
> which sizeof (wchar_t) == 2, is wrong.  It would, for instance,
> not work with the Windows implementation of wcrtomb, AFAIK.

Right, MSVCRT is exactly what I was thinking about.

> I'm not strongly opposed to changing this, but IMHO, to be on the
> safe side, this code should only be activated on a case by case
> basis, so only for Cygwin for now.  Same with a potential fix to
> the regex compiler, for which I have no idea how to do it, yet :(

Feel free to bug me on IRC if I can be of any help.

Paolo
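
For readers who lack the context of bug#15192, the following is a minimal sketch of why a 2-byte wchar_t complicates case folding. With UTF-16, a character outside the Basic Multilingual Plane occupies two code units (a surrogate pair), so lowering one unit at a time cannot change it. The sketch is in Python for brevity and does not reproduce the patch under discussion or the Cygwin/Newlib behaviour; all function names are hypothetical, and Python's own `str.lower()` stands in for `towlower`.

```python
from typing import List, Tuple


def is_high(unit: int) -> bool:
    """True for a UTF-16 high (leading) surrogate code unit."""
    return 0xD800 <= unit <= 0xDBFF


def is_low(unit: int) -> bool:
    """True for a UTF-16 low (trailing) surrogate code unit."""
    return 0xDC00 <= unit <= 0xDFFF


def combine(high: int, low: int) -> int:
    """Join a surrogate pair into one code point (U+10000..U+10FFFF)."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def split(cp: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into (high, low) surrogates."""
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def lower_cp(cp: int) -> int:
    """Stand-in for towlower: lowercase a single code point.

    Mappings that expand to more than one character are left unchanged.
    """
    lowered = chr(cp).lower()
    return ord(lowered) if len(lowered) == 1 else cp


def lower_per_unit(units: List[int]) -> List[int]:
    """Naive folding: treat every 16-bit unit as a whole character."""
    return [lower_cp(u) for u in units]


def lower_pair_aware(units: List[int]) -> List[int]:
    """Fold case after joining surrogate pairs; lone surrogates pass through."""
    out: List[int] = []
    i = 0
    while i < len(units):
        if is_high(units[i]) and i + 1 < len(units) and is_low(units[i + 1]):
            cp = lower_cp(combine(units[i], units[i + 1]))
            out.extend(split(cp) if cp >= 0x10000 else [cp])
            i += 2
        else:
            out.append(lower_cp(units[i]))
            i += 1
    return out


# U+10400 DESERET CAPITAL LETTER LONG I, encoded as a surrogate pair.
deseret_upper = [0xD801, 0xDC00]
print([hex(u) for u in lower_per_unit(deseret_upper)])    # ['0xd801', '0xdc00']
print([hex(u) for u in lower_pair_aware(deseret_upper)])  # ['0xd801', '0xdc28']
```

The per-unit version leaves the pair untouched because neither surrogate has a case mapping on its own; only the pair-aware version reaches U+10428. Whether a given C library's mbrtowc, wcrtomb and towlower cooperate in this way is platform-specific, which is the concern raised in the quoted message.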




Information forwarded to bug-grep <at> gnu.org:
bug#15199; Package grep. (Tue, 27 Aug 2013 16:16:02 GMT)

Message #8 received at submit <at> debbugs.gnu.org:

From: Corinna Vinschen <vinschen <at> redhat.com>
To: Paolo Bonzini <bonzini <at> gnu.org>
Cc: bug-grep <at> gnu.org
Subject: Re: UTF-16 surrogate pair handling in grep -i option
Date: Tue, 27 Aug 2013 18:14:40 +0200
On Aug 27 17:53, Paolo Bonzini wrote:
> On 20/08/2013 17:11, Corinna Vinschen wrote:
> > That's what I did when I started to write this patch, but then I 
> > decided against it for the following reason:
> > 
> > The implementation of mbrtowc, wcrtomb and towlower using UTF-16 
> > wchar_t works *only* in the Cygwin/Newlib-provided functions in 
> > exactly the way used in this patch.  I'm not aware that any other 
> > platform provides an equivalent implementation, even if wchar_t is 
> > 2 bytes.  Thus, the assumption that the code works in all cases in 
> > which sizeof (wchar_t) == 2, is wrong.  It would, for instance,
> > not work with the Windows implementation of wcrtomb, AFAIK.
> 
> Right, MSVCRT is exactly what I was thinking about.
> 
> > I'm not strongly opposed to changing this, but IMHO, to be on the 
> > safe side, this code should only be activated on a case by case 
> > basis, so only for Cygwin for now.  Same with a potential fix to 
> > the regex compiler, for which I have no idea how to do it, yet :(
> 
> Feel free to bug me on IRC if I can be of any help.

Thanks for the offer!  I'll get back to it probably in November and
I would be glad if you could help me through the gnulib regex code
then.


Corinna

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat

Added tag(s) moreinfo. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Sun, 27 Apr 2014 01:15:02 GMT)

Information forwarded to bug-grep <at> gnu.org:
bug#15199; Package grep. (Sun, 27 Apr 2014 01:18:02 GMT)

Message #13 received at 15199 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: 15199 <at> debbugs.gnu.org
Subject: Re: UTF-16 surrogate pair handling in grep -i option
Date: Sat, 26 Apr 2014 18:17:25 -0700
I just now read this bug report <http://bugs.gnu.org/15199> and I'm 
afraid that I do not understand it.  Can someone explain?  The bug 
report appears to be the tail end of a long discussion, and I'm lacking 
context.  Thanks.




Information forwarded to bug-grep <at> gnu.org:
bug#15199; Package grep. (Mon, 28 Apr 2014 12:44:02 GMT)

Message #16 received at 15199 <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, 15199 <at> debbugs.gnu.org
Subject: Re: bug#15199: UTF-16 surrogate pair handling in grep -i option
Date: Mon, 28 Apr 2014 06:43:44 -0600
forcemerge 15192 15199
thanks

On 04/26/2014 07:17 PM, Paul Eggert wrote:
> I just now read this bug report <http://bugs.gnu.org/15199> and I'm
> afraid that I do not understand it.  Can someone explain?  The bug
> report appears to be the tail end of a long discussion, and I'm lacking
> context.  Thanks.

Threading-wise, you want to start at <http://bugs.gnu.org/15192>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Forcibly Merged 15192 15199. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Mon, 28 Apr 2014 14:06:01 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 27 May 2014 11:24:04 GMT)

This bug report was last modified 9 years and 359 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.