GNU bug report logs -
#26779
expr length
Previous Next
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 26779 in the body.
You can then email your comments to 26779 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-coreutils <at> gnu.org
:
bug#26779
; Package
coreutils
.
(Thu, 04 May 2017 15:24:03 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Андрей Воронов <voronov <at> factor-ts.ru>
:
New bug report received and forwarded. Copy sent to
bug-coreutils <at> gnu.org
.
(Thu, 04 May 2017 15:24:03 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hello,
I have the bug in expr utility when it perform operation of the
calculating length of the string in my multi-byte encoding ru_RU.UTF-8.
When I run: "expr length Привет"
it return me the number 12.
But in the russian word "Hello" the length is 2 times shorter: exactly 6
characters.
May you register and fix this problem?
expr --version
expr (GNU coreutils) 8.25
God bless you
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#26779
; Package
coreutils
.
(Thu, 04 May 2017 15:44:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 26779 <at> debbugs.gnu.org (full text, mbox):
On 04/05/17 02:59, Андрей Воронов wrote:
> Hello,
>
> I have the bug in expr utility when it perform operation of the
> calculating length of the string in my multi-byte encoding ru_RU.UTF-8.
> When I run: "expr length Привет"
> it return me the number 12.
> But in the russian word "Hello" the length is 2 times shorter: exactly 6
> characters.
> May you register and fix this problem?
expr is listed in the plan here:
http://www.pixelbeat.org/docs/coreutils_i18n/
thanks
Pádraig
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#26779
; Package
coreutils
.
(Tue, 09 May 2017 07:06:02 GMT)
Full text and
rfc822 format available.
Message #11 received at 26779 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hello,
> On May 4, 2017, at 11:43, Pádraig Brady <P <at> draigBrady.com> wrote:
>
> On 04/05/17 02:59, Андрей Воронов wrote:
>> I have the bug in expr utility when it perform operation of the
>> calculating length of the string in my multi-byte encoding ru_RU.UTF-8.
>
> expr is listed in the plan here:
> http://www.pixelbeat.org/docs/coreutils_i18n/
Attached a draft patch implementing multibyte support for 'expr'
(it doesn't need any code from my previous multibyte stuff, so I'm sending it separately).
Specifically, the length/index/substr operators are adjusted.
The regex engine for the 'match' operator already supported multibyte characters (only minor adjustment needed to return matched character count instead of matched byte count).
The string comparison already used 'strcoll' so I assumed they work with multibyte strings.
Comments welcomed,
- assaf
[0001-expr-add-multibyte-support.patch.gz (application/x-gzip, attachment)]
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#26779
; Package
coreutils
.
(Tue, 09 May 2017 07:27:01 GMT)
Full text and
rfc822 format available.
Message #14 received at 26779 <at> debbugs.gnu.org (full text, mbox):
> On May 9, 2017, at 03:04, Assaf Gordon <assafgordon <at> gmail.com> wrote:
>
> Attached a draft patch implementing multibyte support for 'expr'
> (it doesn't need any code from my previous multibyte stuff, so I'm sending it separately).
> [...]
>
> <0001-expr-add-multibyte-support.patch.gz>
Re-reading the patch, I see there's a memory leak when 'mbs_logical_substr' is called.
I'll send updated version soon (if you have more comments, please send).
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#26779
; Package
coreutils
.
(Tue, 09 May 2017 12:32:02 GMT)
Full text and
rfc822 format available.
Message #17 received at 26779 <at> debbugs.gnu.org (full text, mbox):
On 09/05/17 00:04, Assaf Gordon wrote:
> Hello,
>
>> On May 4, 2017, at 11:43, Pádraig Brady <P <at> draigBrady.com> wrote:
>>
>> On 04/05/17 02:59, Андрей Воронов wrote:
>>> I have the bug in expr utility when it perform operation of the
>>> calculating length of the string in my multi-byte encoding ru_RU.UTF-8.
>>
>> expr is listed in the plan here:
>> http://www.pixelbeat.org/docs/coreutils_i18n/
>
> Attached a draft patch implementing multibyte support for 'expr'
> (it doesn't need any code from my previous multibyte stuff, so I'm sending it separately).
>
> Specifically, the length/index/substr operators are adjusted.
> The regex engine for the 'match' operator already supported multibyte characters (only minor adjustment needed to return matched character count instead of matched byte count).
> The string comparison already used 'strcoll' so I assumed they work with multibyte strings.
Definitely needs a NEWS entry
and mention of this bug in the commit message.
Perf isn't a huge concern for expr use cases,
so I'd rather not address in this patch if at all,
but for future reference it might be nice to pass in
to mbschr() etc. whether the current locale is UTF8.
Maybe some global similar to MB_CUR_MAX. Then mbschr()
could be optimized in the UTF8 case as per pseudo code at:
http://www.pixelbeat.org/docs/utf8_programming.html
s/sequnce/sequence/
I notice expr isn't handled in the rhat/suse i18n patch,
so there is nothing to consider there.
Excellent work on the tests.
I've updated status in http://www.pixelbeat.org/docs/coreutils_i18n/
as this is ready to land I think.
thanks!
Pádraig
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#26779
; Package
coreutils
.
(Thu, 11 May 2017 01:25:02 GMT)
Full text and
rfc822 format available.
Message #20 received at 26779 <at> debbugs.gnu.org (full text, mbox):
A couple of things. First, since performance doesn't matter here, wouldn't it be
simpler to convert to wide character form by using mbstowcs, do the work in
wide-character form, and then convert back via wcstombs? The resulting code
should be easier to maintain, I'd think.
Also, please stick to GNU style for operator spacing and parenthesization.
Thanks.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#26779
; Package
coreutils
.
(Fri, 19 May 2017 04:11:01 GMT)
Full text and
rfc822 format available.
Message #23 received at 26779 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hello,
Attached an updated version of the 'expr/multibyte' patch.
> On May 9, 2017, at 08:31, Pádraig Brady <P <at> draigBrady.com> wrote:
> Definitely needs a NEWS entry
> and mention of this bug in the commit message.
I added two NEWS entries: one for the bug fix and one for the new multibyte support.
I could not find an exact version where this bug was 'introduced'.
'expr' never had a notion of multibyte strings, it was gnulib's "re_match"
which gained multibyte support.
The earliest gnulib mention of re_match and multibyte I could find is this:
commit 42f97e3f0c576afb94226c8493f3567b44352ca8
Author: Richard Stallman <rms <at> gnu.org>
Date: Fri Apr 3 07:33:13 1998 +0000
> On May 10, 2017, at 21:24, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>
> A couple of things. First, since performance doesn't matter here, wouldn't it be simpler to convert to wide character form by using mbstowcs, do the work in wide-character form, and then convert back via wcstombs? The resulting code should be easier to maintain, I'd think.
The downside of 'mbstowcs' is that it fails and does not continue
when it encounters invalid multibyte sequences, whereas keeping
the string as 'char*' and using mbschr (with the additional code)
does handle invalid sequences fine (and the tests include such cases).
What do you think ?
I can work on a different patch with wide-char strings if this is deemed worthwhile.
> Also, please stick to GNU style for operator spacing and parenthesization.
I fixed the one case I spotted - if I missed any, please let me know
("make syntax-check" passes, but there could be others).
regards,
- assaf
[0001-expr-add-multibyte-support.patch.gz (application/x-gzip, attachment)]
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#26779
; Package
coreutils
.
(Fri, 19 May 2017 05:48:02 GMT)
Full text and
rfc822 format available.
Message #26 received at 26779 <at> debbugs.gnu.org (full text, mbox):
Assaf Gordon wrote:
> The downside of 'mbstowcs' is that it fails and does not continue
> when it encounters invalid multibyte sequences,
It's OK for expr to report an error and exit in that case, which should be simple.
It's not a big deal.
> I fixed the one case I spotted - if I missed any, please let me know
> ("make syntax-check" passes, but there could be others).
char* -> char *
MB_CUR_MAX==1 -> MB_CUR_MAX == 1
mbs_logical_cspn ('hello','a') -> mbs_logical_cspn ('hello', 'a') [in a comment]
const char *_string -> const char *string
* mbui_cur_ptr (iter) -> *mbui_cur_ptr (iter)
(s+1) -> s + 1
Put */ at end of last line of comment, not at start of next.
MB_CUR_MAX>1 -> MB_CUR_MAX > 1
xmalloc ( blen + 1 ) -> xmalloc (blen + 1)
idx<pos -> idx < pos
mempcpy ( vlim -> mempcpy (vlim
d>=ofs -> d >= ofs
size_t c=0; -> size_t c = 0;
No need to use "const" on locals, unless perhaps you take their address.
Probably more instances; got tired of looking....
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#26779
; Package
coreutils
.
(Wed, 28 Jun 2017 01:42:01 GMT)
Full text and
rfc822 format available.
Message #29 received at 26779 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hi,
Attached work-in-progress for expr/multibyte,
hopefully fixing all (most?) of the coding style issues
(Internal conversion to wide strings not yet implemented).
Comments welcomed,
- assaf
[0001-expr-add-multibyte-support.patch.gz (application/gzip, attachment)]
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#26779
; Package
coreutils
.
(Wed, 28 Jun 2017 16:34:01 GMT)
Full text and
rfc822 format available.
Message #32 received at 26779 <at> debbugs.gnu.org (full text, mbox):
On 27/06/17 18:41, Assaf Gordon wrote:
> Hi,
>
> Attached work-in-progress for expr/multibyte,
> hopefully fixing all (most?) of the coding style issues
> (Internal conversion to wide strings not yet implemented).
>
> Comments welcomed,
> - assaf
>
Excellent work.
This looks to have sufficient testing and is ready to push.
We can look at using wide strings internally in a further patch.
thanks!
Pádraig.
Reply sent
to
Assaf Gordon <assafgordon <at> gmail.com>
:
You have taken responsibility.
(Thu, 29 Jun 2017 01:21:02 GMT)
Full text and
rfc822 format available.
Notification sent
to
Андрей Воронов <voronov <at> factor-ts.ru>
:
bug acknowledged by developer.
(Thu, 29 Jun 2017 01:21:02 GMT)
Full text and
rfc822 format available.
Message #37 received at 26779-done <at> debbugs.gnu.org (full text, mbox):
> On Jun 28, 2017, at 12:32, Pádraig Brady <P <at> draigBrady.com> wrote:
>
> On 27/06/17 18:41, Assaf Gordon wrote:
>> Attached work-in-progress for expr/multibyte,
> This looks to have sufficient testing and is ready to push.
> We can look at using wide strings internally in a further patch.
Thanks,
pushed here:
https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=a9f2be5bfec2bfe86c08
I'm thus closing this bug report (but of course
discussion can continue by replying to this thread).
regards,
- assaf
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Thu, 27 Jul 2017 11:24:04 GMT)
Full text and
rfc822 format available.
This bug report was last modified 6 years and 272 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.