GNU bug report logs - #35785
'string->uri' fails in sv_SE locale
Report forwarded to bug-guix <at> gnu.org: bug#35785; Package guix. (Fri, 17 May 2019 21:21:01 GMT)
Acknowledgement sent to Einar Largenius <einar.largenius <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Fri, 17 May 2019 21:21:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hello.
I just downloaded guix and installed it. In my config I have this line:
(locale "sv_SE.utf8")
If I run 'guix pull' I get the error:
guix pull: error: lstat: Filen eller katalogen finns inte: "ftp://sourceware.org/pub/libffi-3.2.1.tar.gz"
The part in Swedish means "file or directory does not exist".
'LANG= guix pull' works without issue.
Information forwarded to bug-guix <at> gnu.org: bug#35785; Package guix. (Sat, 18 May 2019 11:56:01 GMT)
Message #8 received at 35785 <at> debbugs.gnu.org (full text, mbox):
Hello Einar,
Einar Largenius <einar.largenius <at> gmail.com> writes:
> I just downloaded guix and installed it. In my config I have this line:
>
> (locale "sv_SE.utf8")
>
> If I run 'guix pull' I get the error:
>
> guix pull: error: lstat: Filen eller katalogen finns inte: "ftp://sourceware.org/pub/libffi-3.2.1.tar.gz"
>
> The part in Swedish means "file or directory does not exist".
Could you paste the complete output of ‘guix pull -v2’ when running
under that locale?
Thanks,
Ludo’.
Information forwarded to bug-guix <at> gnu.org: bug#35785; Package guix. (Sun, 19 May 2019 17:46:02 GMT)
Message #11 received at 35785 <at> debbugs.gnu.org (full text, mbox):
> Could you paste the complete output of ‘guix pull -v2’ when running
> under that locale?
Yes, sorry. I have not set up email yet on that system, so I need to
manually transcribe any output. This should be the complete output:
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
Building from this channel:
guix https://git.savannah.gnu.org/git/guix.git f5557bd
guix pull: error: lstat: Filen eller katalogen finns inte: "ftp://sourceware.org/pub/libffi-3.2.1.tar.gz"
Information forwarded to bug-guix <at> gnu.org: bug#35785; Package guix. (Mon, 20 May 2019 08:21:01 GMT)
Message #14 received at 35785 <at> debbugs.gnu.org (full text, mbox):
Einar Largenius <einar.largenius <at> gmail.com> writes:
>> Could you paste the complete output of ‘guix pull -v2’ when running
>> under that locale?
>
> Yes, sorry. I have not set up email yet on that system, so I need to
> manually transcribe any output. This should be the complete output:
>
> Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
> Building from this channel:
> guix https://git.savannah.gnu.org/git/guix.git f5557bd
> guix pull: error: lstat: Filen eller katalogen finns inte: "ftp://sourceware.org/pub/libffi-3.2.1.tar.gz"
I can reproduce it:
--8<---------------cut here---------------start------------->8---
$ export GUIX_LOCPATH=$(guix build glibc-locales)/lib/locale
$ LANGUAGE= LC_ALL=sv_SE.utf8 guix pull -p foo
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
Building from this channel:
guix https://git.savannah.gnu.org/git/guix.git 0f469c1
guix pull: error: lstat: Filen eller katalogen finns inte: "ftp://sourceware.org/pub/libffi/libffi-3.2.1.tar.gz"
--8<---------------cut here---------------end--------------->8---
Super weird!
Investigating…
Ludo’.
Information forwarded to bug-guix <at> gnu.org: bug#35785; Package guix. (Mon, 20 May 2019 09:15:02 GMT)
Message #17 received at 35785 <at> debbugs.gnu.org (full text, mbox):
Hi!
So the guts of the problem is that Guile’s ‘string->uri’ procedure
behaves incorrectly under that locale:
--8<---------------cut here---------------start------------->8---
$ export GUIX_LOCPATH=$(guix build glibc-locales)/lib/locale
$ LANGUAGE= LC_ALL=sv_SE.utf8 ./pre-inst-env guile
GNU Guile 2.2.4
Copyright (C) 1995-2017 Free Software Foundation, Inc.
Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.
Enter `,help' for help.
scheme@(guile-user)> ,use(web uri)
scheme@(guile-user)> (string->uri "ftp://sourceware.org/pub/libffi/libffi-3.2.1.tar.gz")
$1 = #f
--8<---------------cut here---------------end--------------->8---
More specifically, ‘parse-authority’ is failing under that locale,
because of the “w”:
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ((@@ (web uri) parse-authority) "//sourceware.org" (const 'fail))
$5 = fail
scheme@(guile-user)> ((@@ (web uri) parse-authority) "//sourcevare.org" (const 'fail))
$6 = #f
$7 = "sourcevare.org"
$8 = #f
--8<---------------cut here---------------end--------------->8---
We can boil it down to this example:
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,use(ice-9 regex)
scheme@(guile-user)> (string-match "[a-z]" "a")
$10 = #("a" (0 . 1))
scheme@(guile-user)> (string-match "[a-z]" "w")
$11 = #f
--8<---------------cut here---------------end--------------->8---
In short, under the sv_SE.utf8 locale of glibc 2.28, “w” is not
considered part of the ‘a-z’ interval.
Indeed, ‘localedata/locales/sv_SE’ in glibc reads this:
% The letter w is normally not present in the Swedish alphabet. It
% exists in some names in Swedish and foreign words, but is accounted
% for as a variant of 'v'. Words and names with 'w' are in Swedish
% ordered alphabetically among the words and names with 'v'. If two
% words or names are only to be distinguished by 'v' or % 'w', 'v' is
% placed before 'w'.
Using the “lower” regexp class instead of “[a-z]” works:
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (string-match "[[:lower:]]" "w")
$12 = #("w" (0 . 1))
--8<---------------cut here---------------end--------------->8---
However, it’s not clear to me whether the “lower” class is supposed to
be the same for all locales or if we’re just lucky:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
Thoughts?
The workaround until we’ve fixed it is to use another locale, though you
can still set “LC_MESSAGES=sv_SE.utf8” or “LANGUAGE=sv”.
Ludo’.
Changed bug title to ''string->uri' fails in sv_SE locale' from 'guix won't download if locale is set to swedish'
Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Mon, 20 May 2019 09:15:02 GMT)
Severity set to 'important' from 'normal'
Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Mon, 20 May 2019 09:17:02 GMT)
Information forwarded to bug-guix <at> gnu.org: bug#35785; Package guix. (Mon, 27 May 2019 11:07:01 GMT)
Message #24 received at 35785 <at> debbugs.gnu.org (full text, mbox):
Ludovic Courtès <ludo <at> gnu.org> writes:
> Using the “lower” regexp class instead of “[a-z]” works:
>
> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (string-match "[[:lower:]]" "w")
> $12 = #("w" (0 . 1))
> --8<---------------cut here---------------end--------------->8---
>
> However, it’s not clear to me whether the “lower” class is supposed to
> be the same for all locales or if we’re just lucky:
>
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
>
> Thoughts?
The lower class is much larger than [a-z]. If we only wanted to work
around this particular problem we could explicitly spell out the range,
which would be the same in all locales. (Obviously, that wouldn’t be
pretty.)
But can’t URI parts contain more than those characters? To circumvent
the question whether the lower class is locale dependent we could
generate an explicit range from a charset.
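A minimal sketch of the charset idea, assuming Guile 2.2: build the ASCII letters and digits as SRFI-14 character sets from code-point ranges, flatten them into a string, and splice that string into a bracket expression. The names `ascii-range`, `host-chars` and `label-rx` are hypothetical and not part of `(web uri)`. Because every character is listed individually, the resulting regexp does not rely on how the locale collates a range such as `a-z`.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper: all characters whose code points lie between
;; FROM and TO inclusive.  Code points, unlike collation order, do not
;; depend on the locale.
(define (ascii-range from to)
  (ucs-range->char-set (char->integer from)
                       (+ 1 (char->integer to))))  ; upper bound is exclusive

;; ASCII letters and digits as a character set.
(define host-chars
  (char-set-union (ascii-range #\a #\z)
                  (ascii-range #\A #\Z)
                  (ascii-range #\0 #\9)))

;; Explicit bracket expression listing every character, followed by "-",
;; which is literal when it is the last character before "]".
(define label-rx
  (make-regexp
   (string-append "^[" (char-set->string host-chars) "-]+$")))

(regexp-exec label-rx "sourceware")   ; matches, "w" included
(regexp-exec label-rx "source.ware")  ; #f: "." is not in the list
```

The design choice here is to keep ranges in the source only as code-point ranges handed to `ucs-range->char-set`, so the regexp engine never sees an `a-z` range to interpret.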
--
Ricardo
Information forwarded to bug-guix <at> gnu.org: bug#35785; Package guix. (Mon, 27 May 2019 13:40:02 GMT)
Message #27 received at 35785 <at> debbugs.gnu.org (full text, mbox):
Hello,
Ricardo Wurmus <rekado <at> elephly.net> writes:
> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>> Using the “lower” regexp class instead of “[a-z]” works:
>>
>> --8<---------------cut here---------------start------------->8---
>> scheme@(guile-user)> (string-match "[[:lower:]]" "w")
>> $12 = #("w" (0 . 1))
>> --8<---------------cut here---------------end--------------->8---
>>
>> However, it’s not clear to me whether the “lower” class is supposed to
>> be the same for all locales or if we’re just lucky:
>>
>> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
>>
>> Thoughts?
>
> The lower class is much larger than [a-z]. If we only wanted to work
> around this particular problem we could explicitly spell out the range,
> which would be the same in all locales. (Obviously, that wouldn’t be
> pretty.)
I think that explicitly spelling out the range is the right thing to do
here. The POSIX spec says that character ranges work in the POSIX
locale, but “in other locales, a range expression has unspecified
behavior.”
> But can’t URI parts contain more than those characters?
A quick reading of RFC 3986 suggests that the host part of a URI can be
an IP address (version 4 or 6) or a registered name. It gives the
following rules for registered names:
reg-name = *( unreserved / pct-encoded / sub-delims )
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
Here, “ALPHA”, “DIGIT”, and “HEXDIG” are specified in RFC 2234, and are
just the ASCII ranges you might expect (except that “HEXDIG” only
allows uppercase letters).
It looks like Guile is currently a little stricter than this, but pretty
close (if you take the character ranges to mean ASCII ranges).
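The `reg-name` rule above can also be checked without regular expressions at all, which sidesteps locale-dependent ranges entirely. This is a minimal sketch assuming Guile 2.2; `reg-name?` and the `-cs` character-set names are hypothetical, and it covers only the `reg-name` production, not IP literals. One assumption worth flagging: ABNF quoted strings are case-insensitive (RFC 2234, section 2.3), and RFC 3986, section 2.1 treats `a`-`f` as equivalent to `A`-`F` in percent-encodings, so the sketch accepts lowercase hex digits as well.

```scheme
;; Character sets from code-point ranges (SRFI-14, built into Guile).
(define (code-range from to)
  (ucs-range->char-set (char->integer from) (+ 1 (char->integer to))))

(define alpha-cs  (char-set-union (code-range #\a #\z) (code-range #\A #\Z)))
(define digit-cs  (code-range #\0 #\9))
(define hexdig-cs (char-set-union digit-cs (code-range #\a #\f) (code-range #\A #\F)))
(define unreserved-cs (char-set-union alpha-cs digit-cs (string->char-set "-._~")))
(define sub-delims-cs (string->char-set "!$&'()*+,;="))

;; reg-name = *( unreserved / pct-encoded / sub-delims )
(define (reg-name? str)
  (let ((len (string-length str)))
    (let loop ((i 0))
      (if (= i len)
          #t                              ; "*" also allows the empty string
          (let ((c (string-ref str i)))
            (cond ((or (char-set-contains? unreserved-cs c)
                       (char-set-contains? sub-delims-cs c))
                   (loop (+ i 1)))
                  ;; pct-encoded = "%" HEXDIG HEXDIG
                  ((and (char=? c #\%)
                        (< (+ i 2) len)
                        (char-set-contains? hexdig-cs (string-ref str (+ i 1)))
                        (char-set-contains? hexdig-cs (string-ref str (+ i 2))))
                   (loop (+ i 3)))
                  (else #f)))))))
```

For instance, `(reg-name? "sourceware.org")` and `(reg-name? "caf%C3%A9.example")` are true, while a host containing a literal `é` or a truncated escape such as `"caf%C"` is rejected.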
> To circumvent
> the question whether the lower class is locale dependent we could
> generate an explicit range from a charset.
I think this is the right approach. Using “[:lower:]” would allow
things outside of the RFC, like ‘é’. Adding support for
internationalized domain names using Punycode would be cool, but well
outside the scope of this bug. :)
-- Tim
Information forwarded to bug-guix <at> gnu.org: bug#35785; Package guix. (Tue, 28 May 2019 11:18:01 GMT)
Message #30 received at 35785 <at> debbugs.gnu.org (full text, mbox):
Hi Timothy,
Timothy Sample <samplet <at> ngyro.com> writes:
> A quick reading of RFC 3986 suggests that the host part of a URI can be
> an IP address (version 4 or 6) or a registered name. It gives the
> following rules for registered names:
>
> reg-name = *( unreserved / pct-encoded / sub-delims )
> unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
> pct-encoded = "%" HEXDIG HEXDIG
> sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
> / "*" / "+" / "," / ";" / "="
>
> Here, “ALPHA”, “DIGIT”, and “HEXDIG” are specified in RFC 2234, and are
> just the ASCII ranges you might expect (except that “HEXDIG” only
> allows uppercase letters).
Do you think you could turn that into a patch for Guile? I’d happily
apply it. :-)
It looks like both [[:alnum:]] & co. and ranges would be
locale-dependent, so my understanding is that we’ll have to list all the
characters explicitly, right?
Thanks,
Ludo’.
Information forwarded to bug-guix <at> gnu.org: bug#35785; Package guix. (Mon, 03 Jun 2019 00:40:01 GMT)
Message #33 received at 35785 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hi,
Ludovic Courtès <ludo <at> gnu.org> writes:
> Hi Timothy,
>
> Timothy Sample <samplet <at> ngyro.com> writes:
>
>> A quick reading of RFC 3986 suggests that the host part of a URI can be
>> an IP address (version 4 or 6) or a registered name. It gives the
>> following rules for registered names:
>>
>> reg-name = *( unreserved / pct-encoded / sub-delims )
>> unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
>> pct-encoded = "%" HEXDIG HEXDIG
>> sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
>> / "*" / "+" / "," / ";" / "="
>>
>> Here, “ALPHA”, “DIGIT”, and “HEXDIG” are specified in RFC 2234, and are
>> just the ASCII ranges you might expect (except that “HEXDIG” only
>> allows uppercase letters).
>
> Do you think you could turn that into a patch for Guile? I’d happily
> apply it. :-)
>
> It looks like both [[:alnum:]] & co. and ranges would be
> locale-dependent, so my understanding is that we’ll have to list all the
> characters explicitly, right?
Here’s a patch for Guile that uses explicit lists of characters in the
‘(web uri)’ module instead of character ranges. It includes two tests
that are pretty verbose, but seem to do the trick.
I have a bit more background on the problem, mostly coming from a Glibc
bug report: <https://sourceware.org/bugzilla/show_bug.cgi?id=23393>.
It turns out that it is well-known upstream, and avoiding character
ranges is the recommended approach for now. Some other GNU tools have
adopted what is being called the “Rational Range Interpretation”
<https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html>.
AIUI, this means they use the underlying encoding numbers for ranges (I
checked the source, but I’m only mostly sure I read it right). It looks
like the Glibc folks are unsure how to proceed on this (but are maybe
slightly leaning towards the “rational” approach).
It’s all a pretty big mess, really. I was hoping there would be some
obvious thing that would fix the problem more generally. Short of
pulling in the Gnulib regex code or writing something in Scheme, it
looks like Guile is stuck where it is now.
I’m unsure if the changes are considered “trivial” from a copyright
perspective. It’s pretty close, but I think programmers tend to
underestimate here. I’ve started the FSF copyright assignment process
either way, since this is likely not my last Guile patch. :)
-- Tim
[0001-Make-URI-handling-locale-independent.patch (text/x-patch, attachment)]
Information forwarded to bug-guix <at> gnu.org: bug#35785; Package guix. (Mon, 03 Jun 2019 13:03:02 GMT)
Message #36 received at 35785 <at> debbugs.gnu.org (full text, mbox):
Hi Timothy,
Timothy Sample <samplet <at> ngyro.com> writes:
> Here’s a patch for Guile that uses explicit lists of characters in the
> ‘(web uri)’ module instead of character ranges. It includes two tests
> that are pretty verbose, but seem to do the trick.
>
> I have a bit more background on the problem, mostly coming from a Glibc
> bug report: <https://sourceware.org/bugzilla/show_bug.cgi?id=23393>.
>
> It turns out that it is well-known upstream, and avoiding character
> ranges is the recommended approach for now. Some other GNU tools have
> adopted what is being called the “Rational Range Interpretation”
> <https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html>.
> AIUI, this means they use the underlying encoding numbers for ranges (I
> checked the source, but I’m only mostly sure I read it right). It looks
> like the Glibc folks are unsure how to proceed on this (but are maybe
> slightly leaning towards the “rational” approach).
Great that you gleaned good references on this topic!
> It’s all a pretty big mess, really. I was hoping there would be some
> obvious thing that would fix the problem more generally. Short of
> pulling in the Gnulib regex code or writing something in Scheme, it
> looks like Guile is stuck where it is now.
Yeah. The alternative would be to not use regexps in this context, I
guess.
> I’m unsure if the changes are considered “trivial” from a copyright
> perspective. It’s pretty close, but I think programmers tend to
> underestimate here. I’ve started the FSF copyright assignment process
> either way, since this is likely not my last Guile patch. :)
If the process is already underway, I think it’s fine to commit this
patch (I would rather wait if it were longer and/or if we didn’t know
each other already).
> From 7b02be4c050c7b17a0e2685e8e453295f798c360 Mon Sep 17 00:00:00 2001
> From: Timothy Sample <samplet <at> ngyro.com>
> Date: Sun, 2 Jun 2019 14:41:20 -0400
> Subject: [PATCH] Make URI handling locale independent.
>
> Fixes <https://bugs.gnu.org/35785>.
>
> * module/web/uri.scm (digits, hex-digits, letters): New variables.
> (ipv4-regexp, ipv6-regexp, domain-label-regexp, top-label-regexp,
> userinfo-pat, host-pat, ipv6-host-pat, port-pat, scheme-pat): Explicitly
> list each character instead of using character ranges.
> * test-suite/tests/web-uri.test: Add corresponding tests.
[...]
> + (pass-if "http://www.example.com (sv_SE)"
> + (dynamic-wind
> + (lambda () #t)
> + (lambda ()
> + (with-locale "sv_SE.utf8"
> + (reload-module (resolve-module '(web uri)))
> + (uri=? (string->uri "http://www.example.com")
> + #:scheme 'http #:host "www.example.com" #:path "")))
Aren’t ‘reload-module’ calls a leftover that can now be removed (also in
the other test)?
For the sv_SE test, what about taking a host name with a ‘w’, since
that’s the use case that allowed us to uncover this bug?
Apart from that it LGTM, thank you!
Ludo’.
Information forwarded to bug-guix <at> gnu.org: bug#35785; Package guix. (Mon, 03 Jun 2019 14:25:01 GMT)
Message #39 received at 35785 <at> debbugs.gnu.org (full text, mbox):
Hi Ludo,
Ludovic Courtès <ludo <at> gnu.org> writes:
> Hi Timothy,
>
> Timothy Sample <samplet <at> ngyro.com> writes:
>
>> Here’s a patch for Guile that uses explicit lists of characters in the
>> ‘(web uri)’ module instead of character ranges. It includes two tests
>> that are pretty verbose, but seem to do the trick.
>>
>> I have a bit more background on the problem, mostly coming from a Glibc
>> bug report: <https://sourceware.org/bugzilla/show_bug.cgi?id=23393>.
>>
>> It turns out that it is well-known upstream, and avoiding character
>> ranges is the recommended approach for now. Some other GNU tools have
>> adopted what is being called the “Rational Range Interpretation”
>> <https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html>.
>> AIUI, this means they use the underlying encoding numbers for ranges (I
>> checked the source, but I’m only mostly sure I read it right). It looks
>> like the Glibc folks are unsure how to proceed on this (but are maybe
>> slightly leaning towards the “rational” approach).
>
> Great that you gleaned good references on this topic!
>
>> It’s all a pretty big mess, really. I was hoping there would be some
>> obvious thing that would fix the problem more generally. Short of
>> pulling in the Gnulib regex code or writing something in Scheme, it
>> looks like Guile is stuck where it is now.
>
> Yeah. The alternative would be to not use regexps in this context, I
> guess.
I meant fixing regexes in other contexts, since I’m sure the URI module
is not the only Guile code ever that assumed “[a-z]” would only match
ASCII lowercase letters.
>> I’m unsure if the changes are considered “trivial” from a copyright
>> perspective. It’s pretty close, but I think programmers tend to
>> underestimate here. I’ve started the FSF copyright assignment process
>> either way, since this is likely not my last Guile patch. :)
>
> If the process is already underway, I think it’s fine to commit this
> patch (I would rather wait if it were longer and/or if we didn’t know
> each other already).
Sounds good!
>> From 7b02be4c050c7b17a0e2685e8e453295f798c360 Mon Sep 17 00:00:00 2001
>> From: Timothy Sample <samplet <at> ngyro.com>
>> Date: Sun, 2 Jun 2019 14:41:20 -0400
>> Subject: [PATCH] Make URI handling locale independent.
>>
>> Fixes <https://bugs.gnu.org/35785>.
>>
>> * module/web/uri.scm (digits, hex-digits, letters): New variables.
>> (ipv4-regexp, ipv6-regexp, domain-label-regexp, top-label-regexp,
>> userinfo-pat, host-pat, ipv6-host-pat, port-pat, scheme-pat): Explicitly
>> list each character instead of using character ranges.
>> * test-suite/tests/web-uri.test: Add corresponding tests.
>
> [...]
>
>> + (pass-if "http://www.example.com (sv_SE)"
>> + (dynamic-wind
>> + (lambda () #t)
>> + (lambda ()
>> + (with-locale "sv_SE.utf8"
>> + (reload-module (resolve-module '(web uri)))
>> + (uri=? (string->uri "http://www.example.com")
>> + #:scheme 'http #:host "www.example.com" #:path "")))
>
> Aren’t ‘reload-module’ calls a leftover that can now be removed (also in
> the other test)?
I needed to reload the modules like that to make the tests fail without
the patch and pass with it. My understanding is that the bug happens
at regex compile time, which happens when the module is loaded. If I
don’t reload the module, the old URI code passes the tests, since the
regexes were compiled with a locale that does not trigger the bug. It’s
a little wacky, sure, but it was the best idea I could come up with.
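As Tim explains, what matters is the locale in effect when the regexp is compiled: `make-regexp` hands the pattern to the C library's `regcomp`, and `(web uri)` compiles its patterns at module load time. The hypothetical helper below makes that step explicit by compiling a pattern under a given locale and restoring the previous one afterwards, in the spirit of the `with-locale` form used in the test above. It is a sketch assuming Guile 2.2; `setlocale` raises an error if the requested locale is not installed (for `sv_SE.utf8` under Guix that typically means setting `GUIX_LOCPATH`, as in the transcripts earlier in this thread).

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper: run THUNK with LC_ALL set to LOCALE, then restore
;; the previously active locale, even on a non-local exit.
(define (call-with-locale locale thunk)
  (let ((previous (setlocale LC_ALL)))      ; one argument: query only
    (dynamic-wind
      (lambda () (setlocale LC_ALL locale))
      thunk
      (lambda () (setlocale LC_ALL previous)))))

;; Hypothetical helper: compile PATTERN while LOCALE is active, which is
;; what happens implicitly when a module defining regexps is (re)loaded.
(define (regexp-under-locale locale pattern)
  (call-with-locale locale (lambda () (make-regexp pattern))))
```

In the C (POSIX) locale, where POSIX does specify range behavior, `(regexp-exec (regexp-under-locale "C" "^[a-z]+$") "www")` returns a match; compiling and matching `[a-z]` under `sv_SE.utf8` with glibc 2.28 is what rejected “w” in the report above.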
> For the sv_SE test, what about taking a host name with a ‘w’, since
> that’s the use case that allowed us to uncover this bug?
I thought I was being clever by using a “www” hostname, but apparently
it’s so normalized as to be invisible! Feel free to change it to
something more obvious like “w.com” or whatever.
-- Tim
Information forwarded to bug-guix <at> gnu.org: bug#35785; Package guix. (Tue, 04 Jun 2019 07:44:01 GMT)
Message #42 received at 35785 <at> debbugs.gnu.org (full text, mbox):
Hello,
Timothy Sample <samplet <at> ngyro.com> writes:
>>> From 7b02be4c050c7b17a0e2685e8e453295f798c360 Mon Sep 17 00:00:00 2001
>>> From: Timothy Sample <samplet <at> ngyro.com>
>>> Date: Sun, 2 Jun 2019 14:41:20 -0400
>>> Subject: [PATCH] Make URI handling locale independent.
>>>
>>> Fixes <https://bugs.gnu.org/35785>.
>>>
>>> * module/web/uri.scm (digits, hex-digits, letters): New variables.
>>> (ipv4-regexp, ipv6-regexp, domain-label-regexp, top-label-regexp,
>>> userinfo-pat, host-pat, ipv6-host-pat, port-pat, scheme-pat): Explicitly
>>> list each character instead of using character ranges.
>>> * test-suite/tests/web-uri.test: Add corresponding tests.
>>
>> [...]
>>
>>> + (pass-if "http://www.example.com (sv_SE)"
>>> + (dynamic-wind
>>> + (lambda () #t)
>>> + (lambda ()
>>> + (with-locale "sv_SE.utf8"
>>> + (reload-module (resolve-module '(web uri)))
>>> + (uri=? (string->uri "http://www.example.com")
>>> + #:scheme 'http #:host "www.example.com" #:path "")))
>>
>> Aren’t ‘reload-module’ calls a leftover that can now be removed (also in
>> the other test)?
>
> I needed to reload the modules like that to make the tests fail without
> the patch and pass with it. My understanding is that the bug happens
> at regex compile time, which happens when the module is loaded. If I
> don’t reload the module, the old URI code passes the tests, since the
> regexes were compiled with a locale that does not trigger the bug. It’s
> a little wacky, sure, but it was the best idea I could come up with.
Oooh, I see. Could you add a comment to explain this? Then we’re done.
>> For the sv_SE test, what about taking a host name with a ‘w’, since
>> that’s the use case that allowed us to uncover this bug?
>
> I thought I was being clever by using a “www” hostname, but apparently
> it’s so normalized as to be invisible! Feel free to change it to
> something more obvious like “w.com” or whatever.
Silly me, I guess I need new glasses. :-)
Thanks!
Ludo’.
Information forwarded to bug-guix <at> gnu.org: bug#35785; Package guix. (Tue, 04 Jun 2019 13:57:02 GMT)
Message #45 received at 35785 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hi,
Ludovic Courtès <ludo <at> gnu.org> writes:
> Timothy Sample <samplet <at> ngyro.com> writes:
>
> [...]
>
>> I needed to reload the modules like that to make the tests fail without
>> the patch and pass with it. My understanding is that the bug happens
>> at regex compile time, which happens when the module is loaded. If I
>> don’t reload the module, the old URI code passes the tests, since the
>> regexes were compiled with a locale that does not trigger the bug. It’s
>> a little wacky, sure, but it was the best idea I could come up with.
>
> Oooh, I see. Could you add a comment to explain this? Then we’re done.
Here it is! I hope it is clear.
-- Tim
[0001-Make-URI-handling-locale-independent.patch (text/x-patch, attachment)]
bug reassigned from package 'guix' to 'guile'.
Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Tue, 04 Jun 2019 19:24:01 GMT)
Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Tue, 04 Jun 2019 19:27:01 GMT)
Notification sent to Einar Largenius <einar.largenius <at> gmail.com>:
bug acknowledged by developer. (Tue, 04 Jun 2019 19:27:02 GMT)
Message #52 received at 35785-done <at> debbugs.gnu.org (full text, mbox):
Hi!
Timothy Sample <samplet <at> ngyro.com> writes:
> From 9ac8643e5315d4baaddb93ee246ba8db0b3448ab Mon Sep 17 00:00:00 2001
> From: Timothy Sample <samplet <at> ngyro.com>
> Date: Sun, 2 Jun 2019 14:41:20 -0400
> Subject: [PATCH] Make URI handling locale independent.
>
> Fixes <https://bugs.gnu.org/35785>.
>
> * module/web/uri.scm (digits, hex-digits, letters): New variables.
> (ipv4-regexp, ipv6-regexp, domain-label-regexp, top-label-regexp,
> userinfo-pat, host-pat, ipv6-host-pat, port-pat, scheme-pat): Explicitly
> list each character instead of using character ranges.
> * test-suite/tests/web-uri.test: Add corresponding tests.
Perfect; pushed to the ‘stable-2.2’ branch as
420c2632bb1f48e492a035c1d216f209734f45e6.
We got a notification from the FSF that they received your copyright
assignment request too, so everything is on track.
Thank you!
Ludo’.
bug archived.
Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 03 Jul 2019 11:24:06 GMT)
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.