GNU bug report logs - #40582
Valid URIs are rejected
Report forwarded to bug-guile <at> gnu.org: bug#40582; Package guile. (Sun, 12 Apr 2020 19:45:02 GMT)
Acknowledgement sent to Julien Lepiller <julien <at> lepiller.eu>: New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Sun, 12 Apr 2020 19:45:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
Hi,
Using (web uri), I was trying to parse "uri://a/c". Reading RFC3986, it should be a valid URI (see rule for reg-name in 3.2.2). However, passing it to string->uri results in #f. I've tracked this down to valid-host? which returns #f for "a".
The reason is that the regexp that checks whether the host is an IPv6 address matches "a", which shouldn't happen because "a" is not an IPv6 address. Indeed, when I try (string->uri "uri://g/b"), I get the expected result.
Information forwarded to bug-guile <at> gnu.org: bug#40582; Package guile. (Wed, 17 Jun 2020 21:58:02 GMT)
Message #8 received at 40582 <at> debbugs.gnu.org:
[Message part 1 (text/plain, inline)]
Hi Julien,
Julien Lepiller <julien <at> lepiller.eu> writes:
> Using (web uri), I was trying to parse "uri://a/c". Reading RFC3986, it should be a valid URI (see rule for reg-name in 3.2.2). However, passing it to string->uri results in #f. I've tracked this down to valid-host? which returns #f for "a".
>
> The reason is that the regexp checking if the host is an ipv6 matches "a", which shouldn't happen because a is not an ipv6 address. Indeed, when I try (string->uri "uri://g/b"), I get the expected result.
Right. ‘authority-regexp’ is fine, but ‘ipv6-regexp’, used by
‘valid-host?’, was too lax and would match “a” because it’s a hex digit
sequence.
The regexp below is still an approximation, but I think it’s a better one.
Can you confirm?
Thanks,
Ludo’.
[Message part 2 (text/x-patch, inline)]
diff --git a/module/web/uri.scm b/module/web/uri.scm
index b4b89b9cc..d76432737 100644
--- a/module/web/uri.scm
+++ b/module/web/uri.scm
@@ -188,7 +188,7 @@ for ‘build-uri’ except there is no scheme."
(define ipv4-regexp
(make-regexp (string-append "^([" digits ".]+)$")))
(define ipv6-regexp
- (make-regexp (string-append "^([" hex-digits ":.]+)$")))
+ (make-regexp (string-append "^([" hex-digits "]*:[" hex-digits ":.]+)$")))
(define domain-label-regexp
(make-regexp
(string-append "^[" letters digits "]"
diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test
index 94778acac..95fd82f16 100644
--- a/test-suite/tests/web-uri.test
+++ b/test-suite/tests/web-uri.test
@@ -1,6 +1,6 @@
;;;; web-uri.test --- URI library -*- mode: scheme; coding: utf-8; -*-
;;;;
-;;;; Copyright (C) 2010-2012, 2014, 2017, 2019 Free Software Foundation, Inc.
+;;;; Copyright (C) 2010-2012, 2014, 2017, 2019, 2020 Free Software Foundation, Inc.
;;;;
;;;; This library is free software; you can redistribute it and/or
;;;; modify it under the terms of the GNU Lesser General Public
@@ -179,6 +179,13 @@
#:port 22
#:path "/baz"))
+ (pass-if-equal "xyz://abc/x/y/z" ;<https://bugs.gnu.org/40582>
+ (list 'xyz "abc" "/x/y/z")
+ (let ((uri (string->uri "xyz://abc/x/y/z")))
+ (list (uri-scheme uri)
+ (uri-host uri)
+ (uri-path uri))))
+
(pass-if "http://bad.host.1"
(not (string->uri "http://bad.host.1")))
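To see what the patch changes, the old and new ‘ipv6-regexp’ can be compared in isolation. This is a minimal sketch under assumptions: it defines ‘hex-digits’ as the bracket-expression fragment "0-9A-Fa-f" (the patch only shows it being spliced inside "[...]"), and ‘matches?’ is a hypothetical helper, not part of (web uri).

```scheme
(use-modules (ice-9 regex))

;; Assumption: HEX-DIGITS is a fragment meant to be spliced inside a
;; "[...]" bracket expression, as in the patch above.
(define hex-digits "0-9A-Fa-f")

;; Before the fix: any non-empty run of hex digits, ':' and '.'.
(define old-ipv6-regexp
  (make-regexp (string-append "^([" hex-digits ":.]+)$")))

;; After the fix: optional hex digits, then at least one ':'.
(define new-ipv6-regexp
  (make-regexp (string-append "^([" hex-digits "]*:[" hex-digits ":.]+)$")))

;; Hypothetical helper: #t if RX matches STR, #f otherwise.
(define (matches? rx str)
  (if (regexp-exec rx str) #t #f))

(for-each
 (lambda (host)
   (format #t "~a: old=~a new=~a~%"
           host
           (matches? old-ipv6-regexp host)
           (matches? new-ipv6-regexp host)))
 ;; "1:2" is not a valid IPv6 address, yet the new regexp accepts it.
 '("a" "abc" "g" "::1" "2001:db8::1" "1:2"))
```

Only the old pattern accepts “a” and “abc”, which is why those hosts used to be treated as IPv6 candidates; “g” is not a hex digit, which matches the report that "uri://g/b" already parsed. “1:2” is accepted by both patterns, illustrating that the new regexp is still an approximation: it requires a colon but does not enforce the full address grammar.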
Information forwarded to bug-guile <at> gnu.org: bug#40582; Package guile. (Thu, 18 Jun 2020 01:21:02 GMT)
Message #11 received at 40582 <at> debbugs.gnu.org:
On 17 June 2020 17:57:33 GMT-04:00, "Ludovic Courtès" <ludo <at> gnu.org> wrote:
>The regexp below is still an approximation, but I think a better one.
>Can you confirm?
>
>Thanks,
>Ludo’.
Looks slightly better, thanks.
That's still incorrect, as it will match things that are not IPv6 addresses. Does it have to be a regexp, though? Why not simply check (false-if-exception (inet-pton AF_INET6 host)), which is already what valid-host? returns for hosts that match the IPv6 regexp?
There's also an ipv6-host-pat that has an incorrect regexp, but I'm not sure what it is used for.
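Julien's alternative can be sketched as a standalone predicate that delegates to the C library's address parser through Guile's ‘inet-pton’. This is an illustration under assumptions: ‘ipv6-address?’ is a hypothetical name, and the sketch assumes a Guile built with IPv6 support (so ‘AF_INET6’ is defined) and that ‘inet-pton’ raises an error on malformed input, which ‘false-if-exception’ turns into #f.

```scheme
;; Hypothetical predicate, not part of (web uri).
;; inet-pton returns the address as an integer; any integer, even 0
;; for "::", is a true value in Scheme.  Malformed input raises an
;; error, which false-if-exception converts to #f.
(define (ipv6-address? str)
  (if (false-if-exception (inet-pton AF_INET6 str)) #t #f))

(for-each
 (lambda (host)
   (format #t "~a: ~a~%" host (ipv6-address? host)))
 '("a" "::1" "2001:db8::1" "1:2"))
```

Unlike the approximate regexp, this rejects “1:2” as well as “a”. The trade-off, as Ludo’s reply argues, is that a regexp keeps the code closer to the grammar that RFC 3986 spells out.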
Reply sent to Ludovic Courtès <ludo <at> gnu.org>: You have taken responsibility. (Thu, 18 Jun 2020 15:09:02 GMT)
Notification sent to Julien Lepiller <julien <at> lepiller.eu>: bug acknowledged by developer. (Thu, 18 Jun 2020 15:09:02 GMT)
Message #16 received at 40582-done <at> debbugs.gnu.org:
Hi,
Julien Lepiller <julien <at> lepiller.eu> writes:
> On 17 June 2020 17:57:33 GMT-04:00, "Ludovic Courtès" <ludo <at> gnu.org> wrote:
[...]
>>The regexp below is still an approximation, but I think a better one.
>>Can you confirm?
>>
>>Thanks,
>>Ludo’.
>
> Looks slightly better, thanks.
>
> That's still incorrect, as it will match things that are not ipv6 addresses. Does it have to be a regexp though? Why not simply check (false-if-exception (inet-pton AF_INET6 host)), as in the return value of valid-host?
Using a regexp makes the code closer to the RFC since the RFC explicitly
describes the grammar. It’s also the simple choice here.
> There's also a ipv6-host-pat that has an incorrect regexp, but I'm not sure what it is used for.
It’s used for ‘authority-regexp’, but that one is fine: it requires
square brackets around IPv6 addresses.
Pushed as 1ab2105339f60dba20c8c9680e49110501f3a6a0.
Thanks,
Ludo’.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 17 Jul 2020 11:24:08 GMT)