GNU bug report logs - #40582
Valid URIs are rejected

Package: guile

Reported by: Julien Lepiller <julien <at> lepiller.eu>

Date: Sun, 12 Apr 2020 19:45:02 UTC

Severity: normal

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 40582 in the body.
You can then email your comments to 40582 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-guile <at> gnu.org:
bug#40582; Package guile. (Sun, 12 Apr 2020 19:45:02 GMT)

Acknowledgement sent to Julien Lepiller <julien <at> lepiller.eu>:
New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Sun, 12 Apr 2020 19:45:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Julien Lepiller <julien <at> lepiller.eu>
To: bug-guile <at> gnu.org
Subject: Valid URIs are rejected
Date: Sun, 12 Apr 2020 15:44:31 -0400
Hi,

Using (web uri), I was trying to parse "uri://a/c". According to RFC 3986, this should be a valid URI (see the reg-name rule in section 3.2.2). However, passing it to string->uri returns #f. I've tracked this down to valid-host?, which returns #f for "a".

The reason is that the regexp that checks whether the host is an IPv6 address matches "a", which shouldn't happen because "a" is not an IPv6 address. Indeed, when I try (string->uri "uri://g/b"), I get the expected result.




Information forwarded to bug-guile <at> gnu.org:
bug#40582; Package guile. (Wed, 17 Jun 2020 21:58:02 GMT)

Message #8 received at 40582 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Julien Lepiller <julien <at> lepiller.eu>
Cc: 40582 <at> debbugs.gnu.org
Subject: Re: bug#40582: Valid URIs are rejected
Date: Wed, 17 Jun 2020 23:57:33 +0200
[Message part 1 (text/plain, inline)]
Hi Julien,

Julien Lepiller <julien <at> lepiller.eu> skribis:

> Using (web uri), I was trying to parse "uri://a/c". Reading RFC3986, it should be a valid URI (see rule for reg-name in 3.2.2). However, passing it to string->uri results in #f. I've tracked this down to valid-host? which returns #f for "a".
>
> The reason is that the regexp checking if the host is an ipv6 matches "a", which shouldn't happen because a is not an ipv6 address. Indeed, when I try (string->uri "uri://g/b"), I get the expected result.

Right.  ‘authority-regexp’ is fine, but ‘ipv6-regexp’, used by
‘valid-host?’, was too lax and would match “a” because it’s a hex digit
sequence.
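
To illustrate, here is a minimal sketch that rebuilds the pre-patch pattern
from the diff and tries it on a few hosts.  The value of ‘hex-digits’ and the
‘matches?’ helper are assumptions made for this example, not code taken from
uri.scm.

```scheme
;; Assumption: this mirrors the definition of hex-digits in module/web/uri.scm.
(define hex-digits "0123456789ABCDEFabcdef")

;; The pre-patch pattern, copied from the "-" line of the diff.
(define old-ipv6-regexp
  (make-regexp (string-append "^([" hex-digits ":.]+)$")))

;; Hypothetical helper: regexp-exec returns a match object or #f;
;; normalize that to a boolean for printing.
(define (matches? rx str)
  (and (regexp-exec rx str) #t))

(for-each
 (lambda (host)
   (format #t "~s => ~a~%" host (matches? old-ipv6-regexp host)))
 '("a" "g" "::1"))
```

With this pattern, "a" and "::1" should be reported as matches and "g" should
not, which is consistent with "uri://g/b" parsing while "uri://a/c" did not.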

The regexp below is still an approximation, but I think a better one.
Can you confirm?

Thanks,
Ludo’.

[Message part 2 (text/x-patch, inline)]
diff --git a/module/web/uri.scm b/module/web/uri.scm
index b4b89b9cc..d76432737 100644
--- a/module/web/uri.scm
+++ b/module/web/uri.scm
@@ -188,7 +188,7 @@ for ‘build-uri’ except there is no scheme."
 (define ipv4-regexp
   (make-regexp (string-append "^([" digits ".]+)$")))
 (define ipv6-regexp
-  (make-regexp (string-append "^([" hex-digits ":.]+)$")))
+  (make-regexp (string-append "^([" hex-digits "]*:[" hex-digits ":.]+)$")))
 (define domain-label-regexp
   (make-regexp
    (string-append "^[" letters digits "]"
diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test
index 94778acac..95fd82f16 100644
--- a/test-suite/tests/web-uri.test
+++ b/test-suite/tests/web-uri.test
@@ -1,6 +1,6 @@
 ;;;; web-uri.test --- URI library          -*- mode: scheme; coding: utf-8; -*-
 ;;;;
-;;;; 	Copyright (C) 2010-2012, 2014, 2017, 2019 Free Software Foundation, Inc.
+;;;; 	Copyright (C) 2010-2012, 2014, 2017, 2019, 2020 Free Software Foundation, Inc.
 ;;;;
 ;;;; This library is free software; you can redistribute it and/or
 ;;;; modify it under the terms of the GNU Lesser General Public
@@ -179,6 +179,13 @@
            #:port 22
            #:path "/baz"))
 
+  (pass-if-equal "xyz://abc/x/y/z"         ;<https://bugs.gnu.org/40582>
+      (list 'xyz "abc" "/x/y/z")
+    (let ((uri (string->uri "xyz://abc/x/y/z")))
+      (list (uri-scheme uri)
+            (uri-host uri)
+            (uri-path uri))))
+
   (pass-if "http://bad.host.1"
     (not (string->uri "http://bad.host.1")))
 

Information forwarded to bug-guile <at> gnu.org:
bug#40582; Package guile. (Thu, 18 Jun 2020 01:21:02 GMT)

Message #11 received at 40582 <at> debbugs.gnu.org:

From: Julien Lepiller <julien <at> lepiller.eu>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 40582 <at> debbugs.gnu.org
Subject: Re: bug#40582: Valid URIs are rejected
Date: Wed, 17 Jun 2020 21:17:11 -0400
Le 17 juin 2020 17:57:33 GMT-04:00, "Ludovic Courtès" <ludo <at> gnu.org> a écrit :
>Hi Julien,
>
>Julien Lepiller <julien <at> lepiller.eu> skribis:
>
>> Using (web uri), I was trying to parse "uri://a/c". Reading RFC3986,
>> it should be a valid URI (see rule for reg-name in 3.2.2). However,
>> passing it to string->uri results in #f. I've tracked this down to
>> valid-host? which returns #f for "a".
>>
>> The reason is that the regexp checking if the host is an ipv6 matches
>> "a", which shouldn't happen because a is not an ipv6 address. Indeed,
>> when I try (string->uri "uri://g/b"), I get the expected result.
>
>Right.  ‘authority-regexp’ is fine, but ‘ipv6-regexp’, used by
>‘valid-host?’, was too lax and would match “a” because it’s a hex
>digit sequence.
>
>The regexp below is still an approximation, but I think a better one.
>Can you confirm?
>
>Thanks,
>Ludo’.

Looks slightly better, thanks.

That's still incorrect, as it will match things that are not IPv6 addresses. Does it have to be a regexp though? Why not simply check (false-if-exception (inet-pton AF_INET6 host)), as in the return value of valid-host?
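
As a rough sketch of the difference, the following compares the patched regexp from the diff with a direct inet-pton check. The value of hex-digits, the ipv6-address? helper and the sample hosts are assumptions made up for illustration.

```scheme
;; Assumption: this mirrors the definition of hex-digits in module/web/uri.scm.
(define hex-digits "0123456789ABCDEFabcdef")

;; The patched pattern, copied from the "+" line of the diff:
;; at least one colon is now required.
(define new-ipv6-regexp
  (make-regexp (string-append "^([" hex-digits "]*:[" hex-digits ":.]+)$")))

;; Hypothetical helper: ask inet-pton whether HOST parses as an IPv6
;; address.  inet-pton raises an exception on malformed input, which
;; false-if-exception turns into #f.
(define (ipv6-address? host)
  (and (false-if-exception (inet-pton AF_INET6 host)) #t))

(for-each
 (lambda (host)
   (format #t "~s: regexp ~a, inet-pton ~a~%"
           host
           (and (regexp-exec new-ipv6-regexp host) #t)
           (ipv6-address? host)))
 '("a" "::1" "a:b" ":::."))
```

Hosts such as "a:b" or ":::." pass the regexp, but inet-pton should refuse them; "a" is now rejected by both.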

There's also an ipv6-host-pat that has an incorrect regexp, but I'm not sure what it is used for.




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Thu, 18 Jun 2020 15:09:02 GMT)

Notification sent to Julien Lepiller <julien <at> lepiller.eu>:
bug acknowledged by developer. (Thu, 18 Jun 2020 15:09:02 GMT)

Message #16 received at 40582-done <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Julien Lepiller <julien <at> lepiller.eu>
Cc: 40582-done <at> debbugs.gnu.org
Subject: Re: bug#40582: Valid URIs are rejected
Date: Thu, 18 Jun 2020 17:07:57 +0200
Hi,

Julien Lepiller <julien <at> lepiller.eu> skribis:

> Le 17 juin 2020 17:57:33 GMT-04:00, "Ludovic Courtès" <ludo <at> gnu.org> a écrit :

[...]

>>The regexp below is still an approximation, but I think a better one.
>>Can you confirm?
>>
>>Thanks,
>>Ludo’.
>
> Looks slightly better, thanks.
>
> That's still incorrect, as it will match things that are not ipv6 addresses. Does it have to be a regexp though? Why not simply check (false-if-exception (inet-pton AF_INET6 host)), as in the return value of valid-host?

Using a regexp makes the code closer to the RFC since the RFC explicitly
describes the grammar.  It’s also the simple choice here.

> There's also an ipv6-host-pat that has an incorrect regexp, but I'm not sure what it is used for.

It’s used for ‘authority-regexp’, but that one is fine: it requires
square brackets around IPv6 addresses.
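
For instance, a bracketed IPv6 literal can be handed to ‘string->uri’ as
sketched below.  The address, port and path are arbitrary values chosen for
this example, and the result is only printed, not assumed.

```scheme
(use-modules (web uri))

;; In URI syntax (RFC 3986, section 3.2.2) an IPv6 literal is written
;; inside square brackets, which is what the authority part expects.
(define uri (string->uri "http://[::1]:8080/index.html"))

;; string->uri returns #f when parsing fails, so guard the accessors.
(if uri
    (format #t "host: ~s, port: ~s~%" (uri-host uri) (uri-port uri))
    (format #t "not parsed as a URI~%"))
```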

Pushed as 1ab2105339f60dba20c8c9680e49110501f3a6a0.

Thanks,
Ludo’.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 17 Jul 2020 11:24:08 GMT)

This bug report was last modified 3 years and 283 days ago.

