Skip to content

gh-105704: Disallow IPv6 URLs with invalid prefix/suffix #111261

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 28 additions & 10 deletions Lib/test/test_urlparse.py
Original file line number Diff line number Diff line change
Expand Up @@ -1402,16 +1402,34 @@ def test_issue14072(self):
self.assertEqual(p2.path, '+31641044153')

def test_invalid_bracketed_hosts(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This refactoring, personally makes it hard to review. Could you please share the description, which existing test cases behavior are changing now (if any)? If you don't get to it, I will try to find out too. (But this can be considered for future refactors of test code).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - here are the differences in the tests:

  • urlsplit and urlparse are now tested (not just urlsplit)
  • strings and bytes are now tested (not just strings)
  • four new test cases are added (the first ten are the same as they were before)

If I need to split the refactoring into a separate PR, I can do that.

self.assertRaises(ValueError, urllib.parse.urlsplit, 'Scheme://user@[192.0.2.146]/Path?Query')
self.assertRaises(ValueError, urllib.parse.urlsplit, 'Scheme://user@[important.com:8000]/Path?Query')
self.assertRaises(ValueError, urllib.parse.urlsplit, 'Scheme://user@[v123r.IP]/Path?Query')
self.assertRaises(ValueError, urllib.parse.urlsplit, 'Scheme://user@[v12ae]/Path?Query')
self.assertRaises(ValueError, urllib.parse.urlsplit, 'Scheme://user@[v.IP]/Path?Query')
self.assertRaises(ValueError, urllib.parse.urlsplit, 'Scheme://user@[v123.]/Path?Query')
self.assertRaises(ValueError, urllib.parse.urlsplit, 'Scheme://user@[v]/Path?Query')
self.assertRaises(ValueError, urllib.parse.urlsplit, 'Scheme://user@[0439:23af::2309::fae7:1234]/Path?Query')
self.assertRaises(ValueError, urllib.parse.urlsplit, 'Scheme://user@[0439:23af:2309::fae7:1234:2342:438e:192.0.2.146]/Path?Query')
self.assertRaises(ValueError, urllib.parse.urlsplit, 'Scheme://user@]v6a.ip[/Path')
cases = [
'Scheme://user@[192.0.2.146]/Path?Query',
'Scheme://user@[important.com:8000]/Path?Query',
'Scheme://user@[v123r.IP]/Path?Query',
'Scheme://user@[v12ae]/Path?Query',
'Scheme://user@[v.IP]/Path?Query',
'Scheme://user@[v123.]/Path?Query',
'Scheme://user@[v]/Path?Query',
'Scheme://user@[0439:23af::2309::fae7:1234]/Path?Query',
'Scheme://user@[0439:23af:2309::fae7:1234:2342:438e:192.0.2.146]/Path?Query',
'Scheme://user@]v6a.ip[/Path',
'Scheme://user@[v6a.ip/path?query',
'Scheme://[email protected]]/path?query',
'Scheme://user@prefix.[v6a.ip]/path?query',
'Scheme://user@[v6a.ip].suffix/path?query',
]

for case in cases:
with self.subTest(case=case):
with self.assertRaises(ValueError):
urllib.parse.urlsplit(case)
with self.assertRaises(ValueError):
urllib.parse.urlparse(case)
bytes_case = case.encode('utf8')
with self.assertRaises(ValueError):
urllib.parse.urlsplit(bytes_case)
with self.assertRaises(ValueError):
urllib.parse.urlparse(bytes_case)

def test_splitting_bracketed_hosts(self):
p1 = urllib.parse.urlsplit('scheme://user@[v6a.ip]/path?query')
Expand Down
11 changes: 9 additions & 2 deletions Lib/urllib/parse.py
Original file line number Diff line number Diff line change
Expand Up @@ -505,8 +505,15 @@ def _urlsplit(url, scheme=None, allow_fragments=True):
(']' in netloc and '[' not in netloc)):
raise ValueError("Invalid IPv6 URL")
if '[' in netloc and ']' in netloc:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be kept and updated to error appropriately as callers of urlsplit and urlparse expect the exception at parsing time. Not later on when accessing attributes.

bracketed_host = netloc.partition('[')[2].partition(']')[0]
_check_bracketed_host(bracketed_host)
_, _, hostinfo = netloc.rpartition('@')
bracket_prefix, have_open_br, bracketed = hostinfo.partition('[')
if bracket_prefix:
raise ValueError('Invalid IPv6 URL')
hostname, _, port = bracketed.partition(']')
_check_bracketed_host(hostname)
bracket_suffix, _, _ = port.partition(':')
if bracket_suffix:
raise ValueError('Invalid IPv6 URL')
if allow_fragments and '#' in url:
url, fragment = url.split('#', 1)
if '?' in url:
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Fix :func:`urllib.parse.urlparse` and :func:`urllib.parse.urlsplit` to
disallow IPv6 URLs with invalid prefix/suffix
Loading