Skip to content

Remove references to urllib.request and replace with requests library refs #10759

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

johnnymck
Copy link
Contributor

  • updated testing framework head of new behaviour
  • replaced urllib with requests; behaviour remains same
  • Replaced urllib references with reuqests; behaviour is the same
  • replaced refs to urllib with requests library - worried about lack of test coverage
  • Replaced urllib refs in solr_updater, requests has builtin json parser
  • Replaced references to urllib with requests, removed cookielib as functionality handled by Session object
  • removed uncaught commented-out reference :S Sorry!
  • Updated missed docstring changes in test_obfi.py
  • Replaced refs to urllib with requests lib, removed cookielib as functionality supported inside request Session() object

Closes #2852

This PR refactors references to urllib.request (as per this discussion point) and replaces them with up-to-date functionality from the requests library.

Also worth noting that this is my first contribution to this project, and my second-ever FOSS code contribution. Please be kind! If I've done anything silly or unhandy, please don't hesitate to mention, just note it was done out of ignorance rather than laziness or lack of skill.

Technical

PR Also removes some references to no-longer needed lib import cookielib, as this functionality is also handled by Session object in requests.

Testing

Some functionality updated is not covered by tests, for example, adapter.py. Any advice on testing this/adding coverage is appreciated!

Screenshot

Stakeholders

@RayBB @mekarpeles

@github-project-automation github-project-automation bot moved this to Waiting Review/Merge from Staff in Ray's Project May 12, 2025
Copy link
Collaborator

@RayBB RayBB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @johnnymck welcome and thanks for making your contribution!

The tricky part of this issue is testing. Such as, making the adapter.py file.
After making these changes did you try starting your local instance, logging in, and using a few basic functions of the site? That's a good first check to do.

I left you to short questions on the code itself.

Finally, before we can review this please make sure to fix the mypy errors visible here. You should be able to reproduce these locally when you setup pre-commit as is mentioned in our readme. That way you don't have to push up and wait for the run to see if it's fixed.

Good luck with the changes and thanks for opening the PR!


def __enter__(self):
return self

def __exit__(self, *args):
pass

return MockRead()
return MockGet()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this now return MockGet() instead of MockRead()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better name for this is likely MockResponse()

@cdrini cdrini requested a review from Copilot May 14, 2025 16:40
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the code to replace outdated urllib references with the requests library, streamlining HTTP calls and simplifying cookie management. Key changes include:

  • Removing urllib and cookielib dependencies and adding requests across multiple modules.
  • Updating HTTP request and response handling in both production and test code.
  • Adjusting seed fetching in obfi modules to work with requests.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
scripts/tests/test_obfi.py Changed mocks and monkeypatches to use requests.get instead of urllib.request.urlopen.
scripts/solr_updater.py Updated URL fetching and JSON parsing to use requests.get and response.json().
scripts/obfi/mktable.py Replaced urllib.request.urlopen with requests.get and adjusted exception handling.
scripts/obfi/hide.py Updated seed retrieval to use requests.get instead of urllib.request.urlopen.
openlibrary/plugins/upstream/adapter.py Converted HTTP request handling from urllib to requests; updated parameter encoding with requests utilities.
openlibrary/plugins/openlibrary/tests/test_ratingsapi.py Modified tests to use a requests.Session and adjusted login data encoding.
openlibrary/plugins/openlibrary/tests/test_listapi.py Replaced urllib and cookielib with requests and updated POST data handling.

@@ -263,7 +263,7 @@ def before_request(self):
q = json.loads(i['query'])
q = convert_dict(q)
i['query'] = json.dumps(q)
self.data = urllib.parse.urlencode(i)
self.data = requests.utils.requote_uri(i)
Copy link
Preview

Copilot AI May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacing urllib.parse.urlencode with requests.utils.requote_uri on a dictionary 'i' may lead to improper URL encoding. Consider using urllib.parse.urlencode or another proper encoding method to ensure correct form-encoding of parameters.

Copilot uses AI. Check for mistakes.

@@ -272,7 +272,7 @@ def before_request(self):
if 'keys' in i:
keys = [convert_key(k) for k in json.loads(i['keys'])]
i['keys'] = json.dumps(keys)
self.data = urllib.parse.urlencode(i)
self.data = requests.utils.requote_uri(i)
Copy link
Preview

Copilot AI May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using requests.utils.requote_uri on a dictionary might not perform the intended URL encoding—unlike urllib.parse.urlencode which transforms a dict into a URL-encoded string. Revert to a proper form encoding function or utilize requests’ built-in handling for form data.

Copilot uses AI. Check for mistakes.

Copy link
Collaborator

@cdrini cdrini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you for tackling this one! This looks super close, please let us know if you hit any issues!


self.opener = urllib.request.build_opener()
self.opener.add_handler(urllib.request.HTTPCookieProcessor(self.cookiejar))
self.sesh = requests.Session() # Session object handles cookies automatically
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally call this session instead of sesh.

Suggested change
self.sesh = requests.Session() # Session object handles cookies automatically
self.session = requests.Session() # Session object handles cookies automatically

Comment on lines 87 to 88
prepped = req.prepare()
resp = s.send(prepped)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ray looked into prepare and it seems like we shouldn't need it; it seems like it is primarily used for caching an http request that we might need to send multiple times, to avoid having to prepare it every time. But in this case, we don't save it or anything so it serves no purpose.

Suggested change
prepped = req.prepare()
resp = s.send(prepped)
resp = s.send(req)

Comment on lines 28 to 29
prepped = req.prepare()
return self.sesh.send(prepped)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here.

Suggested change
prepped = req.prepare()
return self.sesh.send(prepped)
return self.sesh.send(req)

Comment on lines 30 to 31
prepped = req.prepare()
return self.sesh.send(prepped)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
prepped = req.prepare()
return self.sesh.send(prepped)
return self.sesh.send(req)

content = handle.read()
r = requests.get(SEED_PATH)
r.raise_for_status()
content = r.text
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be bytes

Suggested change
content = r.text
content = r.content

self.status_code = resp.status_code
# a one-liner that replaces the __str__ functionality of the HTTPResponse object replaced from urllib
self.status_msg = "\n".join(f"{k}: {v}" for k, v in resp.headers.items())
self.output = resp.read()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method doesn't exist on the requests Response; it should be

Suggested change
self.output = resp.read()
self.output = resp.content

def read(self):
return b"seed=1234"
class MockGet:
text = b"seed=1234"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's bytes, this should now be content

Suggested change
text = b"seed=1234"
content = b"seed=1234"


def __enter__(self):
return self

def __exit__(self, *args):
pass

return MockRead()
return MockGet()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better name for this is likely MockResponse()

@RayBB RayBB moved this from Waiting Review/Merge from Staff to Someone else is working on it in Ray's Project May 14, 2025
@johnnymck
Copy link
Contributor Author

@cdrini I've since made the changes you requested. Can you have another review on this when you have time? 😄

@github-actions github-actions bot added the Needs: Response Issues which require feedback from lead label May 31, 2025
@RayBB RayBB requested a review from cdrini June 3, 2025 02:31
@RayBB RayBB moved this from Someone else is working on it to Waiting Review/Merge from Staff in Ray's Project Jun 3, 2025
Copy link
Collaborator

@RayBB RayBB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's looking really. Here are three small changes I'll commit. Then we can get @cdrini to look

Copy link
Collaborator

@cdrini cdrini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work @johnnymck and @RayBB ! Everything looks 100%!

@cdrini cdrini merged commit 08ede86 into internetarchive:master Jun 4, 2025
4 checks passed
@github-project-automation github-project-automation bot moved this from Waiting Review/Merge from Staff to Done in Ray's Project Jun 4, 2025
@cclauss
Copy link
Contributor

cclauss commented Jun 4, 2025

Time to modernize to
https://www.python-httpx.org ?

@johnnymck johnnymck deleted the refactor/remove-urllib2-requests branch June 9, 2025 11:45
@johnnymck
Copy link
Contributor Author

Excellent! Thanks so much guys - glad to have helped, even in some small way. I'll have to pick more tasks in future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs: Response Issues which require feedback from lead
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Replace all urllib / urllib2 imports and uses with requests
4 participants