-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Remove references to urllib.request and replace with requests library refs #10759
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove references to urllib.request and replace with requests library refs #10759
Conversation
… into refactor/remove-urllib2-requests
…ctionality handled by Session object
…ionality supported inside request Session() object
for more information, see https://pre-commit.ci
…/johnnymck/openlibrary into refactor/remove-urllib2-requests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @johnnymck welcome and thanks for making your contribution!
The tricky part of this issue is testing. Such as, making the adapter.py file.
After making these changes did you try starting your local instance, logging in, and using a few basic functions of the site? That's a good first check to do.
I left you to short questions on the code itself.
Finally, before we can review this please make sure to fix the mypy errors visible here. You should be able to reproduce these locally when you setup pre-commit as is mentioned in our readme. That way you don't have to push up and wait for the run to see if it's fixed.
Good luck with the changes and thanks for opening the PR!
scripts/tests/test_obfi.py
Outdated
|
||
def __enter__(self): | ||
return self | ||
|
||
def __exit__(self, *args): | ||
pass | ||
|
||
return MockRead() | ||
return MockGet() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this now return MockGet()
instead of MockRead()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A better name for this is likely MockResponse()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the code to replace outdated urllib references with the requests library, streamlining HTTP calls and simplifying cookie management. Key changes include:
- Removing urllib and cookielib dependencies and adding requests across multiple modules.
- Updating HTTP request and response handling in both production and test code.
- Adjusting seed fetching in obfi modules to work with requests.
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
File | Description |
---|---|
scripts/tests/test_obfi.py | Changed mocks and monkeypatches to use requests.get instead of urllib.request.urlopen. |
scripts/solr_updater.py | Updated URL fetching and JSON parsing to use requests.get and response.json(). |
scripts/obfi/mktable.py | Replaced urllib.request.urlopen with requests.get and adjusted exception handling. |
scripts/obfi/hide.py | Updated seed retrieval to use requests.get instead of urllib.request.urlopen. |
openlibrary/plugins/upstream/adapter.py | Converted HTTP request handling from urllib to requests; updated parameter encoding with requests utilities. |
openlibrary/plugins/openlibrary/tests/test_ratingsapi.py | Modified tests to use a requests.Session and adjusted login data encoding. |
openlibrary/plugins/openlibrary/tests/test_listapi.py | Replaced urllib and cookielib with requests and updated POST data handling. |
@@ -263,7 +263,7 @@ def before_request(self): | |||
q = json.loads(i['query']) | |||
q = convert_dict(q) | |||
i['query'] = json.dumps(q) | |||
self.data = urllib.parse.urlencode(i) | |||
self.data = requests.utils.requote_uri(i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replacing urllib.parse.urlencode with requests.utils.requote_uri on a dictionary 'i' may lead to improper URL encoding. Consider using urllib.parse.urlencode or another proper encoding method to ensure correct form-encoding of parameters.
Copilot uses AI. Check for mistakes.
@@ -272,7 +272,7 @@ def before_request(self): | |||
if 'keys' in i: | |||
keys = [convert_key(k) for k in json.loads(i['keys'])] | |||
i['keys'] = json.dumps(keys) | |||
self.data = urllib.parse.urlencode(i) | |||
self.data = requests.utils.requote_uri(i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using requests.utils.requote_uri on a dictionary might not perform the intended URL encoding—unlike urllib.parse.urlencode which transforms a dict into a URL-encoded string. Revert to a proper form encoding function or utilize requests’ built-in handling for form data.
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thank you for tackling this one! This looks super close, please let us know if you hit any issues!
|
||
self.opener = urllib.request.build_opener() | ||
self.opener.add_handler(urllib.request.HTTPCookieProcessor(self.cookiejar)) | ||
self.sesh = requests.Session() # Session object handles cookies automatically |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally call this session instead of sesh
.
self.sesh = requests.Session() # Session object handles cookies automatically | |
self.session = requests.Session() # Session object handles cookies automatically |
prepped = req.prepare() | ||
resp = s.send(prepped) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ray looked into prepare
and it seems like we shouldn't need it; it seems like it is primarily used for caching an http request that we might need to send multiple times, to avoid having to prepare it every time. But in this case, we don't save it or anything so it serves no purpose.
prepped = req.prepare() | |
resp = s.send(prepped) | |
resp = s.send(req) |
prepped = req.prepare() | ||
return self.sesh.send(prepped) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto here.
prepped = req.prepare() | |
return self.sesh.send(prepped) | |
return self.sesh.send(req) |
prepped = req.prepare() | ||
return self.sesh.send(prepped) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prepped = req.prepare() | |
return self.sesh.send(prepped) | |
return self.sesh.send(req) |
scripts/obfi/hide.py
Outdated
content = handle.read() | ||
r = requests.get(SEED_PATH) | ||
r.raise_for_status() | ||
content = r.text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be bytes
content = r.text | |
content = r.content |
self.status_code = resp.status_code | ||
# a one-liner that replaces the __str__ functionality of the HTTPResponse object replaced from urllib | ||
self.status_msg = "\n".join(f"{k}: {v}" for k, v in resp.headers.items()) | ||
self.output = resp.read() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method doesn't exist on the requests Response; it should be
self.output = resp.read() | |
self.output = resp.content |
scripts/tests/test_obfi.py
Outdated
def read(self): | ||
return b"seed=1234" | ||
class MockGet: | ||
text = b"seed=1234" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's bytes, this should now be content
text = b"seed=1234" | |
content = b"seed=1234" |
scripts/tests/test_obfi.py
Outdated
|
||
def __enter__(self): | ||
return self | ||
|
||
def __exit__(self, *args): | ||
pass | ||
|
||
return MockRead() | ||
return MockGet() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A better name for this is likely MockResponse()
@cdrini I've since made the changes you requested. Can you have another review on this when you have time? 😄 |
Co-authored-by: Copilot <[email protected]>
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's looking really. Here are three small changes I'll commit. Then we can get @cdrini to look
Co-authored-by: Copilot <[email protected]>
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work @johnnymck and @RayBB ! Everything looks 100%!
Time to modernize to |
Excellent! Thanks so much guys - glad to have helped, even in some small way. I'll have to pick more tasks in future. |
Closes #2852
This PR refactors references to urllib.request (as per this discussion point) and replaces them with up-to-date functionality from the requests library.
Also worth noting that this is my first contribution to this project, and my second-ever FOSS code contribution. Please be kind! If I've done anything silly or unhandy, please don't hesitate to mention, just note it was done out of ignorance rather than laziness or lack of skill.
Technical
PR Also removes some references to no-longer needed lib import cookielib, as this functionality is also handled by Session object in requests.
Testing
Some functionality updated is not covered by tests, for example, adapter.py. Any advice on testing this/adding coverage is appreciated!
Screenshot
Stakeholders
@RayBB @mekarpeles