-
-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible fix of the encoding and/or downlaod issue(s) #520
Conversation
Please note that this patch also introduce which is in charge of converting a domain in a line into IDNA and/or UTF-8 format. Also note the introduction of BeautifulSoup() which helps us to decode data from the downloaded URL. Fixes (issue(s)/protocol(s) I was able to reproduce): * StevenBlack#514 (comment) Possible fix of (issue(s)/protocol(s) I wasn't able to reproduce): * StevenBlack#514 (comment) * StevenBlack#494 (comment) * StevenBlack#420 (comment) * StevenBlack#372 (comment) * StevenBlack#382 (comment)
Please note that those file can be used to install dependencies with 'pip install -r requirements.txt'
This patch fix https://travis-ci.org/funilrys/hosts/jobs/347500695#L397 Please also note that I introduced that patch because we do not directly use lxml but it is required by BeautifulSup() to parse the HTML.
This patch introduce the installation of dependencies needed my the main commit. This patch fixes: * https://travis-ci.org/funilrys/hosts/jobs/347504195#L592 * https://travis-ci.org/funilrys/hosts/jobs/347504195#L598
Thank you for submitting this pull request! We’ll get back to you as soon as we can! |
updateHostsFile.py
Outdated
@@ -24,7 +24,7 @@ | |||
import time | |||
from glob import glob | |||
|
|||
import lxml | |||
import lxml # noqa: F401 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@funilrys what is the significance of # noqa: F401
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@StevenBlack noqa
is a tag to tell flake8 to ignore error in that line (more info here)
If you remove that comments, flake8 produce https://travis-ci.org/funilrys/hosts/jobs/347500695#L598
./updateHostsFile.py:27:1: F401 'lxml' imported but unused
Explanation from 079d5dd :
Please also note that I introduced that patch because
we do not directly use lxml but it is required by
BeautifulSup() to parse the HTML.
@funilrys I'm getting crashes when I issue
and
In the first case:
In the second case:
Take note that |
@funilrys some more feedback: Since |
Working like a charm. @funilrys Thanks!! |
Hello Steven @StevenBlack, Before anything please avoid I thought that we all assume that virtualenv is always a good practice as you use a virtual environment (conda environment) to launch the tests. So to fix such issue we have 3 possibilities: Install dependencies from your package managerFor ubuntu
For Arch Linux (the only one I use)
Then you simply run
Install with pip along with
|
Not much of a choice when you need newer packages. |
@rautamiekka sorry but no ... I proposed 3 alternatives to |
@funilrys we need a solution that works on MacOS and the recent versions of Windows too. |
We then have those two solutions:
|
|
@gfyoung Will add unit tests once I'm back home. |
It isn't when you can remove things from your |
Hi there, this works for me too on macOS 👍 One minor issue though, i was curious what hostnames were causing the problem, and i notice that with #520 line 147886 in https://hosts-file.net/psh.txt :
so basically this host is not being blocked... Probably obvious, but you can reproduce this by doing this, + apply this pull request, and then in step 8 instead of |
@funilrys Ahhhhhh i see! |
Please note that I have added that '#' by mistake.
This patch review the way we get the comment at the end of a line. I also did an application of DRY (Do not Repeat Yourself) and/or KISS (Keep It Simple, Stupid) by refactoring the 2 `else` statements into one line.
Nissar @funilrys this is an excellent PR. Just those doc changes and we'll merge! |
Steven @StevenBlack What I see here is another problem! Some OS have |
@StevenBlack Can you confirm the existence of |
Nissar @funilrys yup, it looks native. In my case, on the office iMac:
|
Okay Steven @StevenBlack then I'm going to edit readme_template.md as follow:
I do prefer to recommend with
|
@StevenBlack Manjaro Linux:
Aaand packages:
|
Nissar @funilrys perfect! |
@funilrys a small suggestion.
|
Please note that this patch explitly set which `pip` version to use according to the user Python version.
Nissar @funilrys still missing one thing: To run unit tests, in the top level directory, just run:... should come after installing the requirements since latest tests will fail without the requirements. |
That's logical sorry ... |
Please note that this patch mothe the unit tests paragraph after dependencies installation.
Merging. Thanks so much Nissar @funilrys and everyone who provided input on this! |
Congratulations on merging your first pull request! 🎉🎉🎉 |
Perhaps a note of some kind should also be added in a different PR for those installing the dependencies manually for whatever reason. Beautiful Soup (which goes to version 3 and only supports Python 2) is actually a different package from Beautiful Soup 4 (the bs4 packages for Python 2 and 3). It might be different depending on how the repositories manage them, but that at least seems to be the author's intent and what I have seen in the repos I use. So if they install the latest Beautiful Soup package and don't know why it's throwing an error when it can't find the bs4 module, that's why. |
This patch fix : StevenBlack/hosts#520 (comment)
This patch fix : StevenBlack/hosts#520 (comment) + It also fix (forgoten coma) : StevenBlack/hosts#520 (comment)
Hello, to all member of this community!
After a long time of commenting, watching, reading and learning from issues posted here it's time for me to propose the following patches/commits.
It's been a long time since I wanted to search a possible solution to this recurrent issue and here's my proposition.
Basically, since I consider that many encoding issues are because upstream maintainer does not convert/encode the domains which rely on IDN homograph attack, I decided to read all line and convert/encode all domain using
str.encode('IDNA')
(I do not change original structure).Because my main objective was to fix @notDavid's protocol which is the only one I was able to reproduce, I introduced
BeautifulSoup()
to do the decoding of the downloaded data before we read each line and encode/decode.About the tests
Before anything, sorry for adding the
fix tests issues
commits but I wanted to fix those before submitting my PR. You can find the tests I used as the reference to committing those change at https://travis-ci.org/funilrys/hosts.Please note that I have to submit an issue to upstream but if you see the following or related when runing the tests under
Python >= 3
it's not a local issue.About
requirements*.txt
This submission also introduces
requirements.txt
files which I generally use to install dependencies withpip install -r requirements.txt
(I'm almost everytime undervirtualenv
).As I did not get an answer (cf) I took the initiative to includes them anyway.
Please note that the
requirements*.txt
files were generated by pipreqs.So to quote my commit message
Please note that this patch also introduces
domain_to_idna()
which is in charge of converting a domain in a line into
IDNA and/or UTF-8 format.
Also, note the introduction of BeautifulSoup() which helps
us to decode data from the downloaded URL.
Fixes (issue(s)/protocol(s) I was able to reproduce):
Possible fix of (issue(s)/protocol(s) I wasn't able to reproduce):
Footnote(s)
Sorry for the typo or bad English structure in my commits, code or this description but English is still a language I learn.
So I'm open to any pieces of advice and others around my English, coding style or other 😸
Cheers,
Nissar.