Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible fix of the encoding and/or downlaod issue(s) #520

Merged
merged 19 commits into from
Mar 3, 2018

Conversation

funilrys
Copy link
Contributor

@funilrys funilrys commented Feb 28, 2018

Hello, to all member of this community!

After a long time of commenting, watching, reading and learning from issues posted here it's time for me to propose the following patches/commits.

It's been a long time since I wanted to search a possible solution to this recurrent issue and here's my proposition.

Basically, since I consider that many encoding issues are because upstream maintainer does not convert/encode the domains which rely on IDN homograph attack, I decided to read all line and convert/encode all domain using str.encode('IDNA') (I do not change original structure).

Because my main objective was to fix @notDavid's protocol which is the only one I was able to reproduce, I introduced BeautifulSoup() to do the decoding of the downloaded data before we read each line and encode/decode.

About the tests

Before anything, sorry for adding the fix tests issues commits but I wanted to fix those before submitting my PR. You can find the tests I used as the reference to committing those change at https://travis-ci.org/funilrys/hosts.

Please note that I have to submit an issue to upstream but if you see the following or related when runing the tests under Python >= 3 it's not a local issue.

/home/travis/miniconda3/envs/hosts/lib/python3.5/site-packages/bs4/builder/_lxml.py:250: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead

About requirements*.txt

This submission also introduces requirements.txt files which I generally use to install dependencies with pip install -r requirements.txt (I'm almost everytime under virtualenv).

As I did not get an answer (cf) I took the initiative to includes them anyway.

Please note that the requirements*.txt files were generated by pipreqs.

So to quote my commit message

Please note that this patch also introduces domain_to_idna()
which is in charge of converting a domain in a line into
IDNA and/or UTF-8 format.

Also, note the introduction of BeautifulSoup() which helps
us to decode data from the downloaded URL.

Fixes (issue(s)/protocol(s) I was able to reproduce):

Possible fix of (issue(s)/protocol(s) I wasn't able to reproduce):

Footnote(s)

Sorry for the typo or bad English structure in my commits, code or this description but English is still a language I learn.

So I'm open to any pieces of advice and others around my English, coding style or other 😸

Cheers,
Nissar.

Please note that this patch also introduce
which is in charge of converting a domain in a line into
IDNA and/or UTF-8 format.

Also note the introduction of BeautifulSoup() which helps
us to decode data from the downloaded URL.

Fixes (issue(s)/protocol(s) I was able to reproduce):
 * StevenBlack#514 (comment)

Possible fix of (issue(s)/protocol(s) I wasn't able to reproduce):
 * StevenBlack#514 (comment)
 * StevenBlack#494 (comment)
 * StevenBlack#420 (comment)
 * StevenBlack#372 (comment)
 * StevenBlack#382 (comment)
Please note that those file can be used to install
dependencies with 'pip install -r requirements.txt'
This patch fix https://travis-ci.org/funilrys/hosts/jobs/347500695#L397

Please also note that I introduced that patch because
we do not directly use lxml but it is required by
BeautifulSup() to parse the HTML.
This patch introduce the installation of dependencies needed my the main commit.

This patch fixes:
 * https://travis-ci.org/funilrys/hosts/jobs/347504195#L592
 * https://travis-ci.org/funilrys/hosts/jobs/347504195#L598
@welcome
Copy link

welcome bot commented Feb 28, 2018

Thank you for submitting this pull request! We’ll get back to you as soon as we can!

@@ -24,7 +24,7 @@
import time
from glob import glob

import lxml
import lxml # noqa: F401
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@funilrys what is the significance of # noqa: F401 here?

Copy link
Contributor Author

@funilrys funilrys Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@StevenBlack noqa is a tag to tell flake8 to ignore error in that line (more info here)

If you remove that comments, flake8 produce https://travis-ci.org/funilrys/hosts/jobs/347500695#L598

./updateHostsFile.py:27:1: F401 'lxml' imported but unused

Explanation from 079d5dd :

Please also note that I introduced that patch because
we do not directly use lxml but it is required by
BeautifulSup() to parse the HTML.

@StevenBlack
Copy link
Owner

@funilrys I'm getting crashes when I issue

pip install -r requirements.txt

and

pip install -r requirements_python2.txt

In the first case:

$ pip install -r requirements.txt
Collecting lxml==4.1.1 (from -r requirements.txt (line 1))
  Downloading lxml-4.1.1-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (8.7MB)
    100% |████████████████████████████████| 8.7MB 196kB/s
Collecting beautifulsoup4==4.6.0 (from -r requirements.txt (line 2))
  Downloading beautifulsoup4-4.6.0-py2-none-any.whl (86kB)
    100% |████████████████████████████████| 92kB 7.9MB/s
Collecting mock==2.0.0 (from -r requirements.txt (line 3))
  Downloading mock-2.0.0-py2.py3-none-any.whl (56kB)
    100% |████████████████████████████████| 61kB 6.8MB/s
Collecting pbr>=0.11 (from mock==2.0.0->-r requirements.txt (line 3))
  Downloading pbr-3.1.1-py2.py3-none-any.whl (99kB)
    100% |████████████████████████████████| 102kB 10.6MB/s
Collecting funcsigs>=1; python_version < "3.3" (from mock==2.0.0->-r requirements.txt (line 3))
  Downloading funcsigs-1.0.2-py2.py3-none-any.whl
Collecting six>=1.9 (from mock==2.0.0->-r requirements.txt (line 3))
  Downloading six-1.11.0-py2.py3-none-any.whl
Installing collected packages: lxml, beautifulsoup4, pbr, funcsigs, six, mock
Exception:
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/Library/Python/2.7/site-packages/pip/commands/install.py", line 342, in run
    prefix=options.prefix_path,
  File "/Library/Python/2.7/site-packages/pip/req/req_set.py", line 784, in install
    **kwargs
  File "/Library/Python/2.7/site-packages/pip/req/req_install.py", line 851, in install
    self.move_wheel_files(self.source_dir, root=root, prefix=prefix)
  File "/Library/Python/2.7/site-packages/pip/req/req_install.py", line 1064, in move_wheel_files
    isolated=self.isolated,
  File "/Library/Python/2.7/site-packages/pip/wheel.py", line 345, in move_wheel_files
    clobber(source, lib_dir, True)
  File "/Library/Python/2.7/site-packages/pip/wheel.py", line 316, in clobber
    ensure_dir(destdir)
  File "/Library/Python/2.7/site-packages/pip/utils/__init__.py", line 83, in ensure_dir
    os.makedirs(path)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/Library/Python/2.7/site-packages/lxml-4.1.1.dist-info'

In the second case:

$ pip install -r requirements_python2.txt
Collecting mock==2.0.0 (from -r requirements_python2.txt (line 1))
  Using cached mock-2.0.0-py2.py3-none-any.whl
Collecting lxml==4.1.1 (from -r requirements_python2.txt (line 2))
  Using cached lxml-4.1.1-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Collecting beautifulsoup4==4.6.0 (from -r requirements_python2.txt (line 3))
  Using cached beautifulsoup4-4.6.0-py2-none-any.whl
Collecting pbr>=0.11 (from mock==2.0.0->-r requirements_python2.txt (line 1))
  Using cached pbr-3.1.1-py2.py3-none-any.whl
Collecting funcsigs>=1; python_version < "3.3" (from mock==2.0.0->-r requirements_python2.txt (line 1))
  Using cached funcsigs-1.0.2-py2.py3-none-any.whl
Collecting six>=1.9 (from mock==2.0.0->-r requirements_python2.txt (line 1))
  Using cached six-1.11.0-py2.py3-none-any.whl
Installing collected packages: pbr, funcsigs, six, mock, lxml, beautifulsoup4
Exception:
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/Library/Python/2.7/site-packages/pip/commands/install.py", line 342, in run
    prefix=options.prefix_path,
  File "/Library/Python/2.7/site-packages/pip/req/req_set.py", line 784, in install
    **kwargs
  File "/Library/Python/2.7/site-packages/pip/req/req_install.py", line 851, in install
    self.move_wheel_files(self.source_dir, root=root, prefix=prefix)
  File "/Library/Python/2.7/site-packages/pip/req/req_install.py", line 1064, in move_wheel_files
    isolated=self.isolated,
  File "/Library/Python/2.7/site-packages/pip/wheel.py", line 345, in move_wheel_files
    clobber(source, lib_dir, True)
  File "/Library/Python/2.7/site-packages/pip/wheel.py", line 316, in clobber
    ensure_dir(destdir)
  File "/Library/Python/2.7/site-packages/pip/utils/__init__.py", line 83, in ensure_dir
    os.makedirs(path)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/Library/Python/2.7/site-packages/pbr-3.1.1.dist-info'

Take note that sudo does not help in either case.

@StevenBlack
Copy link
Owner

StevenBlack commented Mar 1, 2018

@funilrys some more feedback: Since pip install -r requirements... is a prerequisite, that should be documented in the readme_template.md file in the Generate your own unified hosts file section.

@FadeMind
Copy link
Contributor

FadeMind commented Mar 1, 2018

Working like a charm. @funilrys Thanks!!

@funilrys
Copy link
Contributor Author

funilrys commented Mar 1, 2018

Hello Steven @StevenBlack,

Before anything please avoid sudo pip as it may damage the python package(s) infrastructure of your system.

I thought that we all assume that virtualenv is always a good practice as you use a virtual environment (conda environment) to launch the tests.

So to fix such issue we have 3 possibilities:

Install dependencies from your package manager

For ubuntu

$ apt-get install python-lxml python3-lxml python-bs4 python3-bs4

For Arch Linux (the only one I use)

$ pacman -Syyu python-lxml python2-lxml python-beautifulsoup4 python2-beautifulsoup4

Then you simply run

$ python updateHostsFile.py -a

Install with pip along with --user (without virtualenv)

This method is always preferred over sudo pip for the reason I mentioned before.
To quote pip documentation :

--user:
Install to the Python user install directory for your platform. Typically ~/.local/, or %APPDATA%Python on Windows. (See the Python documentation for site.USER_BASE for full details.)

$ pip3 install -r requirements.txt --user
$ pip2 install -r requirements_python2.txt --user
$ python updateHostsFile.py -a

Use virtualenv

To quote Kenneth Reitz (Python Overlord at Heroku and Director at the PSF):

virtualenv is a tool to create isolated Python environments. virtualenv creates a folder which contains all the necessary executables to use the packages that a Python project would need.

Here's the procedure:

$ virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ python updateHostsFile.py -a
$ deactivate # to exit virtualenv

Now before I or we edit the readme_template.md you have to make a choice ... which one do we recommend to users?

@rautamiekka
Copy link

Before anything please avoid sudo pip as it may damage the python package(s) infrastructure of your system.

Not much of a choice when you need newer packages.

@funilrys
Copy link
Contributor Author

funilrys commented Mar 1, 2018

@rautamiekka sorry but no ... I proposed 3 alternatives to sudo pip ...

@funilrys
Copy link
Contributor Author

funilrys commented Mar 1, 2018

Unless you are as root into a server or a container you should absolutely avoid such things.. Unless you like to break your system to learn like me sometimes...

A recent example of why you should not run sudo combined with command you think you can trust is this and its detail here.

@StevenBlack
Copy link
Owner

@funilrys we need a solution that works on MacOS and the recent versions of Windows too.

@funilrys
Copy link
Contributor Author

funilrys commented Mar 1, 2018

We then have those two solutions:

  • Install with pip along with --user (without virtualenv)
  • Use virtualenv

@gfyoung
Copy link
Contributor

gfyoung commented Mar 1, 2018

@funilrys :

  1. sudo pip is not that too scary IMO, so @StevenBlack , I think that's acceptable.

  2. Add unittests for your added function.

@funilrys
Copy link
Contributor Author

funilrys commented Mar 1, 2018

@gfyoung sudo npm was not too scary too ...

Will add unit tests once I'm back home.

@gfyoung
Copy link
Contributor

gfyoung commented Mar 1, 2018

sudo npm was not too scary too ...

It isn't when you can remove things from your package.json 😄

@notDavid
Copy link

notDavid commented Mar 2, 2018

Hi there, this works for me too on macOS 👍

One minor issue though, i was curious what hostnames were causing the problem, and i notice that with #520 line 147886 in https://hosts-file.net/psh.txt :

www.hualañe.cl
is being downloaded/converted to
www.xn--hualae-0wa.cl

so basically this host is not being blocked...
(don't shoot the messenger... 🤷🏼‍♂️ 😁)

Probably obvious, but you can reproduce this by doing this, + apply this pull request, and then in step 8 instead of mv psh.txt hosts do cp psh.txt hosts.
Then after running step9, compare file psh.txt with hosts.

@funilrys
Copy link
Contributor Author

funilrys commented Mar 2, 2018

@notDavid www.xn--hualae-0wa.cl == www.hualañe.cl we are talking about the same domain...

Please read more about Punycode. You can also verify with Punycoder.

@notDavid
Copy link

notDavid commented Mar 2, 2018

@funilrys Ahhhhhh i see!

funilrys added 3 commits March 2, 2018 21:43
Please note that I have added that '#' by mistake.
This patch review the way we get the comment at the end of a line.
I also did an application of DRY (Do not Repeat Yourself)
and/or KISS (Keep It Simple, Stupid) by refactoring the 2 `else`
statements into one line.
@StevenBlack
Copy link
Owner

Nissar @funilrys this is an excellent PR. Just those doc changes and we'll merge!

@funilrys
Copy link
Contributor Author

funilrys commented Mar 3, 2018

Steven @StevenBlack What I see here is another problem! Some OS have python2 as python or python3 as python so as pip2 and pip3 exists I'll put them instead of pip as pip2 as you may have suggested :)

@funilrys
Copy link
Contributor Author

funilrys commented Mar 3, 2018

@StevenBlack Can you confirm the existence of pip2 on macOS? Because I can't consider what I said previously (as an Arch Linux user) as granted for every OS...

@StevenBlack
Copy link
Owner

StevenBlack commented Mar 3, 2018

Nissar @funilrys yup, it looks native. In my case, on the office iMac:

$ which pip2
/usr/local/bin/pip2

$ which pip
/usr/local/homebrew/bin/pip

$ which pip3
/usr/local/homebrew/bin/pip3

@funilrys
Copy link
Contributor Author

funilrys commented Mar 3, 2018

Okay Steven @StevenBlack then I'm going to edit readme_template.md as follow:

Note if you are using Python 3, please install the dependencies with:

  pip3 install --user -r requirements.txt

Note if you are using Python 2, please install the dependencies with:

  pip2 install --user -r requirements_python2.txt

I do prefer to recommend with --user and let the user know more about that --user with the following as note (please correct my grammar or complete sentence if needed):

As we prefer not to break your system, we recommend the usage of --user which install the required dependencies at the user level. More information about it can be found on pip documentation.

@StevenBlack StevenBlack mentioned this pull request Mar 3, 2018
@FadeMind
Copy link
Contributor

FadeMind commented Mar 3, 2018

@StevenBlack Manjaro Linux:

[fademind@manjaro ~]$ which pip
/usr/bin/pip
[fademind@manjaro ~]$ which pip2
/usr/bin/pip2
[fademind@manjaro ~]$ which pip3
/usr/bin/pip3
[fademind@manjaro ~]$ export LANG=C;pacman -Qo /usr/bin/pip /usr/bin/pip2 /usr/bin/pip3
/usr/bin/pip is owned by python-pip 9.0.1-3
/usr/bin/pip2 is owned by python2-pip 9.0.1-3
/usr/bin/pip3 is owned by python-pip 9.0.1-3

Aaand packages:

[fademind@manjaro ~]$ pacman -Q|grep python
python 3.6.4-2
python-appdirs 1.4.3-1
python-beautifulsoup4 4.6.0-1
python-cairo 1.16.2-1
python-chardet 3.0.4-1
python-dbus 1.2.6-1
python-dbus-common 1.2.6-1
python-docopt 0.6.2-4
python-gobject 3.26.1-1
python-idna 2.6-1
python-jade-application-kit 0.a34-1
python-jinja 2.10-1
python-keyutils 0.5-1
python-lxml 4.1.1-1
python-lxml-docs 4.1.1-1
python-markupsafe 1.0-1
python-mock 2.0.0-2
python-npyscreen 4.10.5-1
python-olefile 0.45.1-1
python-packaging 16.8-2
python-pbr 3.1.1-1
python-pillow 5.0.0-1
python-pip 9.0.1-3
python-pycups 1.9.73-3
python-pycurl 7.43.0.1-1
python-pygments 2.2.0-1
python-pyparsing 2.2.0-1
python-pyqt5 5.10-3
python-pysmbc 1.0.15.8-1
python-reportlab 3.4.0-1
python-requests 2.18.4-1
python-setuptools 1:38.5.1-1
python-sip 4.19.7-1
python-six 1.11.0-1
python-urllib3 1.22-1
python-yaml 3.12-3
python2 2.7.14-2
python2-appdirs 1.4.3-1
python2-beautifulsoup4 4.6.0-1
python2-cairo 1.16.2-1
python2-funcsigs 1.0.2-1
python2-gobject2 2.28.7-1
python2-lxml 4.1.1-1
python2-mock 2.0.0-2
python2-packaging 16.8-2
python2-pbr 3.1.1-1
python2-pip 9.0.1-3
python2-pyparsing 2.2.0-1
python2-setuptools 1:38.5.1-1
python2-six 1.11.0-1

@StevenBlack
Copy link
Owner

Nissar @funilrys perfect!

@StevenBlack
Copy link
Owner

@funilrys a small suggestion.

We recommend the --user flag which installs the required dependencies at the user level. More information about it can be found on pip documentation.

Please note that this patch explitly set which `pip` version to use according
to the user Python version.
@StevenBlack
Copy link
Owner

Nissar @funilrys still missing one thing: To run unit tests, in the top level directory, just run:... should come after installing the requirements since latest tests will fail without the requirements.

@funilrys
Copy link
Contributor Author

funilrys commented Mar 3, 2018

That's logical sorry ...

Please note that this patch mothe the unit tests paragraph after dependencies installation.
@StevenBlack
Copy link
Owner

Merging. Thanks so much Nissar @funilrys and everyone who provided input on this!

@StevenBlack StevenBlack merged commit aa6b095 into StevenBlack:master Mar 3, 2018
@welcome
Copy link

welcome bot commented Mar 3, 2018

Congratulations on merging your first pull request! 🎉🎉🎉

@ScriptTiger
Copy link
Contributor

ScriptTiger commented Mar 3, 2018

Perhaps a note of some kind should also be added in a different PR for those installing the dependencies manually for whatever reason. Beautiful Soup (which goes to version 3 and only supports Python 2) is actually a different package from Beautiful Soup 4 (the bs4 packages for Python 2 and 3). It might be different depending on how the repositories manage them, but that at least seems to be the author's intent and what I have seen in the repos I use. So if they install the latest Beautiful Soup package and don't know why it's throwing an error when it can't find the bs4 module, that's why.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants