Closed
Description
A web page is detected as ISO-8859-1 encoded, even though it is encoded in utf-8.
Expected Result
Correct classification as utf-8.
Actual Result
utf-8 page detected as ISO-8859-1.
Reproduction Steps
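#!/usr/bin/env python3
import requests
# example url
url = "https://digitalezivilgesellschaft.org/"
# fetch the page, then print the Content-Type header that requests bases its
# guess on, followed by the encoding it chose
response = requests.get(url)
print(response.headers.get("Content-Type"))
print(response.encoding)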
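For context: when a response's Content-Type header names a text/* type but carries no charset parameter, requests sets response.encoding to ISO-8859-1 (the RFC 2616 default) rather than inspecting the body. Whether this particular server omits the charset is an assumption here; checking the Content-Type response header would confirm it. The stdlib-only sketch below approximates that header-only rule so it can be run offline. encoding_from_content_type is a hypothetical helper, not requests' own code, and it simplifies the media-type check to a "text/" prefix.

```python
from typing import Optional


def encoding_from_content_type(content_type: Optional[str]) -> Optional[str]:
    """Approximate a header-only rule: an explicit charset wins, otherwise
    text/* falls back to ISO-8859-1, and anything else yields None."""
    if not content_type:
        return None
    media_type, _, params = content_type.partition(";")
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key.strip().lower() == "charset" and value:
            # Drop optional quotes and whitespace around the charset value.
            return value.strip("'\" ")
    if media_type.strip().lower().startswith("text/"):
        # No charset parameter: fall back to the RFC 2616 default.
        return "ISO-8859-1"
    return None


print(encoding_from_content_type("text/html"))                 # ISO-8859-1
print(encoding_from_content_type("text/html; charset=utf-8"))  # utf-8
```

If the server sends the first form, a utf-8 body ends up decoded as ISO-8859-1, which matches the symptom described in this report.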
Compare the printed encoding with what file(1) reports for the same page:
rm -f index.html; wget -nv https://digitalezivilgesellschaft.org/ 2>/dev/null && file index.html
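file(1) classifies the page from its bytes, not from HTTP headers. A minimal sketch of a body-aware fallback follows, assuming that a header-supplied ISO-8859-1 is usually just the default: if the body decodes cleanly as utf-8, prefer utf-8. pick_encoding and LATIN1_ALIASES are hypothetical names used only for illustration. With requests itself, the analogous workaround is response.encoding = response.apparent_encoding, which guesses from the content via chardet.

```python
from typing import Optional

# Labels that this sketch treats as "probably just the default".
LATIN1_ALIASES = {"iso-8859-1", "latin-1", "latin1"}


def pick_encoding(body: bytes, header_encoding: Optional[str]) -> str:
    """Keep a specific header charset; otherwise prefer utf-8 when the body is valid utf-8."""
    if header_encoding and header_encoding.lower() not in LATIN1_ALIASES:
        return header_encoding
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid utf-8: keep the header value, or the ISO-8859-1 default.
        return header_encoding or "ISO-8859-1"
    # Note: this also overrides an explicit charset=ISO-8859-1, because the
    # header value alone cannot distinguish an explicit label from the fallback.
    return "utf-8"


page = "Digitale Zivilgesellschaft: Straße".encode("utf-8")
print(pick_encoding(page, "ISO-8859-1"))  # utf-8
```

Treating any valid utf-8 body as utf-8 is a heuristic. Pure-ASCII bodies also pass, which is harmless because ASCII is a subset of both encodings.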
System Information
$ python -m requests.help
{
"chardet": {
"version": "3.0.4"
},
"cryptography": {
"version": ""
},
"idna": {
"version": "2.9"
},
"implementation": {
"name": "CPython",
"version": "3.8.2"
},
"platform": {
"release": "5.6.8-arch1-1",
"system": "Linux"
},
"pyOpenSSL": {
"openssl_version": "",
"version": null
},
"requests": {
"version": "2.23.0"
},
"system_ssl": {
"version": "1010107f"
},
"urllib3": {
"version": "1.25.9"
},
"using_pyopenssl": false
}
This concrete problem seems to be related to the more general issue #2086.