utf-8 page wrongly detected as ISO-8859-1 #5445

Closed
@klartext

Description

A webpage has been detected as ISO-8859-1 encoded, even though it is encoded in utf-8.

Expected Result

Correct classification as utf-8.

Actual Result

utf-8 page detected as ISO-8859-1.

Reproduction Steps

#!/usr/bin/python

import requests

# example url
url = "https://digitalezivilgesellschaft.org/"

# get the page and print the supposed encoding
response = requests.get(url)
print(response.encoding)

Compare that with what `file` reports for the downloaded page:

rm -f index.html; wget -nv https://digitalezivilgesellschaft.org/ 2>/dev/null && file index.html | grep index | tail -1
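
A possible explanation and workaround, as a sketch only: when a `text/*` response carries no `charset` parameter in its `Content-Type` header, requests falls back to ISO-8859-1 without inspecting the body. I have not verified that this is what the server above sends, so treat that as an assumption. The helper names below (`charset_from_content_type`, `pick_encoding`) are hypothetical and not part of requests; they only illustrate preferring a declared charset and otherwise testing whether the bytes decode cleanly as UTF-8.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def pick_encoding(content_type: str, body: bytes) -> str:
    """Hypothetical helper: declared charset first, then UTF-8 if the body is valid UTF-8.

    Assumption: falling back to ISO-8859-1 is acceptable when the body is not
    valid UTF-8 and no charset was declared.
    """
    declared = charset_from_content_type(content_type)
    if declared:
        return declared
    try:
        body.decode("utf-8")  # strict decoding raises on invalid byte sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "ISO-8859-1"
```

With requests, this could be applied before reading `response.text`, for example `response.encoding = pick_encoding(response.headers.get("Content-Type", ""), response.content)`. The built-in `response.apparent_encoding`, which guesses from the body, may serve the same purpose.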

System Information

$ python -m requests.help
{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "2.9"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.8.2"
  },
  "platform": {
    "release": "5.6.8-arch1-1",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.23.0"
  },
  "system_ssl": {
    "version": "1010107f"
  },
  "urllib3": {
    "version": "1.25.9"
  },
  "using_pyopenssl": false
}

This specific problem seems to be related to the more general issue #2086.
