Open
Description
When detecting the character set of a plain ASCII file that contains a newline (\n, ASCII 10) or other control characters, chardet identifies it as ISO-8859-1 instead of ASCII. The presence of control characters (ASCII 0-31, 127) seems to influence the detection process.
Since ASCII includes both printable characters (32-126) and control characters (0-31, 127), the presence of these should not change the classification to a different encoding like ISO-8859-1.
Expected Behavior
Files containing only bytes 0-127 (including control characters like \n) should be correctly detected as ASCII, not ISO-8859-1.
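To make the expected behavior concrete, here is a minimal sketch of the byte-range check described above. `isPureAscii` is a hypothetical helper written for illustration; it is not part of chardet's API and is not a claim about how chardet detects encodings internally.

```javascript
// Hypothetical helper (not part of chardet): returns true when every
// byte is in the ASCII range 0-127, control characters included.
function isPureAscii(bytes) {
  for (const byte of bytes) {
    if (byte > 0x7f) {
      return false; // any byte above 127 falls outside ASCII
    }
  }
  return true;
}

// Same content as ascii.txt in the reproduction steps, with a trailing newline.
const asciiSample = Buffer.from('Hello, World!\nThis is a test.\n', 'latin1');
// 0xE9 is "é" in ISO-8859-1 and is outside the ASCII range.
const latin1Sample = Buffer.from([0x48, 0xe9, 0x0a]);

console.log(isPureAscii(asciiSample));  // true
console.log(isPureAscii(latin1Sample)); // false
```

Under this reading, a newline (byte 10) passes the check like any other ASCII byte, so it should not push the result toward ISO-8859-1.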
Steps to Reproduce
- Create a file (ascii.txt) with the following content:
Hello, World!
This is a test.
(Ensure there is a newline at the end of the file.)
- Run the following script:
import chardet from 'chardet';
console.log(chardet.detectFileSync('ascii.txt')); // Expected: 'ASCII', but got 'ISO-8859-1'