Closed
Description
Hi, today I noticed a sudden change in the way text is extracted from PDFs. It seems like a lot of binary content is now being included in the extracted text. This is causing our tests to fail.
We've been able to resolve this quickly on our end by downgrading the package version, but just wanted to give you guys a heads-up.
EDIT: On further investigation, it looks like a change in the Python API caused the issue:
Traceback (most recent call last):
  File "/home/bls/Downloads/code/bbot/bbot/modules/extractous.py", line 135, in extract_text
    buffer = reader.read(4096)
             ^^^^^^^^^^^
AttributeError: 'tuple' object has no attribute 'read'
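For anyone hitting the same error, here is a rough sketch of a caller-side workaround that tolerates both return shapes. It assumes (not confirmed in this issue) that the newer version returns a tuple whose first element is the reader, and that the reader yields bytes. `unwrap_reader` and `read_all` are hypothetical helper names, not part of the library.

```python
import io


def unwrap_reader(result):
    """Return the file-like reader from an extraction result.

    Assumption: older versions return the reader directly, while newer
    versions return a tuple whose first element is the reader.
    """
    if isinstance(result, tuple):
        return result[0]
    return result


def read_all(result, chunk_size=4096):
    """Read the whole stream in chunks, whichever shape `result` has.

    Assumption: the reader's read() returns bytes and an empty value at EOF.
    """
    reader = unwrap_reader(result)
    chunks = []
    while True:
        buffer = reader.read(chunk_size)
        if not buffer:
            break
        chunks.append(buffer)
    return b"".join(chunks)


if __name__ == "__main__":
    # Stand-ins for the two return shapes; io.BytesIO plays the reader.
    old_style = io.BytesIO(b"extracted text")
    new_style = (io.BytesIO(b"extracted text"), {"title": "example"})
    print(read_all(old_style))
    print(read_all(new_style))
```

Pinning the package version, as we did, is still the safer short-term fix. The helper only avoids the AttributeError and does not address the binary content in the output.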