Releases: seanpedrick-case/doc_redaction
v0.7.2
Corrected logging. Text redaction can now produce ocr_outputs_with_words objects. Duplicates can now be detected by line (rather than by page).
e424038: Updated packages. Corrected CSV logger headings; custom log CSV names can now be submitted to S3. Started work on identifying and deduplicating at the line level
c8ffcd4: Further updates to line level duplicate identification
ef4000e: Local text redaction now produces the OCR results with words JSON and can also output them in dataframe format
52e26c1: Updated save options for ocr_outputs_with_words
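As a rough illustration of what line-level (rather than page-level) duplicate detection can look like, the sketch below flags OCR lines whose normalised text appears more than once. It is not the project's implementation: the column names `page`, `line` and `text`, the function `find_duplicate_lines`, and the exact-match rule are all assumptions made for the example.

```python
import pandas as pd


def find_duplicate_lines(ocr_lines: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose normalised text occurs on more than one line.

    Hypothetical helper: assumes columns 'page', 'line' and 'text'.
    """
    df = ocr_lines.copy()
    # Normalise so that case and spacing differences do not hide duplicates
    df["normalised"] = (
        df["text"].str.lower().str.replace(r"\s+", " ", regex=True).str.strip()
    )
    # Ignore empty lines, which would otherwise all match each other
    df = df[df["normalised"] != ""]
    # keep=False marks every member of a duplicate group, not just the repeats
    dup_mask = df.duplicated(subset="normalised", keep=False)
    return df[dup_mask].sort_values(["normalised", "page", "line"])


ocr_lines = pd.DataFrame(
    {
        "page": [1, 1, 2, 2],
        "line": [1, 2, 1, 2],
        "text": ["Confidential  Report", "Introduction", "confidential report", "Findings"],
    }
)
duplicates = find_duplicate_lines(ocr_lines)
```

Here the first line of each page is flagged, while the two unique lines are left alone. A real pipeline would likely use a similarity threshold instead of exact matching.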
v0.7.1
v0.7.0
Revamped duplicate page/subdocument removal, CDK code, updated documentation, read-only file system compatibility:
a7566b9: Adapted Dockerfile for systems with a read-only file system. Minor package updates.
36574ae: Added folder with CDK code and app. Updated config.py file to be compatible with all temp folders needed for read-only file systems
c3d1c4c: Added source files for quarto documentation website
ab04c92: Updated duplicate pages functionality. Improved redaction efficiency slightly by using the pd.concat method. Minor modifications to documentation and interface
f47b137: Updated duplicate pages interface to include subdocuments and review. Updated relevant user guide. Minor package updates
3946be6: Minor update to ensure whole page redactions are applied correctly to a document with existing redactions
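The pd.concat change noted in ab04c92 is not shown on this page, but the general pattern is well known: collect per-page results in a list and concatenate once, instead of growing a DataFrame inside a loop. A minimal sketch, with the function name `combine_page_results` and the column names assumed purely for illustration:

```python
import pandas as pd


def combine_page_results(page_frames: list) -> pd.DataFrame:
    """Combine per-page DataFrames with a single concatenation.

    Hypothetical helper: one pd.concat call avoids repeatedly copying
    an ever-growing DataFrame, which is what appending in a loop does.
    """
    return pd.concat(page_frames, ignore_index=True)


# Illustrative per-page results (column names are assumptions)
page_frames = [
    pd.DataFrame({"page": [n], "text": [f"redaction on page {n}"]})
    for n in range(1, 4)
]
combined = combine_page_results(page_frames)
```

With `ignore_index=True` the combined frame gets a fresh 0..n-1 index, so row labels from individual pages do not collide.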
v0.6.8
You can now remove all redactions with the same text based on the selected row on the review page. More config options for model and entity type visibility.
c4e3724: Added capability to redact all redactions with the same text based on the selected row. Rearranged buttons on review page a little. Improved page navigation efficiency.
bce761b: Added possibility of changing model and entity types in config file
c28176d: Update version numbers and readme
v0.6.7
v0.6.6
v0.6.5
PDF compression removed by default, some new config options, minor formatting and layout changes
3bbf593: Added config options for compressing output PDFs, for whether redacted output PDFs are returned at all, and for changing how long previous Textract jobs are shown
5a21738: Updated gradio version. Minor changes to redactor function sequence. Minor formatting and wording changes.