
Releases: seanpedrick-case/doc_redaction

v0.7.2

16 Jul 11:31
f2f92b5

Corrected logging. Text redaction can now produce ocr_outputs_with_words objects. Duplicate detection by line now possible (rather than by page)

e424038: Updated packages. Corrected CSV logger headings. Custom log CSV file names can now be used when submitting logs to S3. Started work on identifying and deduplicating at the line level

c8ffcd4: Further updates to line level duplicate identification
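One way to picture duplicate identification at the line level (rather than the page level) is to normalise each line of text and record every place it appears. This is a minimal sketch under stated assumptions: find_duplicate_lines, the page-to-lines input layout and the min_chars threshold are hypothetical, and the app itself may use similarity-based (fuzzy) matching rather than the exact matching shown here.

```python
import re
from collections import defaultdict


def normalise_line(text: str) -> str:
    # Collapse whitespace and lower-case so trivially different copies match
    return re.sub(r"\s+", " ", text).strip().lower()


def find_duplicate_lines(
    pages: dict[int, list[str]], min_chars: int = 10
) -> dict[str, list[tuple[int, int]]]:
    """Hypothetical helper: map each normalised line that occurs more than
    once to the (page_number, line_index) positions where it appears."""
    locations: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for page_number, lines in pages.items():
        for line_index, line in enumerate(lines):
            key = normalise_line(line)
            # Skip very short lines (e.g. page numbers) that would match everywhere
            if len(key) >= min_chars:
                locations[key].append((page_number, line_index))
    # Keep only lines seen in more than one position
    return {key: locs for key, locs in locations.items() if len(locs) > 1}


pages = {
    1: ["Confidential report header", "Body text A"],
    2: ["Confidential  Report header", "Body text B"],
}
duplicates = find_duplicate_lines(pages)
```

Working per line means a repeated header or footer can be flagged even when the rest of each page differs, which page-level comparison would miss.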

ef4000e: Local text redaction now produces OCR results with words (ocr_outputs_with_words) as JSON, which can also be converted to dataframe format

52e26c1: Updated save options for ocr_outputs_with_words
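Converting a nested "lines with words" OCR result into a dataframe usually means flattening it to one row per word. The sketch below assumes a hypothetical layout (a list of pages, each with a "lines" list whose entries carry "text" and "words" with "bbox" coordinates); the real ocr_outputs_with_words schema may differ, so treat the field names as illustrative only.

```python
def words_to_rows(ocr_pages: list[dict]) -> list[dict]:
    """Flatten a hypothetical page -> lines -> words structure into one row per word."""
    rows = []
    for page in ocr_pages:
        for line_number, line in enumerate(page["lines"], start=1):
            for word in line["words"]:
                x0, y0, x1, y1 = word["bbox"]  # assumed (x0, y0, x1, y1) order
                rows.append(
                    {
                        "page": page["page"],
                        "line": line_number,
                        "line_text": line["text"],
                        "word_text": word["text"],
                        "word_x0": x0,
                        "word_y0": y0,
                        "word_x1": x1,
                        "word_y1": y1,
                    }
                )
    return rows


example = [
    {
        "page": 1,
        "lines": [
            {
                "text": "Jane Doe",
                "words": [
                    {"text": "Jane", "bbox": [10, 20, 40, 30]},
                    {"text": "Doe", "bbox": [45, 20, 70, 30]},
                ],
            }
        ],
    }
]
rows = words_to_rows(example)
```

The resulting list of dictionaries can be passed directly to pandas.DataFrame to obtain the tabular (dataframe) form.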

v0.7.1

29 Jun 15:10
d88286c

9f51e70: Updated CDK code for custom KMS keys, new VPCs. Minor package updates.

v0.7.0

18 Jun 12:49
10da194

Revamped duplicate page/subdocument removal, CDK code, updated documentation, read-only file system compatibility:

a7566b9: Adapted Dockerfile for systems with a read-only file system. Minor package updates.

36574ae: Added folder with CDK code and app. Updated config.py file to be compatible with all temp folders needed for read-only file systems
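On a read-only root file system, only specific locations such as the system temp directory or mounted volumes are writable, so every folder the app writes to needs to be configurable. A minimal sketch of that pattern, assuming configuration via environment variables; get_writable_folder and the HYPOTHETICAL_* variable name are illustrative and not taken from config.py.

```python
import os
import tempfile


def get_writable_folder(env_var: str, default_subfolder: str) -> str:
    """Return a writable folder: the path in env_var if it is set, otherwise a
    subfolder of the system temp directory (e.g. /tmp on Linux)."""
    folder = os.environ.get(env_var) or os.path.join(
        tempfile.gettempdir(), default_subfolder
    )
    # Create the folder if needed; this fails loudly if it is not writable
    os.makedirs(folder, exist_ok=True)
    return folder


# Hypothetical setting name, for illustration only
OUTPUT_FOLDER = get_writable_folder("HYPOTHETICAL_OUTPUT_FOLDER", "redaction_output")
```

Defaulting to tempfile.gettempdir() rather than a folder next to the code keeps the app working when the application directory itself cannot be written to.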

c3d1c4c: Added source files for quarto documentation website

ab04c92: Updated duplicate pages functionality. Improved redaction efficiency slightly using the pd.concat method. Minor modifications to documentation and interface
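The usual way pd.concat improves efficiency is to collect per-page results in a list and concatenate them once, instead of growing a dataframe inside the loop. The notes do not show the exact change, so this is a sketch of that general pattern, with hypothetical column names and data.

```python
import pandas as pd


def combine_page_results(page_frames: list[pd.DataFrame]) -> pd.DataFrame:
    # Concatenating inside the loop would copy the growing frame on every
    # iteration (quadratic cost); one pd.concat call at the end copies each row once.
    non_empty = [frame for frame in page_frames if not frame.empty]
    if not non_empty:
        return pd.DataFrame()
    return pd.concat(non_empty, ignore_index=True)


page_frames = []
for page in range(1, 4):
    # Hypothetical per-page redaction results
    page_frames.append(
        pd.DataFrame({"page": [page], "text": [f"entity on page {page}"]})
    )

all_results = combine_page_results(page_frames)
```

Empty frames are filtered out before concatenation, which avoids recent pandas versions warning about empty entries.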

f47b137: Updated duplicate pages interface to include subdocuments and review. Updated relevant user guide. Minor package updates

3946be6: Minor update to ensure whole page redactions are applied correctly to a document with existing redactions

v0.6.8

21 May 21:18
95ca426

You can now remove all redactions with the same text based on the selected row on the review page. More config options for model and entity type visibility.

c4e3724: Added capability to redact all redactions with the same text based on the selected row. Rearranged buttons on review page a little. Improved page navigation efficiency.
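Acting on every redaction that shares the selected row's text amounts to finding all rows with a matching text value. A minimal sketch, assuming the review table is a list of dictionaries with a "text" field and that matching is exact; both the data layout and the helper names are hypothetical.

```python
def rows_with_same_text(rows: list[dict], selected_index: int, text_key: str = "text") -> list[int]:
    """Return the indices of all rows whose text exactly matches the selected row's."""
    target = rows[selected_index][text_key]
    return [i for i, row in enumerate(rows) if row[text_key] == target]


def remove_rows_with_same_text(rows: list[dict], selected_index: int, text_key: str = "text") -> list[dict]:
    """Drop every row that shares the selected row's text."""
    to_drop = set(rows_with_same_text(rows, selected_index, text_key))
    return [row for i, row in enumerate(rows) if i not in to_drop]


review_rows = [
    {"page": 1, "text": "John Smith"},
    {"page": 2, "text": "Jane Doe"},
    {"page": 3, "text": "John Smith"},
]
remaining = remove_rows_with_same_text(review_rows, selected_index=0)
```

Exact matching is the simplest choice; a case-insensitive or whitespace-tolerant comparison would catch more variants but could also remove redactions the user meant to keep.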

bce761b: Added possibility of changing model and entity types in config file

c28176d: Update version numbers and readme

v0.6.7

20 May 14:55
b7d2635

Can now export xfdf files for Adobe Acrobat that contain the redacted text.

a91f87b: xfdf exports for Adobe now include redacted text that is searchable in Acrobat

3270701: Updated version numbers

v0.6.6

19 May 20:59
e06b754

Improved checks for documents with different mediabox and cropbox sizes. Minor package updates.

20b655f: Updated version numbers, gradio package version.

5fcccbe: Expanded checks for out of range page cropboxes
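A page's cropbox can extend beyond its mediabox, and PDF viewers effectively show only the region where the two overlap. A minimal sketch of one such check, clipping the cropbox to the mediabox; clamp_cropbox is hypothetical, the coordinates are assumed to be (x0, y0, x1, y1) in the same coordinate system, and the fallback to the full mediabox is an illustrative choice rather than the app's documented behaviour.

```python
Rect = tuple[float, float, float, float]  # (x0, y0, x1, y1)


def clamp_cropbox(cropbox: Rect, mediabox: Rect) -> Rect:
    """Clip a cropbox so that it never extends outside the mediabox."""
    x0 = max(cropbox[0], mediabox[0])
    y0 = max(cropbox[1], mediabox[1])
    x1 = min(cropbox[2], mediabox[2])
    y1 = min(cropbox[3], mediabox[3])
    if x0 >= x1 or y0 >= y1:
        # No overlap at all: fall back to the full mediabox
        return mediabox
    return (x0, y0, x1, y1)


letter = (0.0, 0.0, 612.0, 792.0)
clipped = clamp_cropbox((-10.0, 0.0, 700.0, 800.0), letter)
```

Redaction boxes computed against the clipped box then stay inside the printable page area.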

v0.6.5

07 May 21:37
c2a4864

PDF compression is now off by default. Some new config options, minor formatting and layout changes

3bbf593: Added config options for compressing output pdfs, returning output redacted pdfs at all, and for changing the length of time for showing previous Textract jobs
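Options like these are often read from environment variables with typed defaults. A minimal sketch, assuming environment-variable configuration; every HYPOTHETICAL_* name and the 7-day default are illustrative, not the app's actual setting names or values (only the compression default of off reflects these notes).

```python
import os


def env_flag(name: str, default: bool) -> bool:
    """Read a True/False style setting from an environment variable."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "y"}


def env_int(name: str, default: int) -> int:
    """Read an integer setting, falling back to the default if unset or blank."""
    value = os.environ.get(name)
    if value is None or not value.strip():
        return default
    return int(value)


# Hypothetical setting names and defaults, for illustration only
COMPRESS_OUTPUT_PDF = env_flag("HYPOTHETICAL_COMPRESS_OUTPUT_PDF", default=False)
RETURN_REDACTED_PDF = env_flag("HYPOTHETICAL_RETURN_REDACTED_PDF", default=True)
TEXTRACT_JOBS_DISPLAY_DAYS = env_int("HYPOTHETICAL_TEXTRACT_JOBS_DISPLAY_DAYS", default=7)
```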

5a21738: Updated gradio version. Minor changes to redactor function sequence. Minor formatting and wording changes.

v0.6.4

06 May 21:29
03d0cfd

97097ff: More checks on OCR outputs in redaction functions

v0.6.3

06 May 18:08
23f892d

10f46e9: Corrected a couple of bugs. Loading the outputs of a Textract whole-document API call now also loads the input PDF into the app

v0.6.2

29 Apr 12:24
baabf97

DynamoDB logging format and example, minor text revisions

94e514b: Updated logging format for timestamps to be compatible with AWS. Added load_dynamo_logs.py example file.
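The notes do not say which timestamp format was adopted. A common AWS-friendly choice is an ISO 8601 string in UTC: strings in a fixed format like this sort lexicographically in chronological order, which is convenient when a timestamp is stored as a DynamoDB string attribute or sort key. A sketch under that assumption; aws_log_timestamp is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def aws_log_timestamp(moment: Optional[datetime] = None) -> str:
    """Format a time as an ISO 8601 UTC string with millisecond precision."""
    if moment is None:
        moment = datetime.now(timezone.utc)
    # Convert to UTC (a naive datetime is treated as local time by astimezone)
    utc_moment = moment.astimezone(timezone.utc)
    return utc_moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")


stamp = aws_log_timestamp(datetime(2025, 4, 29, 12, 24, 0, tzinfo=timezone.utc))
```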

69c2af9: Updated version numbers, minor text revision