PDF-Text-Redaction-Using-Google-Cloud

Google Cloud DLP does not support redaction of documents (PDF, DOC, TIFF, etc.); its redaction API only works on images. This Python program works around that limitation: it converts each page of a PDF into an image, redacts the images with the DLP API, and rebuilds a PDF from the results.

[You are always welcome to collaborate or suggest changes; I'll be thankful.]

Approach -

1. Convert every page of the document into an image using PyMuPDF (in the Output folder).
Output/<PDF_FILENAME>/Page0001.png
Output/<PDF_FILENAME>/Page0002.png
Output/<PDF_FILENAME>/Page0003.png
...
2. Redact every page image in that folder and write the results to the Redacted Images folder.
Output/<PDF_FILENAME>/Redacted Images/Redacted-Page0001.png
Output/<PDF_FILENAME>/Redacted Images/Redacted-Page0002.png
Output/<PDF_FILENAME>/Redacted Images/Redacted-Page0003.png
...
3. Create a PDF from all images in the Redacted Images folder and store it in the base of the Output folder.
Output/<PDF_FILENAME>/Redacted-
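
The first and last steps above can be sketched with PyMuPDF roughly as follows. This is a minimal illustration, not the code in redaction.py: the function names, the dpi default, and the choice to name the folder after the full file name are assumptions, and the final PDF path is passed in by the caller because the naming is not spelled out here.

```python
from pathlib import Path


def page_image_path(output_root, pdf_name, page_number):
    """Path of the rendered image for a 1-based page number."""
    return Path(output_root) / pdf_name / f"Page{page_number:04d}.png"


def redacted_image_path(output_root, pdf_name, page_number):
    """Path of the redacted image for a 1-based page number."""
    return (Path(output_root) / pdf_name / "Redacted Images"
            / f"Redacted-Page{page_number:04d}.png")


def pdf_to_images(pdf_path, output_root="Output", dpi=150):
    """Render every page of the PDF to a PNG and return the image paths."""
    import fitz  # PyMuPDF; imported here so the path helpers work without it

    pdf_name = Path(pdf_path).name  # assumption: folder named after the file
    image_paths = []
    with fitz.open(str(pdf_path)) as doc:
        for number, page in enumerate(doc, start=1):
            target = page_image_path(output_root, pdf_name, number)
            target.parent.mkdir(parents=True, exist_ok=True)
            page.get_pixmap(dpi=dpi).save(str(target))
            image_paths.append(target)
    return image_paths


def images_to_pdf(image_paths, output_pdf):
    """Combine images, in the given order, into a PDF with one page each."""
    import fitz  # PyMuPDF

    with fitz.open() as pdf:  # fitz.open() with no arguments is an empty PDF
        for image_path in image_paths:
            with fitz.open(str(image_path)) as image_doc:
                rect = image_doc[0].rect  # image size, reused as page size
            page = pdf.new_page(width=rect.width, height=rect.height)
            page.insert_image(rect, filename=str(image_path))
        pdf.save(str(output_pdf))
```

Because the images are text-free bitmaps, the rebuilt PDF contains no selectable text, which is the point of the approach: redacted content cannot be recovered by copying text out of the page.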

Requirements -

PyMuPDF
google-cloud-dlp
Project credentials (a JSON key file that you download from the Google Cloud Console)
Project name (the name of your project in the Google Cloud Console)
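
The credentials file and the project are what the DLP client needs in order to redact a single page image. The sketch below shows one way that call might look with the google-cloud-dlp client. It is an illustration under assumptions, not the code in redaction.py: the function names and the default info types (EMAIL_ADDRESS, PHONE_NUMBER) are examples only, and note that the API's parent string takes the project ID.

```python
from pathlib import Path


def build_redaction_configs(info_type_names):
    """Build the inspect and image-redaction settings for the given info types."""
    info_types = [{"name": name} for name in info_type_names]
    inspect_config = {"info_types": info_types}
    # One redaction config per info type: matching regions are boxed out.
    image_redaction_configs = [{"info_type": t} for t in info_types]
    return inspect_config, image_redaction_configs


def redact_png(project_id, credentials_json, input_png, output_png,
               info_type_names=("EMAIL_ADDRESS", "PHONE_NUMBER")):
    """Send one PNG to the DLP redact_image method and save the result."""
    from google.cloud import dlp_v2  # requires the google-cloud-dlp package

    client = dlp_v2.DlpServiceClient.from_service_account_json(credentials_json)
    inspect_config, image_redaction_configs = build_redaction_configs(
        info_type_names
    )
    byte_item = {
        "type_": dlp_v2.ByteContentItem.BytesType.IMAGE_PNG,
        "data": Path(input_png).read_bytes(),
    }
    response = client.redact_image(
        request={
            "parent": f"projects/{project_id}",
            "inspect_config": inspect_config,
            "image_redaction_configs": image_redaction_configs,
            "byte_item": byte_item,
        }
    )
    # Make sure the target folder (e.g. "Redacted Images") exists first.
    Path(output_png).parent.mkdir(parents=True, exist_ok=True)
    Path(output_png).write_bytes(response.redacted_image)
```

Calling redact_png once per page image, in page order, produces the files that the final step stitches back into a PDF.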

Installation -

PyMuPDF
pip install PyMuPDF
Google Cloud DLP
pip install google-cloud-dlp

Executing The Program

python redaction.py <PDF_FILE_PATH>
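
For readers who want to adapt the script, a command-line entry point that accepts the single PDF_FILE_PATH argument could be handled along these lines. This is a hypothetical sketch, not the argument handling that redaction.py actually uses; parse_args and validate_pdf_path are made-up names.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the single positional argument: the path of the PDF to redact."""
    parser = argparse.ArgumentParser(
        prog="redaction.py",
        description="Redact a PDF page by page using Google Cloud DLP.",
    )
    parser.add_argument(
        "pdf_path",
        metavar="PDF_FILE_PATH",
        type=Path,
        help="path of the PDF file to redact",
    )
    return parser.parse_args(argv)


def validate_pdf_path(pdf_path):
    """Fail early, before any API call, if the input is not a readable PDF."""
    if pdf_path.suffix.lower() != ".pdf":
        raise ValueError(f"expected a .pdf file, got: {pdf_path}")
    if not pdf_path.is_file():
        raise FileNotFoundError(f"no such file: {pdf_path}")
    return pdf_path
```

Checking the path before rendering or uploading anything avoids spending DLP API calls on an input that cannot be processed.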

That's it. Enjoy!

meakshaymishra/PDF-Text-Redaction-Using-Google-Cloud