Background
My boss asked me to come up with a way for ordinary users to redact information from PDF files using free software. We get a lot of scanned documents, and our client requires that sensitive information be redacted from PDFs before they are uploaded to their system. Here's what I came up with. I have convinced myself that this will destroy any potentially sensitive client metadata carried over from the original document and make it impossible to remove the black bars covering sensitive information. However, I have also found that I don't know nearly as much as I think I do.
Many forum members posting about this topic have stated quite firmly that only Adobe Acrobat or other paid software can do this securely. If you are of this opinion, please explain why. I'm having trouble figuring out why this wouldn't work.
Overview
In a PDF program, cover up the sensitive content with boxes and convert the document to a TIFF file. Then convert the TIFF file back to a PDF.
- Would this work? Does the TIFF file preserve any information about objects or layers? Is any potentially sensitive metadata likely to make it through, or will all metadata be changed, as I hope?
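The TIFF part of this question can be checked directly by listing the tags in the intermediate TIFF before it is converted back. Below is a minimal standard-library sketch. It assumes a classic (non-BigTIFF) file, and the tag table is a partial list chosen for illustration. Each IFD normally describes one page. Baseline TIFF stores pixels plus descriptive tags such as Software or DateTime, and it has no concept of PDF text or drawing objects. However, writers can still add extras such as XMP or Photoshop layer data (ImageSourceData), so checking the file is safer than assuming.

```python
import struct

# A partial, illustrative list of tags that can carry identifying or extra data.
INTERESTING_TAGS = {
    270: "ImageDescription",
    271: "Make",
    272: "Model",
    305: "Software",
    306: "DateTime",
    315: "Artist",
    700: "XMP",
    33432: "Copyright",
    33723: "IPTC",
    34377: "Photoshop",
    34665: "ExifIFD",
    37724: "ImageSourceData",  # Photoshop can store layer data here
}


def list_tiff_tags(data):
    """List (page, tag, name, text) for every tag in every IFD of a TIFF.

    Only classic TIFF is handled (not BigTIFF). 'text' is the decoded
    value of ASCII tags and None for every other type.
    """
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    results, seen, page = [], set(), 0
    while offset and offset not in seen:  # 'seen' guards against IFD loops
        seen.add(offset)
        (count,) = struct.unpack(order + "H", data[offset:offset + 2])
        for i in range(count):
            start = offset + 2 + 12 * i  # each IFD entry is 12 bytes
            tag, typ, n = struct.unpack(order + "HHI", data[start:start + 8])
            text = None
            if typ == 2:  # ASCII: stored inline if it fits in 4 bytes
                if n <= 4:
                    raw = data[start + 8:start + 8 + n]
                else:
                    (ptr,) = struct.unpack(order + "I", data[start + 8:start + 12])
                    raw = data[ptr:ptr + n]
                text = raw.rstrip(b"\x00").decode("latin-1")
            name = INTERESTING_TAGS.get(tag, "tag %d" % tag)
            results.append((page, tag, name, text))
        end = offset + 2 + 12 * count  # offset of the next IFD follows the entries
        (offset,) = struct.unpack(order + "I", data[end:end + 4])
        page += 1
    return results
```

For example, calling `list_tiff_tags(open("scan.tif", "rb").read())` on the intermediate file shows what it carries. The filename is a placeholder. Any entry whose name appears in INTERESTING_TAGS is worth reading before the TIFF is turned back into a PDF.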
How I'm doing it specifically
I don't know if I should include this, since the general question will probably be more useful, but here's my specific setup:
The software:
PDFCreator and Foxit Reader (including Foxit's PDF printer).
The setup:
Change the settings in PDFCreator so that it converts the document to a TIFF instead of a PDF. For the output, set PDFCreator to print the TIFF to Foxit's PDF printer rather than opening the document.
The process:
- Open the PDF in Foxit Reader and cover up any visible sensitive data with black rectangles.
- Print the document to PDFCreator.
- In the background, PDFCreator saves the file as a TIFF and then prints the TIFF to Foxit's PDF printer. Foxit asks where you want to save the PDF.
Related
Inspired by "Blacking out a part of a PDF, or redaction of text" on AskDifferent.
This is related to "How to remove meta and sensitive data from PDF file?", but we are all on Windows, not Unix.
Also related from SuperUser: "How to remove OCR from a PDF?"
Step-by-step instructions for a similar process by someone else: "Quick and Dirty Redaction"
Summary
From a security standpoint, will converting a PDF to an image, blacking out a portion, and then converting it back to a PDF be sufficient to remove the blacked-out information from the document?
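The core of this question can be seen in a toy model. The classes below are hypothetical and only mimic the idea, not any real PDF or TIFF library. In a PDF, a black rectangle is typically just another drawing object painted on top of the text, so the text object stays in the file for any extractor to read. Once the page is rendered to pixels, the covered pixels are replaced, and the original characters no longer exist anywhere in the image.

```python
from dataclasses import dataclass, field


@dataclass
class VectorPage:
    """Hypothetical stand-in for a PDF page: a list of independent objects."""
    objects: list = field(default_factory=list)

    def add_text(self, x, y, text):
        self.objects.append(("text", x, y, text))

    def add_box(self, x0, y0, x1, y1):
        # Like a drawn rectangle: painted on top, the text object is untouched.
        self.objects.append(("box", x0, y0, x1, y1))

    def extractable_text(self):
        # Text extraction reads text objects and ignores what covers them.
        return [obj[3] for obj in self.objects if obj[0] == "text"]


def rasterize(page, width, height):
    """Render a page to rows of 'pixels' (characters); '#' stands for black."""
    grid = [[" "] * width for _ in range(height)]
    for obj in page.objects:  # later objects overwrite earlier ones
        if obj[0] == "text":
            _, x, y, text = obj
            for i, ch in enumerate(text):
                if 0 <= x + i < width and 0 <= y < height:
                    grid[y][x + i] = ch
        else:
            _, x0, y0, x1, y1 = obj
            for row in range(max(y0, 0), min(y1, height)):
                for col in range(max(x0, 0), min(x1, width)):
                    grid[row][col] = "#"
    return ["".join(r) for r in grid]


if __name__ == "__main__":
    page = VectorPage()
    page.add_text(0, 0, "SSN 123-45-6789")
    page.add_box(4, 0, 15, 1)         # black bar over the number
    print(page.extractable_text())    # ['SSN 123-45-6789'] (still in the "PDF")
    print(rasterize(page, 15, 1))     # ['SSN ###########'] (digits gone)
```

The toy model also shows where the remaining risk lies. Only the pixels under the box are replaced, so a box that is too small or not fully opaque leaves the surrounding pixels readable.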