Batch remove comments from PDF files

18

8

How can I easily remove all comments and annotations (added with Foxit Reader) from all the PDFs in a folder?

Andrew

Posted 2010-12-13T23:15:46.900

Reputation: 232

Is it a valid assumption that you want only suggestions for free (as in beer) or for Free (as in liberty) solutions? – Kurt Pfeifle – 2010-12-19T11:31:28.927

@pipitas I am interested in any kind of solution. – Andrew – 2010-12-22T16:26:54.383

Answers

7

I just came across this problem, and none of the answers given here worked for me. What did work was the rewritepdf tool from the Ubuntu package libcam-pdf-perl:

rewritepdf -C in.pdf out.pdf

Wrapping this in a small shell loop to remove annotations from all PDF files in a directory is now easy:

for i in *.pdf; do rewritepdf -C "$i" "$i.new"; done

As usual, you can install libcam-pdf-perl via the Software Center or using sudo apt install libcam-pdf-perl
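
For anyone who prefers to drive the same batch from Python, here is a minimal sketch. It assumes rewritepdf is on your PATH and uses only the -C form shown above. The function names rewritepdf_commands and run_all are made up for this example.

```python
import subprocess
from pathlib import Path


def rewritepdf_commands(folder):
    """Build one 'rewritepdf -C in.pdf in.pdf.new' command per PDF in folder."""
    commands = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        # Argument lists avoid shell quoting problems with spaces in names
        commands.append(["rewritepdf", "-C", str(pdf), str(pdf) + ".new"])
    return commands


def run_all(folder):
    """Run every command, stopping at the first failure."""
    for cmd in rewritepdf_commands(folder):
        subprocess.run(cmd, check=True)
```

Calling run_all(".") mirrors the shell loop: each cleaned file lands next to the original with a .new suffix, and the originals are left untouched.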

Uli Fahrenberg

Posted 2010-12-13T23:15:46.900

Reputation: 171

It worked fine. :) Some help: The Ubuntu (i.e., Debian) package is here https://packages.debian.org/sid/perl/libcam-pdf-perl Dependencies are installed automatically by the "Ubuntu Software Center". (Oh, and watch out for the capital "-C". I first ran "-c" and nothing happened; not even an error was output.)

– loved.by.Jesus – 2016-09-21T13:34:51.117

5

Providing you're on a Unix system:

cd <directory containing PDFs>
find . -type f -name '*.pdf' -exec perl -pi -e 's:/Annots \[[^]]+\]::g' {} +

This is a hack that removes all /Annots entries from the PDF (the entries that tell the viewer which annotations to draw). It leaves the annotation objects themselves in the file (you can open the PDF with a text editor and search for them); they are just no longer drawn.
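
To see what that substitution matches, here is a rough Python equivalent of the same regex applied to raw bytes. It is only a sketch: the name strip_annots is made up for this example, and the sample page dictionary in the note below is hand-written, not taken from a real file.

```python
import re

# Same pattern as the perl one-liner: "/Annots [", then one or more
# characters that are not "]", then the closing "]".
ANNOTS_ARRAY = re.compile(rb"/Annots \[[^]]+\]")


def strip_annots(data: bytes) -> bytes:
    """Delete every inline '/Annots [ ... ]' array from raw PDF bytes."""
    return ANNOTS_ARRAY.sub(b"", data)
```

Feeding it b"<< /Type /Page /Annots [ 5 0 R 6 0 R ] /Contents 4 0 R >>" drops the /Annots array and keeps the rest of the dictionary. An indirect reference such as "/Annots 12 0 R" has no bracket, so the pattern does not touch it.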

Divinenephron

Posted 2010-12-13T23:15:46.900

Reputation: 151

This also removes internal document links (as, I believe, those are implemented in a pdf as /Annots, too) – Alec Jacobson – 2019-08-29T12:41:09.957

Can you explain the RE? What does [^]]+ match? – jftuga – 2012-05-25T12:03:12.557

@jftuga, s: (substitute) /Annots \[ (the text "/Annots [") [^]]+ (one or more instances of any character besides "]") \] (the literal character "]") :: (replace anything matching the former with nothing) g (replace multiple times per line if necessary). – Divinenephron – 2012-05-25T12:39:27.207

One potentially confusing part of the regex is that a literal ] normally has to be escaped, but not right after a ^ negation. – Divinenephron – 2012-05-25T12:45:53.080

3

I haven't tested it a great deal, but the following seems to work. It deletes all annotations except internal document links, which none of the other answers here preserve. This script depends on the pdfrw Python library.

#!/usr/bin/python

import sys

import pdfrw

try:
    in_path = sys.argv[1]
    out = sys.argv[2]
except IndexError:
    # Too few arguments: show usage and signal failure to the caller
    print("Usage:\tannotclean IN.pdf OUT.pdf")
    sys.exit(1)

reader = pdfrw.PdfReader(in_path)

for p in reader.pages:
    if p.Annots:
        # See PDF reference, Sec. 12.5.6 for all annotation types
        p.Annots = [a for a in p.Annots if a.Subtype == "/Link"]

pdfrw.PdfWriter(out, trailer=reader).write()

Usage:

  1. Save as a script somewhere (I assume in your PATH), e.g. /usr/local/bin/annotclean.
  2. annotclean in.pdf cleaned.pdf
  3. (optional) batch processing:
# fish shell syntax
for p in **pdf # pdfs from current directory and subdirectories
    annotclean $p $p.new
    mv $p.new $p # overwrite the old
end 
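
If you don't use fish, the same batch step can be sketched in Python. This is only an illustration: clean_tree is a made-up name, and clean_one stands for any callable that takes an input path and an output path, for example a small wrapper around the pdfrw code above.

```python
import os
from pathlib import Path


def clean_tree(root, clean_one):
    """Apply clean_one(src, dst) to every .pdf under root, then overwrite src."""
    processed = []
    for pdf in sorted(Path(root).rglob("*.pdf")):
        tmp = pdf.with_name(pdf.name + ".new")
        clean_one(str(pdf), str(tmp))  # write the cleaned copy first
        os.replace(tmp, pdf)           # then overwrite the old file
        processed.append(pdf)
    return processed
```

Writing to a .new file first means a crash inside clean_one leaves the original PDF in place.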

rien333

Posted 2010-12-13T23:15:46.900

Reputation: 137

2

I think the easiest way is to "refry" the PDF. Refrying means first converting the file to PostScript, then converting the PostScript back to PDF. Refrying is usually frowned upon because you lose quality and some content. In your case, losing that content is the goal. Refrying can be done with Ghostscript and the helper batch files that ship with it (download gs900w32.exe if you are on Windows), so here you go, with 2 easy commands:

pdf2ps.bat input.pdf output.ps
ps2pdf.bat output.ps input_refried.pdf
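
To refry a whole folder, the two commands can be looped over from Python. The sketch below assumes the Ghostscript wrappers are on your PATH under the names pdf2ps and ps2pdf (on Windows you would substitute the .bat names shown above). The function names refry_commands and refry_folder are made up for this example.

```python
import subprocess
from pathlib import Path


def refry_commands(pdf):
    """Return the PDF -> PostScript -> PDF command pair for one file."""
    pdf = Path(pdf)
    ps = pdf.with_suffix(".ps")
    out = pdf.with_name(pdf.stem + "_refried.pdf")
    return [
        ["pdf2ps", str(pdf), str(ps)],
        ["ps2pdf", str(ps), str(out)],
    ]


def refry_folder(folder):
    """Refry every PDF in folder, skipping files that are already refried."""
    for pdf in sorted(Path(folder).glob("*.pdf")):
        if pdf.stem.endswith("_refried"):
            continue
        for cmd in refry_commands(pdf):
            subprocess.run(cmd, check=True)
```

Each input.pdf produces an intermediate input.ps and a final input_refried.pdf next to it; the originals are not modified.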

Kurt Pfeifle

Posted 2010-12-13T23:15:46.900

Reputation: 10 024

This also removes internal document links. – Alec Jacobson – 2019-08-29T12:40:54.927

@AlecJacobson: Of course. If you convert to PostScript you lose a lot of the "rich" content that was part of the PDF. PostScript does not have the means to represent ANY links, not even internal document links.... – Kurt Pfeifle – 2019-08-29T13:32:56.543

This doesn't work. Written-in comments remain (not as comments, but as part of the PDF). – Andrew – 2010-12-22T16:33:00.337

If the comments are actually added to the content of the PDF, they can only be removed manually. Actual PDF annotations are separate. – CarlF – 2010-12-22T18:30:17.117

Is there any way without using any converter? – user – 2011-04-23T17:45:21.443

2

OK, you said you'd also consider a commercial solution....

I'd recommend you try callas pdfToolbox. It's available for Windows and Mac OS X. (They have a CLI for Linux as well, but you can only use pre-configured "profiles" with it. With the Windows GUI, you can create your custom profiles and re-use them with the Linux CLI, though.)

The pdfToolbox has lots and lots of ways to manipulate and fix many, many individual PDF problems.

One of the "Fixups" is to remove all annotations.

You don't need to shell out any money to test it first; callas gives out 14-day trial licenses for free.

Kurt Pfeifle

Posted 2010-12-13T23:15:46.900

Reputation: 10 024

It does indeed have a way to remove all annotations, but I'm not sure how to do a batch job. – Andrew – 2010-12-23T15:20:12.747

Dunno about previous versions, but the latest pdfToolbox5 release lets you run it in batch mode against complete folders containing PDFs..... – Kurt Pfeifle – 2011-04-27T11:30:59.967