How to remove a watermark from a PDF file?

42

46

I thought this would be a simple task, but it turned out the other way.

The watermark is the very same (overlapping, but transparent) image on every single page. I created the PDF file myself (so no copyright worries here) using PDFCreator 0.9.8.

I have already tried my friend's Adobe Acrobat Pro, but it didn't work. It tries to remove it, but it can't. I tried to remove header/footer, etc., but the watermark just won't disappear.

How can I remove the watermark?

Apache

Posted 2012-07-30T18:01:56.007

Reputation: 14 755

PDF is an output format, like an electronic printed page. It isn't meant to be edited, and in most cases you won't be able to do what you're asking short of exporting the pages to images and photoshopping out the watermarks. – mk12 – 2012-07-30T18:06:38.790

Shopping recommendations are off topic on all Stack Exchange websites. To prevent this question from being closed, I would recommend changing it to a "how" question instead of a "what" one. – Canadian Luke – 2012-07-30T22:47:21.530

It seems you would simply use PDFCreator 0.9.8 and set the option so a watermark is NOT added to each page. I assume this question is because you don't have the original source. – Ramhound – 2012-07-31T16:51:45.310

Answers

74

For image-based watermarks, there are several tools that promise their automatic removal.

All of these are free to try, but require a license to actually produce the desired output.

However, the watermark of this specific PDF file (which the OP sent me via email) isn't a single image that is repeated on all pages. As it turns out, PDFCreator hardcoded it (almost pixel by pixel) into every single one of them. This makes the watermark much more difficult to remove (and results in a rather bloated PDF file).

Since the watermark is actually composed of many tiny images, you can remove them with a PDF editor (e.g., Foxit Advanced PDF Editor), simply by selecting them and pressing Delete. Unfortunately, you have to repeat this for every page.

A less time-consuming solution would be to remove the watermark programmatically. We need Pdftk and Notepad++.

Steps

  1. Download Pdftk and extract pdftk.exe and libiconv2.dll to %windir%\System32, a directory in the path or any other location of your choice.

  2. Download and install Notepad++.

  3. PDF streams are usually compressed using the DEFLATE algorithm. This saves space, but it makes the PDF's source illegible.

    The command

    pdftk original.pdf output uncompressed.pdf uncompress
    

    uncompresses all streams, so they can be modified by a text editor.

  4. Open uncompressed.pdf with Notepad++ to reveal the structure of the watermark.

    In this specific case, every page begins with the block

    q 9 0 0 9 2997 4118.67 cm
    BI
    /CS/RGB
    /W 1
    /H 1
    /BPC 8
    ID Ÿ®¼
    EI Q
    

    and nearly 4,000 blocks just like this one. This particular block sets only one (/W 1 /H 1) of the watermark's pixels.

    Scrolling down until the pattern changes reveals that the watermark's stream is 95,906 bytes long (counting newlines). The exact same stream is repeated on every page of the PDF file.

  5. Press Ctrl + H and set the following:

    Find:               q 9 0 0 9 2997 4118\.67 cm.{95881}
    Replace:            (blank)
    Match case:         checked
    Wrap around:        checked
    Regular expression: selected
    . matches newline:  checked
    

    The regular expression q 9 0 0 9 2997 4118\.67 cm.{95881} matches the first line of the above block (q 9 0 0 9 2997 4118.67 cm) and all following 95,881 characters, i.e., the watermark's stream.

    Clicking Replace All removes it from all pages of the PDF file.

  6. The watermark has now been removed, but the PDF file has errors (the streams' lengths are incorrect) and it's uncompressed.

    The command

    pdftk uncompressed.pdf output nowatermark.pdf compress
    

    takes care of both.

  7. uncompressed.pdf is no longer needed. You can delete it.

The result is the same PDF without the watermark (and about half the size).
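For readers who would rather script step 5 than use Notepad++, here is a minimal Python sketch of the same idea. It is not part of the original answer: the function names `strip_fixed_blocks` and `strip_file` are made up for illustration, and it uses a plain byte search instead of a regular expression. The marker line and the total length (25 bytes for the first line plus the 95,881 following characters, i.e. 95,906 bytes) are the values from this specific PDF and will differ for other files.

```python
def strip_fixed_blocks(data: bytes, marker: bytes, block_len: int) -> bytes:
    """Remove every block that starts with `marker` and is `block_len` bytes long in total."""
    if block_len < len(marker):
        raise ValueError("block_len must cover at least the marker itself")
    out = bytearray()
    pos = 0
    while True:
        hit = data.find(marker, pos)
        if hit == -1:
            # No further occurrence: keep the rest of the file unchanged.
            out += data[pos:]
            return bytes(out)
        out += data[pos:hit]      # keep everything before the block
        pos = hit + block_len     # skip the marker line and the watermark stream


def strip_file(src: str, dst: str, marker: bytes, block_len: int) -> None:
    """Read an uncompressed PDF as raw bytes, strip the blocks, write the result."""
    with open(src, "rb") as fh:
        data = fh.read()
    with open(dst, "wb") as fh:
        fh.write(strip_fixed_blocks(data, marker, block_len))
```

With the values above, a call such as `strip_file("uncompressed.pdf", "stripped.pdf", b"q 9 0 0 9 2997 4118.67 cm", 95906)` would stand in for step 5. The output still has incorrect stream lengths, so the `pdftk ... compress` command from step 6 has to be run on it afterwards.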

Dennis

Posted 2012-07-30T18:01:56.007

Reputation: 42 934

Another trick that I found useful: It was difficult for me to figure out the block corresponding to the watermark in my PDF. So what I did was to just extract a single page from the PDF, ideally a page where there is just the watermark and not much else. From this one page alone, it should be easier to figure out the block that corresponds to the watermark. Then go back and do it for the original PDF. – Kenny LJ – 2015-07-11T17:14:54.717

Wow, this is the first place on the internet I have found a good way to manage this. Any places that you recommend to read up on the container format? – ConstantineK – 2015-09-08T18:50:43.090

@hobs IIRC, I read parts of the official PDF reference to write this answer. – Dennis – 2015-09-08T19:21:46.953

Thanks @Dennis, I already gave you an upvote, but this seems to be the best canonical source. I was able to get what I needed done by just some find/replace and a few additional compression trial and error runs. HUGE HELP! – ConstantineK – 2015-09-08T19:48:55.983

Instead of pdftk you can also use qpdf to uncompress and compress the PDF files. Commands: qpdf --stream-data=uncompress original.pdf uncompressed.pdf and qpdf --stream-data=compress uncompressed.pdf nowatermark.pdf – David Schuler – 2016-02-19T13:10:03.757

Excellent description of how to solve this problem. I had a similar watermark that was done differently, but this write-up made it possible to go after it. In my case the blocks were of variable length, but they were similar enough that I could write a decent regex that caught most of them. – Byron Wall – 2016-06-27T18:12:46.397

A lot of products put a lot of effort into embedding the watermark data in the actual payload, making all of the described software products and techniques powerless. Any ideas to defeat those? – Jari Turkia – 2018-08-08T12:06:43.917

@JariTurkia Assuming it's encoded in a single block, the technique in my answer should work. Otherwise, you may be out of luck. – Dennis – 2018-08-08T15:20:12.043

@Dennis The product I failed with is XMind: Zen. Feel free to give it a go yourself. – Jari Turkia – 2018-08-09T10:15:23.553

For people who come to this question in the future (including myself): pdftk seemed to freeze on at least one PDF file; qpdf worked without any problems. qpdf has some prerequisites that need to be satisfied on a Mac, but it can be installed using Homebrew. – Dan Hicks – 2018-09-29T22:07:46.670

You will get a java.lang.OutOfMemoryError: Java heap space for big PDFs with pdftk. A 40 MB source PDF will turn into a 4.6 GB uncompressed PDF. But no problem with qpdf. – noraj – 2019-11-04T21:26:12.100

6

It sounds like the watermark is actually part of the images within the .PDF, and not a separate image rendered over it by whatever you are using to display the .PDF. You may not be able to remove the watermark without extracting the images from the .PDF, running them through an image editor, and then reconstructing the .PDF manually.

LawrenceC

Posted 2012-07-30T18:01:56.007

Reputation: 63 487

4

For text watermarks, editing a PostScript version can be much easier: After

$ pdftops document.pdf

edit document.ps, then convert back to PDF via

$ ps2pdf document.ps

heiner

Posted 2012-07-30T18:01:56.007

Reputation: 141

On Linux, beware that pdftops and pdf2ps are different. Use the first command, not the second. – Camille Goudeseune – 2018-08-08T20:15:57.310

If you know what the watermark text is, here's a one-liner. pdftops in.pdf - | sed 's/WATERMARK//' | ps2pdf - out.pdf – Camille Goudeseune – 2018-08-08T20:26:46.097

1

Found another way to do it:

  1. Use the pdf2htmlEX tool (or any other PDF to HTML converter) to convert the PDF to an HTML file.
  2. Edit the HTML with a text editor and remove the watermark. Save it.
  3. Print the HTML to a new PDF document.
  4. Profit

Dominik Antal

Posted 2012-07-30T18:01:56.007

Reputation: 111

Thank you. Are you sure you could not remove the watermark with Adobe Acrobat this way? (This one might be indeed a cheaper solution.) – Apache – 2017-03-21T14:49:42.353

I believe you need some kind of password in order to remove the watermark within Adobe reader, hence I used this method. – Dominik Antal – 2017-05-03T00:39:06.920

1

The peculiarity of the stamp is that you can delete it within Adobe Acrobat Pro, but it regenerates on a mouse move because the stream object keeps it persistent.

If you try to edit the PDF source, which is tricky, there's a chance that the file will be corrupted.

If the stamp is a stream, we can interrupt it by disconnecting the computer from the Net, which I did.

Then using the Adobe Acrobat Pro, I selected one of my annotations, right-clicked to get the popup, and selected "Show Comments List".

Select the nefarious watermark/stamp from the List, right-click to get the popup and select "Delete". Do this on every page where the affixation occurs.

Save the File under another name. My application crashed, but not before saving the file!

Open the new & much smaller file; note that all the watermarks/stamps are gonzo.

In my case, the file size of my 3-page document shrank from 300 kb down to an impressive 60 kb. All the original data and annotations remained intact - sans the watermarks.

~Good hunting :o)

Alan Hord

Posted 2012-07-30T18:01:56.007

Reputation: 11

1

Convert the document into an .rtf file using Zamzar. The watermark vanishes automatically after conversion. Please note: it works perfectly if the document contains text material. It has always been of great help. (Mac user)

Shifa

Posted 2012-07-30T18:01:56.007

Reputation: 11

This does not work for the PDF I tried. – Kenny LJ – 2015-07-11T15:05:18.017

0

This is a supplement to @Dennis' answer of 18:06 30 Jul 2012. He certainly addresses the harder case.

In the simplest case where the watermark is simple, unadorned text, for example

Smedley For Commissioner

the uncompressed PDF watermarks might be defined like this:

    BT
    75.96 625 Td
    (Smedley For Commissioner)Tj
    ET

where 75.96 is the horizontal offset and 625 the vertical offset for this particular watermark instance. (Yes, both real numbers and integers may be seen.)

A regexp like the following will work for all such watermarks, ignoring any variations in their placement:

^BT\n[0-9.]+ [0-9.]+ Td\n\(Smedley For Commissioner\)Tj\nET\n
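Outside a text editor, the same pattern can be applied to the raw bytes of the uncompressed PDF. The following is only a sketch under a few assumptions: the function name `remove_text_watermark` is hypothetical, the watermark text contains no parentheses or backslashes (PDF would escape those inside the string), and line endings are LF or CRLF (the regexp above assumes plain `\n`).

```python
import re


def remove_text_watermark(data: bytes, text: str) -> bytes:
    """Drop every simple BT / Td / (text)Tj / ET block that shows exactly `text`."""
    # Escape the watermark text so characters like '.' are matched literally.
    literal = re.escape(text.encode("latin-1"))
    pattern = re.compile(
        rb"^BT\r?\n[0-9.]+ [0-9.]+ Td\r?\n\(" + literal + rb"\)Tj\r?\nET\r?\n",
        re.MULTILINE,  # let ^ match at the start of every line
    )
    return pattern.sub(b"", data)
```

As with the Notepad++ approach, the edited file should be run through pdftk or qpdf again afterwards, since removing bytes leaves the stream lengths out of date.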

Be aware, tho, that a variety of modifying PDF operators can come into play with watermarks that have more complicated formatting. Such fanciness can transform what the reader expects (hopes?) to be a contiguous, easily-searched-for string into a mess of alphabet soup. For example,

E1 = mc² by Smedley™

might be the product of this:

    BT
    75.96 625 Td
    (E)Tj
    -5 Ts
    (1)Tj
    0 Ts
    ( = mc)Tj
    5 Ts
    (2)Tj
    0 Ts
    (by Smedley)Tj
    5 Ts
    (TM)Tj
    0 Ts
    ET

. . . or far worse if your watermark is color-enhanced!
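One way to cope with such fragmented text is to glue the string pieces of each BT ... ET block back together before searching. The sketch below illustrates that idea only; the names `joined_text_blocks` and `remove_blocks_containing` are made up, and it assumes literal strings shown with `Tj` and no escaped parentheses. It ignores `TJ` arrays, hex strings and custom font encodings, and a stray `BT` inside binary image data could confuse it.

```python
import re

# A text object: a line starting with BT, up to the next line starting with ET.
BLOCK = re.compile(rb"^BT\b(.*?)^ET\b", re.MULTILINE | re.DOTALL)
# A literal string followed by the Tj (show text) operator.
SHOWN = re.compile(rb"\(([^()\\]*)\)\s*Tj")


def joined_text_blocks(data: bytes):
    """Yield (start, end, joined_text) for every BT ... ET block in `data`."""
    for match in BLOCK.finditer(data):
        pieces = SHOWN.findall(match.group(1))
        yield match.start(), match.end(), b"".join(pieces)


def remove_blocks_containing(data: bytes, needle: bytes) -> bytes:
    """Remove each text block whose joined string pieces contain `needle`."""
    out = bytearray()
    pos = 0
    for start, end, text in joined_text_blocks(data):
        if needle in text:
            out += data[pos:start]  # keep what precedes the block
            pos = end               # skip the block itself
    out += data[pos:]
    return bytes(out)
```

Printing the joined text of every block first is a cheap way to see which fragments belong to the watermark before deleting anything.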

Having noted all this, I will also note that PDFtk has a GUI version that purports to handle watermarks, in consideration of a $4 licensing fee. Not pricey at all!

On the other hand, I find its website currently advertises full support for O/S's through Windows 8 and OS X 10.8 Mountain Lion. That vintage is over 4 years old. Might PDFtk be getting outdated? I suspect not, but I don't know.

Der Schley

Posted 2012-07-30T18:01:56.007

Reputation: 101

Thank you for the PDFtk recommendation, looks neat, but yeah, it also "smells" a bit outdated. The free version is 2.02, and Wikipedia says that was indeed released 3 years ago: https://en.wikipedia.org/wiki/PDFtk – Apache – 2016-10-17T21:40:43.483

@Shiki - actually, I did some serious research into PDF formats, but scrapped my detailed assessment of PDF progress in the past 10y. Here's the summary: "Shortfalls of a 3 y.o. PDFtk release in light of PDF spec development." While there have been several updates since, say, 2005, you'll be fine with PDFtk in all likelihood. I say this after checking some commonly available, modern PDF doc sources. All the PDF docs I checked were created to pre-2010 PDF standards (well before, actually.) If you have any doubt, check the top few dozen bytes of the PDF file in question. – Der Schley – 2016-11-12T20:49:48.593
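
The header check suggested in the last comment can be sketched in a few lines of Python. The function names are hypothetical; the only thing relied on is that a PDF file starts with a `%PDF-x.y` header. Note that a document catalog can carry a `/Version` entry that overrides the header, which this sketch does not look at.

```python
import re
from typing import Optional


def pdf_header_version(head: bytes) -> Optional[str]:
    """Return the version from a '%PDF-x.y' header found in the first bytes, if any."""
    match = re.search(rb"%PDF-(\d+\.\d+)", head[:1024])
    return match.group(1).decode("ascii") if match else None


def pdf_file_version(path: str) -> Optional[str]:
    """Read only the start of the file at `path` and report its header version."""
    with open(path, "rb") as fh:
        return pdf_header_version(fh.read(1024))
```

Calling `pdf_file_version("original.pdf")` on the file in question would return a string such as the version number from its header, or `None` if no header is found.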