
I have received several PDF documents via email from someone I do not trust. I need to read the documents and respond to them. They are not encrypted.

I want to make sure the documents are completely safe.

I scanned them with multiple antivirus products. No problems detected.

I am able to view them from within my webmail, but I want to download them and view them directly and keep them for my records.

I am concerned that if I open the files in a PDF reader/viewer/editor, the documents may have something in them that will try to connect to a server to send a 'ping' indicating that I have opened the document.

Is it possible for a PDF document to do that? If so, how do I determine if any of these documents are set to do that? Also, besides disconnecting from the internet, how do I prevent it?

stochastic
Tim
  • Run them in a VM on a throwaway airgapped rig. – Deer Hunter Dec 19 '14 at 04:23
  • You can set up a VM behind a firewall that disallows traffic to the internet and logs this traffic. If you don't know how to configure Linux manually to do this, look for a virtual appliance that provides these features through web-browser configuration. You can also set up a VM without any Internet connection and run netstat or TCPView (Sysinternals) to monitor whether opening the PDF initiates any network connections. Keep an eye out, though, for traffic that the PDF reader initiates on its own even if you start it without any document... After that, both ways require you to analyze the results manually. – Sebastian B. Dec 19 '14 at 09:19
  • Oh, and then again, there is the simple solution of using a host-based firewall to drop all outgoing traffic from the PDF viewer. Windows Firewall, for example, can do this and can also log dropped connections. There is always a chance of misconfiguration, so an offline VM will be less prone to human error. – Sebastian B. Dec 19 '14 at 09:28
  • You can convert PDF to DOCX either in a local copy of MS Word 2013 or in Word Online via OneDrive (SkyDrive). It is a long process and often messes up formatting, but it usually works just fine. – PTwr Dec 19 '14 at 13:07
  • You could use Sandboxie to protect your computer if the file is malicious. To prevent the PDF from phoning home, just disconnect from the internet. – DaveTheMinion Dec 19 '14 at 18:13
  • Just a note regarding phoning home: the email itself can "phone home" without you opening the PDF, via a tracking pixel, if your mail client displays images. Here's the wiki page http://en.wikipedia.org/wiki/Web_bug – Kirill Rakhman Dec 19 '14 at 19:07
  • @DeerHunter Make sure to DBAN (120 passes at least) and shred the hard drive when done. – Kaz Wolfe Dec 20 '14 at 10:31

4 Answers


Is it possible for PDF documents to dial home?

Yes, it is, at least with Adobe Reader products (see here):

In addition to visible links in a PDF document, form fields can contain hidden JavaScript calls that open a page in a browser or silently requests data from the Internet.

How can I tell if a particular pdf does this?

I'm sure there is a way to do this, but I do not know what it is. Possibly Adobe Reader would show you which features a particular document uses, but I don't use it, so I don't really know. I would argue that the more important question is...

How do I protect myself from a PDF file calling home?

First, I should say that anti-virus scanning is NOT the way to do this. For one thing, "dialing home" is not a virus in a pdf, it's just use of a "legitimate" feature. For another thing, virus scanning is something of a broken security model: a new virus will make it past the scanners every time.

Fortunately, it is possible to disable a number of options, such as web links and JavaScript, in Reader's preferences (see here, for instance; note that the Adobe term "links to the Internet" does not refer to ordinary hyperlinks like those on the web, but to actual connections). You probably want to permanently disable Internet access and JavaScript: you very seldom need these features, and they just expose potential problems.

Practically speaking, this is really all you need to do to keep yourself safe.

However...

There have also, over the years, been vulnerabilities discovered in various PDF viewers, making it possible for a specially crafted PDF to do nasty things like execute arbitrary code. There are ways to mitigate this as well: modern versions of Adobe Reader have a built-in sandbox (Protected Mode) that you can enable; see here.

Presuming you are running on a system that you keep patched, this is probably not a huge risk. If you need to make extra extra sure, use a virtual machine (such as VMware or VirtualBox). I would suggest this procedure:

  • Create a virtual machine and install PDF reader software on it.
  • Set up a shared folder between the host machine and the virtual machine.
  • Use this shared folder to copy the PDF files to the virtual machine.
  • Shut down the virtual machine, and disable its network and the shared folders.
  • Take a snapshot of the virtual machine's hard disk state.
  • Restart the virtual machine and view the PDF files. Since the network and shared folders are disabled, there should be no way for anything nasty in the PDF files to get off the virtual machine.
  • When you are finished, shut down the virtual machine and roll back its state to the snapshot. Now anything bad that the PDFs might have done to the virtual machine is gone.
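On VirtualBox, the isolate, snapshot, view and roll-back steps above could be scripted with VBoxManage. This is only a sketch under assumptions: the VM already exists with the reader installed and the PDFs copied in (steps 1 to 3), and the names "pdf-viewer", "pdfs" and "clean" are hypothetical placeholders. By default it just prints the commands (a dry run) so you can check them before passing run=True.

```python
from __future__ import annotations

import shlex
import subprocess

# Hypothetical names: replace with your own VM, shared folder and snapshot.
VM = "pdf-viewer"
SHARE = "pdfs"
SNAPSHOT = "clean"


def isolation_commands(vm: str, share: str, snapshot: str) -> list[list[str]]:
    """VBoxManage commands for the isolate/snapshot/start steps.

    Assumes the VM is already shut down and the PDFs are already copied in.
    """
    return [
        # Detach the first network adapter (repeat with --nic2 etc. if used).
        ["VBoxManage", "modifyvm", vm, "--nic1", "none"],
        # Remove the permanent shared folder used to copy the files in.
        ["VBoxManage", "sharedfolder", "remove", vm, "--name", share],
        # Record the clean disk state before opening anything.
        ["VBoxManage", "snapshot", vm, "take", snapshot],
        # Boot the isolated VM; view the PDFs inside it.
        ["VBoxManage", "startvm", vm],
    ]


def rollback_commands(vm: str, snapshot: str) -> list[list[str]]:
    """Power off the VM and restore the clean snapshot."""
    return [
        ["VBoxManage", "controlvm", vm, "poweroff"],
        ["VBoxManage", "snapshot", vm, "restore", snapshot],
    ]


def run_all(commands: list[list[str]], run: bool = False) -> None:
    """Print each command; execute it only when run=True."""
    for cmd in commands:
        print(shlex.join(cmd))
        if run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(isolation_commands(VM, SHARE, SNAPSHOT))
    # ...then, after viewing the PDFs inside the VM:
    run_all(rollback_commands(VM, SNAPSHOT))
```

Because the snapshot is taken after the network and shared folder are removed, restoring it brings the VM back to an isolated, clean state ready for the next document.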

However...

All that being said, it might theoretically be possible for flaws in the virtualization software to allow something extra extra nasty in the PDF file to escape. We were already in the realm of extreme paranoia with the virtual machine, IMHO, and we're WAY over the edge now, but for the sake of completeness, an air-gapped physical machine would be an even more secure option.

(As Deer Hunter's comment suggests, using physical hardware that is not connected to any network, and which you destroy afterwards, would be even more secure, though we're getting exponentially more paranoid by the minute.)

user10008
stochastic
  • Or, you know, use a PDF reader that doesn't support phone-home extensions. – Shadur Dec 19 '14 at 07:01
  • A PDF exploit that can escape up-to-date virtualization software would be really rare and probably not used in "low interest hacks". You are right, though, to point out the possibility; however, I personally wouldn't worry about it if this is not about life and death or millions of dollars... – Sebastian B. Dec 19 '14 at 09:24
  • Sebastian B. - you are quite correct of course, hence my reference to paranoia, but for the sake of completeness... ;-) – stochastic Dec 19 '14 at 15:54
  • @shadur ftw. if you're using adobe reader for anything at all, you've got worse problems anyway. – user428517 Dec 19 '14 at 20:30
  • Is OS X Preview.app safe to use for viewing PDFs? Or Mozilla Firefox? – Display Name Dec 20 '14 at 09:40
  • @Sarge Borsch: Mozilla Firefox uses the [pdf.js](https://github.com/mozilla/pdf.js) project as its embedded reader. I use this myself, and don't think it is capable of doing anything other than display, but I can't seem to find an authoritative source that definitively and clearly states this. Maybe I'll try asking on their GitHub page. As for OS X Preview, [this page](http://khkonsulting.com/2013/06/preview-app-killer-of-pdf-files/) points out that it sometimes does undesirable things to the PDF file, but I've similarly had trouble finding answers about security. – stochastic Dec 20 '14 at 19:02
  • @Shadur: there is NO "phone-home extension"; a phone-home mechanism is created using base actions of PDF. – Max Wyss Dec 20 '14 at 19:07
  • I have posted a documentation issue on the pdf.js page, for those who are interested. It is [here](https://github.com/mozilla/pdf.js/issues/5565) – stochastic Dec 20 '14 at 19:28

I don't have enough reputation to comment, but I would like to add to dotancohen's answer. If you want to read a PDF as plain text, pdftk is an amazing free tool.

Just run a command like:

pdftk input.pdf output out.pdf uncompress

and all compressed content streams will be uncompressed. The structure (such as object numbers) may change a little but this will enable simple parsing for known strings like '/JavaScript' with your favourite tools.
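Once the streams are uncompressed, the search for known strings can be scripted. A minimal Python sketch, assuming the file has already been through the pdftk uncompress step (out.pdf is just the name from that command). It counts PDF name tokens that commonly make a document run code or reach the network; the keyword list is my own selection from the action types in the PDF spec (ISO 32000), not an exhaustive one. It also undoes #xx hex escapes, since the spec allows /JavaScript to be written as /J#61vaScript, which a plain grep would miss.

```python
import re
import sys
from collections import Counter

# Name tokens that make a PDF run code or reach the network.
# My own selection from the action types in ISO 32000, not exhaustive.
SUSPICIOUS = {
    b"JS", b"JavaScript", b"OpenAction", b"AA", b"URI",
    b"Launch", b"SubmitForm", b"GoToR", b"ImportData",
}

# A PDF name: "/" followed by bytes that are not whitespace or delimiters.
NAME_RE = re.compile(rb"/([^\x00\s()<>\[\]{}/%]*)")
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name(raw: bytes) -> bytes:
    """Undo #xx escapes, e.g. b'J#61vaScript' -> b'JavaScript'."""
    return HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def scan(data: bytes) -> Counter:
    """Count suspicious name tokens in (uncompressed) PDF bytes."""
    hits = Counter()
    for match in NAME_RE.finditer(data):
        name = decode_name(match.group(1))
        if name in SUSPICIOUS:
            hits[name.decode("ascii")] += 1
    return hits


# Only touch the filesystem when a path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for name, count in sorted(scan(f.read()).items()):
            print(f"/{name}: {count}")
```

A clean report only means none of these names appear in plain text; it does not prove the file is harmless.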

You may need to go through the PDF spec to see if that's enough, but it should get you started. A ready-made tool or a firewalled VM is safer if you don't have the time or interest for this.

Nick P

From this page one can download PDF files with example scripts. I downloaded this one.

As an experiment I ran the file through strings and used grep to search for JavaScript:

$ strings JSPopupCalendar.pdf | grep -i java
<</JavaScript 251 0 R/EmbeddedFiles 243 0 R>>
<</S/JavaScript/JS 253 0 R>>
<</S/JavaScript/JS(\n\r\n       /* Set day 18 */\r\n    FormRouter_SetCurrentDate\("18"\);\r)>>
<</S/JavaScript/JS(\nFormRouter_PlaceCalendar\(this.getField\("FormDateField"\), false, "mmmm dd, yy"\);\r\n\r\n\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 17 */\r\n    FormRouter_SetCurrentDate\("17"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 26 */\r\n    FormRouter_SetCurrentDate\("26"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 16 */\r\n    FormRouter_SetCurrentDate\("16"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 31 */\r\n    FormRouter_SetCurrentDate\("31"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 15 */\r\n    FormRouter_SetCurrentDate\("15"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 25 */\r\n    FormRouter_SetCurrentDate\("25"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 14 */\r\n    FormRouter_SetCurrentDate\("14"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /*  Set day 13 */\r\n   FormRouter_SetCurrentDate\("13"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 24 */\r\n    FormRouter_SetCurrentDate\("24"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 12 */\r\n    FormRouter_SetCurrentDate\("12"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 30 */\r\n    FormRouter_SetCurrentDate\("30"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 21 */\r\n    FormRouter_SetCurrentDate\("21"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 23 */\r\n    FormRouter_SetCurrentDate\("23"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 10 */\r\n    FormRouter_SetCurrentDate\("10"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 9 */\r\n     FormRouter_SetCurrentDate\("9"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 22 */\r\n    FormRouter_SetCurrentDate\("22"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 8 */\r\n     FormRouter_SetCurrentDate\("8"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 29 */\r\n    FormRouter_SetCurrentDate\("29"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 7 */\r\n     FormRouter_SetCurrentDate\("7"\);\r)>>
<</S/JavaScript/JS(\n/* Set day 1 */\r\nFormRouter_SetCurrentDate\("1"\);\r\n\r\n\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 6 */\r\n     FormRouter_SetCurrentDate\("6"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 28 */\r\n    FormRouter_SetCurrentDate\("28"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 5 */\r\n     FormRouter_SetCurrentDate\("5"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 20 */\r\n    FormRouter_SetCurrentDate\("20"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 4 */\r\n     FormRouter_SetCurrentDate\("4"\);\r)>>
<</S/JavaScript/JS(\n\r\n\r\n   /* Set day 3 */\r\n     FormRouter_SetCurrentDate\("3"\);\r\n\r\n\r)>>
<</S/JavaScript/JS(\n\r\n\r\n   /* Set day 19 */\r\n    FormRouter_SetCurrentDate\("19"\);\r\n\r\n\r)>>
<</S/JavaScript/JS(\n/* Set day 2 */\r\nFormRouter_SetCurrentDate\("2"\);\r\n\r\n\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 27 */\r\n    FormRouter_SetCurrentDate\("27"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 11 */\r\n    FormRouter_SetCurrentDate\("11"\);\r)>>
<</S/JavaScript/JS(\nFormRouter_PlaceCalendar\(this.getField\("DateTest2"\), true, "ddd mmm d, yyyy"\);\r\n\r\n\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 23 */\r\n    FormRouter_SetCurrentDate\("23"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 24 */\r\n    FormRouter_SetCurrentDate\("24"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 25 */\r\n    FormRouter_SetCurrentDate\("25"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 26 */\r\n    FormRouter_SetCurrentDate\("26"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 27 */\r\n    FormRouter_SetCurrentDate\("27"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 28 */\r\n    FormRouter_SetCurrentDate\("28"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 29 */\r\n    FormRouter_SetCurrentDate\("29"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 30 */\r\n    FormRouter_SetCurrentDate\("30"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 31 */\r\n    FormRouter_SetCurrentDate\("31"\);\r)>>
<</S/JavaScript/JS(\n/* Set day 1 */\r\nFormRouter_SetCurrentDate\("1"\);\r\n\r\n\r)>>
<</S/JavaScript/JS(\nFormRouter_PlaceCalendar\(this.getField\("FormDateField.1"\), false, "mmm d, yyyy"\);\r\n\r\n\r\n\r)>>
<</S/JavaScript/JS(\n\r\nFormRouter_PlaceCalendar\(this.getField\("DateTest1"\), false, "mm/dd/yyyy"\);\r\n\r\n\r\n\r\n\r)>>
<</S/JavaScript/JS(\n/* Set day 2 */\r\nFormRouter_SetCurrentDate\("2"\);\r\n\r\n\r)>>
<</S/JavaScript/JS(\n\r\n\r\n   /* Set day 3 */\r\n     FormRouter_SetCurrentDate\("3"\);\r\n\r\n\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 4 */\r\n     FormRouter_SetCurrentDate\("4"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 5 */\r\n     FormRouter_SetCurrentDate\("5"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 6 */\r\n     FormRouter_SetCurrentDate\("6"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 7 */\r\n     FormRouter_SetCurrentDate\("7"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 8 */\r\n     FormRouter_SetCurrentDate\("8"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 9 */\r\n     FormRouter_SetCurrentDate\("9"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 10 */\r\n    FormRouter_SetCurrentDate\("10"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 11 */\r\n    FormRouter_SetCurrentDate\("11"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 12 */\r\n    FormRouter_SetCurrentDate\("12"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /*  Set day 13 */\r\n   FormRouter_SetCurrentDate\("13"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 14 */\r\n    FormRouter_SetCurrentDate\("14"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 15 */\r\n    FormRouter_SetCurrentDate\("15"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 16 */\r\n    FormRouter_SetCurrentDate\("16"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 17 */\r\n    FormRouter_SetCurrentDate\("17"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 18 */\r\n    FormRouter_SetCurrentDate\("18"\);\r)>>
<</S/JavaScript/JS(\n\r\n\r\n   /* Set day 19 */\r\n    FormRouter_SetCurrentDate\("19"\);\r\n\r\n\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 20 */\r\n    FormRouter_SetCurrentDate\("20"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 21 */\r\n    FormRouter_SetCurrentDate\("21"\);\r)>>
<</S/JavaScript/JS(\n\r\n       /* Set day 22 */\r\n    FormRouter_SetCurrentDate\("22"\);\r)>>
<</S/JavaScript/JS(\n\r\n\r)>>
<</S/JavaScript/JS 233 0 R>>
<</S/JavaScript/JS(\n\r\nif\(!event.willCommit\)\r\n{\r\n  FormRouter_SetDays\(parseInt\(event.changeEx\), parseInt\(getField\("FR_00000_Calendar.CalendarYear"\).value\)\);\r\n}\r\n\r\n\r\n\r\n\r\n\r)>>
               <rdf:li>JavaScript</rdf:li>

I cannot guarantee that every PDF file containing JavaScript will have it visible to strings (it may, for instance, sit inside a compressed stream). However, checking this way is a good first step.
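strings only sees bytes stored as plain text, so a script inside a Flate-compressed stream will not show up in its output. A rough workaround, sketched with Python's standard zlib module: try to inflate every stream...endstream body and search the result as well. This is a heuristic under simplifying assumptions (it ignores the /Filter and /Length entries and any non-Flate filters), not a real PDF parser.

```python
from __future__ import annotations

import re
import sys
import zlib

# Body of each stream: after "stream" + EOL, up to the next "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflated_streams(data: bytes):
    """Yield the decompressed content of every stream zlib can inflate."""
    for match in STREAM_RE.finditer(data):
        body = match.group(1)
        try:
            # decompressobj ignores trailing bytes (the EOL before endstream).
            text = zlib.decompressobj().decompress(body)
        except zlib.error:
            continue  # not Flate-compressed (or corrupt): skip it
        yield text


def find_javascript(data: bytes, needle: bytes = b"/JavaScript") -> list[str]:
    """Report where the needle appears: in the raw file and/or inside streams."""
    places = []
    if needle in data:
        places.append("raw")
    for i, text in enumerate(inflated_streams(data)):
        if needle in text:
            places.append(f"inflated stream {i}")
    return places


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(find_javascript(f.read()) or "no /JavaScript found")
```

The stream index counts only streams that inflated successfully, so treat it as a hint for where to look rather than an object number.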

dotancohen
  • Yikes, this document is a form or an otherwise smart PDF (otherwise, it would not make much sense to include JavaScript). What you have revealed is the code for the Date Picker by FormRouter, which has been integrated into this particular document by its author(s). There may be other code, but that too has been inserted by the author(s) (…and, BTW, the source is a most reputable company). Anyway, it might be better to search for /JS, and to use a PDF explorer such as the one that is, for example, part of the Preflight tools in Acrobat Pro. You may also look out for Action-related keys (see ISO 32000). – Max Wyss Dec 20 '14 at 19:05

As has been stated, it is possible to make a PDF document call home.

This is normally done by launching a URL, mostly when the document opens.
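To see which address such a document would contact, one can list the targets of its URI actions (which may hang off /OpenAction or a form field). A minimal sketch, assuming the streams are already uncompressed (for example with the pdftk step in the earlier answer) and that the URL is a literal string without nested parentheses or a hex string; octal escapes are not decoded. It does not cover JavaScript that builds URLs at run time.

```python
from __future__ import annotations

import re

# /URI followed by a literal string "(...)" or a hex string "<...>".
# Assumes no unescaped nested parentheses inside the literal string.
URI_RE = re.compile(rb"/URI\s*(?:\(((?:\\.|[^\\)])*)\)|<([0-9A-Fa-f\s]*)>)")


def extract_uris(data: bytes) -> list[str]:
    """Return the target URLs of /URI actions found in PDF bytes."""
    uris = []
    for literal, hexstr in URI_RE.findall(data):
        if hexstr:
            digits = re.sub(rb"\s", b"", hexstr)
            if len(digits) % 2:
                digits += b"0"  # the spec pads an odd final hex digit with 0
            raw = bytes.fromhex(digits.decode("ascii"))
        else:
            # Drop the backslash in simple escapes such as \( and \).
            raw = re.sub(rb"\\(.)", rb"\1", literal)
        uris.append(raw.decode("latin-1"))
    return uris
```

An /S/URI action placed under /OpenAction is exactly the "call home when the document opens" pattern described above.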

Probably the easiest, most pragmatic approach would be using a very dumb PDF viewer (which does not know about Actions etc.), and disconnect the machine from the network before starting the viewer (and reconnect after quitting).

However, you want to keep the PDF for your records. In this case, you'd have to "sanitize" it, and for that you would need something better than a dumb PDF viewer.

One possibility would be to use Acrobat (while the machine is disconnected from the network) and locate the PDF Optimizer (in Acrobat XI, you get to it via the menu File --> Save as Other… --> Optimized PDF). That dialog offers quite a few options for removing active elements etc. This is said to be sufficiently reliable, and that should do it.

If you have a larger number of files to process, you might look at some products by Appligent; if I remember correctly, there is one utility that properly strips any active elements from the PDF (and, as it is not a PDF viewer, the document has no chance to call home).
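Without Acrobat, a crude last-resort alternative is to neutralise the keywords in place, the idea behind the disarm option of Didier Stevens' pdfid.py: swap the case of names such as /JavaScript and /OpenAction so viewers no longer recognise them. Because the length does not change, the cross-reference offsets stay valid. The sketch below is my own rough take on that idea, not pdfid's code. It assumes uncompressed input (run the pdftk step first), ignores #xx-escaped names, and also breaks ordinary links, so it is no substitute for a proper sanitizer.

```python
import re
import sys

# Names whose case gets swapped; my own list, modelled on pdfid's disarm idea.
ACTIVE_NAMES = [b"JavaScript", b"JS", b"OpenAction", b"AA", b"Launch",
                b"URI", b"SubmitForm", b"GoToR", b"ImportData"]

# Match "/Name" only when followed by a delimiter, whitespace or end of data,
# so "/JS" does not also hit the start of a longer name like "/JSomething".
DISARM_RE = re.compile(
    rb"/(" + b"|".join(ACTIVE_NAMES) + rb")(?=[\x00\s()<>\[\]{}/%]|$)"
)


def disarm(data: bytes) -> bytes:
    """Swap the case of active names; the output has exactly the same length."""
    return DISARM_RE.sub(lambda m: b"/" + m.group(1).swapcase(), data)


# Usage: python disarm.py in.pdf out.pdf
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    with open(sys.argv[2], "wb") as f:
        f.write(disarm(data))
```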

Max Wyss