We are developing a web app that will start accepting PDF uploads, which will be stored and distributed to other users. PDFs still seem remarkably unsafe. How can I scan and reject any PDF that does anything beyond representing a printable document? No forms, no JavaScript, no embedded executables, no shell commands. Are there tools for this?

bbsimonbb
  • Given that Postscript is a full-on programming language, and that you can't predict what any given PDF generator is going to put into the PDF files, the best you may be able to do is run a virus-scanner and hope. – Michael Kohne Jul 24 '18 at 11:10
  • IMHO, PostScript is not the issue: it's been a very long time since any vulnerabilities were found there and it runs in a well-defined sandbox. OTOH, a PDF can contain JavaScript (the usual source of vulnerabilities), JPEG images (several vulns in Microsoft), EPS (a couple of vulns in Microsoft) and Flash (oh dear). – symcbean Jul 24 '18 at 11:26
  • @bbsimonbb: I've not flagged this as a duplicate (since you explicitly ask how to validate files rather than how to make them safe), but this may be relevant: https://security.stackexchange.com/questions/103323/effectiveness-of-flattening-a-pdf-to-remove-malware – symcbean Jul 24 '18 at 11:30
  • @symcbean that's a very useful link thanks. I thought it would be easier to screen and reject, rather than flatten, but I'm not finding the main road from here to there. – bbsimonbb Jul 24 '18 at 12:38
  • Didier Stevens' PDF tools (https://blog.didierstevens.com/programs/pdf-tools/) are useful for pulling apart PDF files, which should allow you to identify JavaScript and Flash nasties, but you would probably consider JPEG images as valid. Unfortunately Microsoft has a history of problems with these files. – symcbean Jul 24 '18 at 12:46
  • @symcbean again very useful thanks. We already accept jpeg uploads, so we'll be no worse off accepting pdfs with jpegs in. – bbsimonbb Jul 24 '18 at 14:49
  • Do you need the PDF content as a PDF? You could convert into any other format using a tool like imagemagick. Run this in an isolated system (throwaway container or serverless) and you don’t need to worry so much. – David Jun 05 '19 at 22:22

1 Answer

Validate that the upload really is a PDF (e.g. not an EXE or a JS file). Use more than one method: check the file extension, but also check the magic bytes at the start of the file (a PDF header begins with %PDF-).
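
As a rough sketch of combining the two checks, assuming the server side is written in Python: the function name looks_like_pdf is hypothetical, and the check is deliberately strict in requiring the header at byte 0, even though some readers are reported to tolerate a little junk before it.

```python
from pathlib import PurePath

PDF_MAGIC = b"%PDF-"  # a PDF header starts with these bytes


def looks_like_pdf(filename: str, data: bytes) -> bool:
    """Return True only if both the extension and the magic bytes say PDF."""
    has_pdf_extension = PurePath(filename).suffix.lower() == ".pdf"
    has_pdf_magic = data.startswith(PDF_MAGIC)
    return has_pdf_extension and has_pdf_magic
```

Passing this check only shows the file claims to be a PDF; it says nothing about what the PDF contains, so it is a first filter and not a verdict.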

Most antivirus products offer a command-line scanner or an API, so run an AV scan on the PDF before deciding to store it permanently on your system.
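
For illustration, here is one way a command-line scan could be wrapped, assuming ClamAV's clamscan is the scanner (the answer does not name a product, so this choice is an assumption). clamscan exits with 0 when the file is clean, 1 when a virus is found and 2 on errors; the function name scan_file is hypothetical.

```python
import subprocess
from typing import Sequence

# Assumption: ClamAV's clamscan is installed and on the PATH.
DEFAULT_SCANNER = ("clamscan", "--no-summary")


def scan_file(path: str, scanner: Sequence[str] = DEFAULT_SCANNER) -> bool:
    """Return True if the scanner reports the file clean, False if infected.

    Any exit code other than 0 (clean) or 1 (infected) is treated as a
    scanner failure, so a broken scanner never counts as a pass.
    """
    result = subprocess.run([*scanner, path], capture_output=True, text=True)
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(
        f"scanner failed (exit {result.returncode}): {result.stderr.strip()}"
    )
```

Scan the upload while it still sits in a temporary location, and only move it into permanent storage when scan_file returns True.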

Train the users who open these files to spot a phishing PDF, and to avoid clicking through security pop-ups such as prompts to run JavaScript or load external resources.

You can use tools like the Didier Stevens ones mentioned above to count the JavaScript sections in a file. If the PDF contains any JavaScript, reject it on the spot and don't save it.
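
The pdfid script in Didier Stevens' tools reports counts of PDF names such as /JS and /JavaScript. The sketch below illustrates the same idea in a few lines of Python; it is not a replacement for those tools. The FORBIDDEN_NAMES list is an assumption chosen to match the question (no forms, no JavaScript, no embedded files, no launch actions), and the function name is hypothetical. It decodes #xx hex escapes in names, a common obfuscation, but a raw byte scan cannot see names hidden inside compressed object streams.

```python
import re
from typing import Dict

# Assumed deny-list; adjust to your own policy.
FORBIDDEN_NAMES = {
    b"/JS", b"/JavaScript",   # JavaScript actions
    b"/OpenAction", b"/AA",   # actions triggered automatically
    b"/Launch",               # launch an external program
    b"/EmbeddedFile",         # embedded attachments
    b"/AcroForm", b"/XFA",    # interactive forms
    b"/RichMedia",            # Flash and similar content
}

# A PDF name is "/" followed by characters up to whitespace or a delimiter.
_NAME_RE = re.compile(rb"/[^\s\x00()<>\[\]{}/%]+")
_HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def _decode_name(raw: bytes) -> bytes:
    """Undo #xx escapes, so /J#61vaScript becomes /JavaScript."""
    return _HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def find_forbidden_names(data: bytes) -> Dict[bytes, int]:
    """Count occurrences of each forbidden name in the raw PDF bytes."""
    counts: Dict[bytes, int] = {}
    for match in _NAME_RE.finditer(data):
        name = _decode_name(match.group(0))
        if name in FORBIDDEN_NAMES:
            counts[name] = counts.get(name, 0) + 1
    return counts
```

An upload handler would reject the file whenever find_forbidden_names returns a non-empty dict. Treat an empty result as necessary but not sufficient, for the object-stream reason given above.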

Do all your validation server-side, since client-side validation (such as JavaScript checks in the browser) can always be bypassed.

Log everything, so you know who uploaded a PDF and who viewed it.
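
A minimal sketch of what such an audit record might look like, using Python's standard logging module. The logger name, field names and the record_event function are all hypothetical; a real system would also record whatever identifies the stored file in your own database.

```python
import hashlib
import json
import logging
from datetime import datetime, timezone

audit_log = logging.getLogger("pdf_audit")  # hypothetical logger name


def record_event(action: str, user_id: str, data: bytes) -> dict:
    """Log one audit record tying a user to a specific file's contents."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "action": action,  # e.g. "upload" or "view"
        "user_id": user_id,
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    audit_log.info(json.dumps(record))
    return record
```

Keying records on the content hash means that if a file later turns out to be malicious, you can look up everyone who uploaded or viewed that exact file.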

Have a process in place to quickly remove a malicious file from the system if one is found.