
I am trying to implement secure file uploads. I need to support various file types, including PDF, XLS, and XSL. I have implemented some basic controls, such as:

  • Store files outside the web root
  • Check file extension against whitelist
  • Generate a new file name on the server
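For reference, the three controls listed above can be combined roughly like this. This is a minimal sketch, not the asker's actual code: `ALLOWED_EXTENSIONS` and `safe_store` are hypothetical names, and `upload_dir` is assumed to point at a directory outside the web root.

```python
import secrets
import tempfile
from pathlib import Path

# Hypothetical whitelist; adjust to the formats actually accepted.
ALLOWED_EXTENSIONS = {".pdf", ".xls", ".xsl"}


def safe_store(original_name: str, data: bytes, upload_dir: Path) -> Path:
    """Check the extension against the whitelist, then store the bytes
    under a server-generated name inside upload_dir (assumed to be
    outside the web root)."""
    # Only the last suffix counts: "invoice.pdf.exe" -> ".exe".
    ext = Path(original_name).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"extension not allowed: {ext!r}")
    # Random name, so the client never controls the stored path or name.
    target = upload_dir / (secrets.token_hex(16) + ext)
    # Mode "x" fails instead of silently overwriting an existing file.
    with open(target, "xb") as fh:
        fh.write(data)
    return target


if __name__ == "__main__":
    # Demo only: a temporary directory stands in for the real upload folder.
    with tempfile.TemporaryDirectory() as tmp:
        print(safe_store("Report.PDF", b"%PDF-1.7\n", Path(tmp)).name)
```

Note that this only checks the *name*; nothing here looks at the bytes, which is exactly the gap the question is about.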

However, I am struggling to validate the file type itself. My thought is that I should not just rely on file extension, because someone might upload definitely-not-an-executable.pdf.

Question

Is it necessary to check the file type beyond just the extension? If so, what is the best way to do that? MIME type? File signature? Something else?
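On the MIME type option: the `Content-Type` sent with an upload is chosen by the client, so it carries no more weight than the extension. A file-signature check, by contrast, looks at the actual leading bytes. A minimal sketch, assuming only the two binary formats from the question: PDF files begin with `%PDF-`, and legacy `.xls` files are OLE2 Compound File Binary documents beginning with `D0 CF 11 E0 A1 B1 1A E1` (modern `.xlsx` files are ZIP archives starting with `PK\x03\x04` instead). `SIGNATURES` and `signature_matches` are hypothetical names.

```python
from pathlib import Path

# Leading "magic bytes" per extension (binary formats only).
SIGNATURES = {
    ".pdf": [b"%PDF-"],
    # OLE2 Compound File header; also used by .doc, .ppt, .msi, etc.
    ".xls": [bytes.fromhex("D0CF11E0A1B11AE1")],
}


def signature_matches(filename: str, data: bytes) -> bool:
    """Return True if the content starts with a signature expected for
    the file's extension. Types with no known signature return False,
    so text formats such as .xsl need a different check."""
    expected = SIGNATURES.get(Path(filename).suffix.lower())
    if expected is None:
        return False
    return any(data.startswith(sig) for sig in expected)
```

Two caveats: the OLE2 header only proves the file is a compound document, not that it is a spreadsheet, and some PDF readers tolerate bytes before `%PDF-`, so a strict offset-0 check may reject a few unusual but valid files. A matching signature also says nothing about whether the content is harmless.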

srk
  • @Marcel - I agree #2 is better suited to SO, but I thought #1 would be a good fit for Information Security, because I'm asking a general question about how to address a security concern. Would it help if I rewrote this question to focus on #1 without the implementation details? – srk Mar 22 '22 at 16:40
  • @srk you should check for magic file bytes or so-called magic headers (also called file signatures) https://en.wikipedia.org/wiki/List_of_file_signatures – Sir Muffington Mar 22 '22 at 20:33
  • Yes, #1 is a good fit here IMHO. I suggest you remove #2 here, and ask that specifically on StackOverflow. It's then possible to reference this question from there. – Marcel Mar 22 '22 at 21:19
  • @SirMuffington what about non-binary files (txt, xml, json, xsl, etc.)? My understanding is that they won't contain a file signature (depending on the format). So is it sufficient to only check the file signature for binary files, and do no check otherwise? – srk Mar 23 '22 at 12:18
  • @Marcel - I rewrote the question to remove implementation-specific details. Hopefully that helps. – srk Mar 23 '22 at 12:41
  • What is your threat model? Who is using the service and what are the risks? Do you have to protect the end users against themselves? Are there legal requirements against hosting malware, even if you are not aware of it? – A. Hersean Mar 23 '22 at 12:47
  • @srk you can also parse non-binary files. This of course increases your attack surface, since parsers can have security vulnerabilities. – Sir Muffington Mar 23 '22 at 15:49
  • @A.Hersean - without getting into too many details, I'll say two things. (1) The site handles somewhat sensitive data, so it's not just a meme forum, for example. (2) Users can view/download files that were uploaded by other users, so I need to protect users against files that another user uploaded. – srk Mar 24 '22 at 12:34
  • @SirMuffington - what exactly do you mean by "parse non-binary files"? Do you mean that I can, for example, parse an XML document to ensure it is valid XML? – srk Mar 24 '22 at 12:35
  • @srk Would you be legally liable if someone downloaded malware uploaded by someone else on your platform, if you did not know it was malware? In most legal systems I know of, hosts that do not edit content cannot legally be held responsible for stuff uploaded by users of their platform. – A. Hersean Mar 24 '22 at 12:37
  • If you want to protect against malware, then you should not rely on file types. I could hide a virus in a PDF, for example. Rely on antivirus software; that's its job. – A. Hersean Mar 24 '22 at 12:39
  • @srk yes, that's one example. – Sir Muffington Mar 24 '22 at 18:24
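To illustrate the "parse non-binary files" suggestion for the XSL case: one option is to confirm that an uploaded `.xsl` file is well-formed XML whose root element is an XSLT `stylesheet` or `transform`. This is a hedged sketch using the standard library's `xml.etree.ElementTree`; `looks_like_xslt` is a hypothetical name. As noted in the comments, parsing adds attack surface, and the separate `defusedxml` package is often recommended for untrusted XML. Passing this check does not make the content safe, only plausibly what it claims to be.

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"
XSLT_ROOTS = {f"{{{XSLT_NS}}}stylesheet", f"{{{XSLT_NS}}}transform"}


def looks_like_xslt(data: bytes) -> bool:
    """Return True if data parses as XML and its root element is
    xsl:stylesheet or xsl:transform (namespace-qualified)."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # Not well-formed XML (truncated, binary, wrong type, ...).
        return False
    # ElementTree reports namespaced tags as "{namespace-uri}localname".
    return root.tag in XSLT_ROOTS
```

The same idea extends to other text formats (for example, decoding `.txt` as UTF-8 or loading `.json` with a JSON parser), each with its own parser and its own risks.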

0 Answers