
I don't want to just check the file's extension, as that can easily be forged. Even MIME types can be forged using tools like TamperData.

So, is there a better way to check file types in PHP?

inf3rno
Grim Reaper
    Something to keep in mind: files don't really _have_ types. That is, the information you know as the "file type" is not actually part of the file. The file type that you see displayed in your file manager, or returned by the appropriate PHP function, and so on, is simply a heuristic guess at the intended use of the file, based on the file's name and possibly its content. What really matters is what you do with the file, i.e. what program(s) you use to read or process it. – David Z May 13 '14 at 16:11
  • There is a "good" answer to this already somewhere on the Stack Exchange network (was it _this_ site?). But basically there is no foolproof way. You need to secure your system, rename the file, stream it, etc. so that when "opened" it won't compromise your system. – MrWhite May 13 '14 at 20:41

6 Answers


You want PHP's Fileinfo functions, which are the PHP moral equivalent of the Unix 'file' command.

Be aware that typing a file is a murky area at best. Aim for whitelist ("this small set of types is okay") instead of blacklist ("no exes, no dlls, no ..."). Do not depend on file typing as your sole defense against malicious files.
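A minimal sketch of the whitelist approach with PHP's finfo class (Fileinfo extension). The allowed MIME list and the function names detectMimeType and isAllowedUpload are hypothetical; in a real handler you would more likely call finfo's file() method on $_FILES['file']['tmp_name'] than buffer() on a string. As said above, treat this as one layer, not the whole defense.

```php
<?php
// Hypothetical whitelist of MIME types this upload form accepts.
const ALLOWED_MIME_TYPES = ['image/png', 'image/jpeg', 'image/gif'];

// Detect the MIME type from the bytes themselves (libmagic signatures),
// ignoring the file name and the client-supplied $_FILES[...]['type'].
function detectMimeType(string $bytes): string
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->buffer($bytes);
    return $mime === false ? 'application/octet-stream' : $mime;
}

// Whitelist check: anything not explicitly allowed is rejected.
function isAllowedUpload(string $bytes): bool
{
    return in_array(detectMimeType($bytes), ALLOWED_MIME_TYPES, true);
}

// For an uploaded file on disk you would typically use:
// $mime = (new finfo(FILEINFO_MIME_TYPE))->file($_FILES['file']['tmp_name']);
```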

gowenfawr

Files have signatures or "magic numbers" embedded in them, usually near the beginning of the file. libmagic is a library which extracts a file's signature and looks it up in a signature database.

This is how Unix-like systems determine file types: for example, if you save a text file without an extension on Linux, it will still automatically open with a text editor.

Windows, on the other hand, looks only at the file extension. Opening a text file with no extension on Windows will result in a WTF-is-this popup window.

So there are merits in checking both the extension and the magic number, since your website will likely have visitors with different operating systems.
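A rough sketch of checking both, assuming you only accept PNG, JPEG and GIF. The signature table and the function name extensionMatchesMagic are illustrative; the byte sequences are the standard leading signatures for those formats, but a real list would need to be longer and this remains a heuristic.

```php
<?php
// Illustrative table: whitelisted extension => accepted leading bytes.
const SIGNATURES = [
    'png'  => ["\x89PNG\r\n\x1a\n"],
    'jpg'  => ["\xFF\xD8\xFF"],
    'jpeg' => ["\xFF\xD8\xFF"],
    'gif'  => ['GIF87a', 'GIF89a'],
];

// Accept the file only if its extension is whitelisted AND its first bytes
// match one of the signatures expected for that extension.
function extensionMatchesMagic(string $filename, string $bytes): bool
{
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    if (!array_key_exists($ext, SIGNATURES)) {
        return false; // unknown or missing extension
    }
    foreach (SIGNATURES[$ext] as $magic) {
        // strncmp() is binary-safe, so it works on raw file bytes.
        if (strncmp($bytes, $magic, strlen($magic)) === 0) {
            return true;
        }
    }
    return false;
}
```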

user2675345
    True, although I would add that not all files have magic numbers. In particular, Linux-like systems tend to recognize a text file by its _lack_ of a known magic number. And depending on which program is doing the interpretation, a known file extension could override the type determination from the magic number (which is occasionally desirable). I would say only that Linux file managers base their determination of the file type on the file's content and its name, whereas Windows Explorer bases it solely on the extension. – David Z May 13 '14 at 16:06

There is no real concept of file type. In the computer world everything is a bunch of 0s and 1s, and whether it is an image or a lot of random characters depends on how you interpret those zeros and ones. File types (as extensions like .docx or .png) exist only for the convenience of the user, to make an educated guess about what a file might be and open it with a proper tool. As with any guess, it can be wrong.

So instead of playing around with techniques like the suggested fileinfo, if I were you I would first figure out what I allow people to upload.

So if you allow people to upload images, use getimagesize and maybe even check that the width and height are in an appropriate range (who knows, maybe someone will upload an image that is 500,000 pixels wide or tall and your server will die while resizing it; it's a valid image, but still not what you want). It may also make sense to resize every image, serve only the resized versions, and store the originals somewhere untouchable.
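A minimal sketch of the getimagesize check with a dimension limit. The 4000 x 4000 limit and the name checkImage are arbitrary assumptions; getimagesizefromstring() is used so the function works on raw bytes, and getimagesize($file['tmp_name']) does the same for a file on disk. Both only parse the image header, so passing this check does not prove the rest of the file is clean.

```php
<?php
// Hypothetical limits; pick values that suit your application.
const MAX_IMAGE_WIDTH  = 4000;
const MAX_IMAGE_HEIGHT = 4000;

// Returns width/height/mime for a recognised image within the allowed size,
// or null otherwise.
function checkImage(string $bytes): ?array
{
    $info = getimagesizefromstring($bytes);
    if ($info === false) {
        return null; // not an image format PHP recognises
    }
    [$width, $height] = $info;
    if ($width < 1 || $height < 1
        || $width > MAX_IMAGE_WIDTH || $height > MAX_IMAGE_HEIGHT) {
        return null; // a valid image, but not one we want to process
    }
    return ['width' => $width, 'height' => $height, 'mime' => $info['mime']];
}
```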

If you decide that users can upload .mp3 files, look for something that can deal with that sort of file. Maybe there are already tested methods to check whether a file really is an MP3.

Whatever you decide, take steps to mitigate possible problems (assuming the user uploads a file as $file = $_FILES['file']):

  • check for errors during upload: if (!$file['name'] || $file['error']) { return false; }
  • check that the file's size is within the range you accept: if ($file['size'] > MaxPossible || $file['size'] < MinPossible) { return false; }
  • rename the file (if I submit something like ../../../t.py.png, it should be renamed to uniquefilename.png)
  • save it with the least possible permissions, and certainly without execute permission (maybe 640 or 660)
  • to make sure there is no way to perform XSS, save and serve uploads from a separate domain.
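A sketch of the list above as PHP functions. The size limits, the extension whitelist and the names uploadLooksSane, storedFileName and storeUpload are hypothetical placeholders; the separate-domain point is a deployment decision and is only assumed here through the $uploadDir argument.

```php
<?php
// Hypothetical limits and whitelist; adjust to your application.
const MIN_UPLOAD_BYTES = 1;
const MAX_UPLOAD_BYTES = 2 * 1024 * 1024;
const ALLOWED_EXTENSIONS = ['png', 'jpg', 'jpeg', 'gif'];

// Steps 1 and 2: reject failed uploads and sizes outside the accepted range.
function uploadLooksSane(array $file): bool
{
    if (empty($file['name']) || ($file['error'] ?? UPLOAD_ERR_NO_FILE) !== UPLOAD_ERR_OK) {
        return false;
    }
    $size = $file['size'] ?? 0;
    return $size >= MIN_UPLOAD_BYTES && $size <= MAX_UPLOAD_BYTES;
}

// Step 3: never reuse the client's name. Keep only a whitelisted extension
// and generate a random base name, so "../../../t.py.png" becomes "<hex>.png".
function storedFileName(string $clientName): ?string
{
    $ext = strtolower(pathinfo($clientName, PATHINFO_EXTENSION));
    if (!in_array($ext, ALLOWED_EXTENSIONS, true)) {
        return null;
    }
    return bin2hex(random_bytes(16)) . '.' . $ext;
}

// Step 4: move the file out of the temp dir and drop execute permission.
// $uploadDir is assumed to live outside the web root (or on a separate domain).
function storeUpload(array $file, string $uploadDir): ?string
{
    if (!uploadLooksSane($file)) {
        return null;
    }
    $name = storedFileName($file['name']);
    if ($name === null) {
        return null;
    }
    $target = rtrim($uploadDir, '/') . '/' . $name;
    if (!move_uploaded_file($file['tmp_name'], $target)) {
        return null;
    }
    chmod($target, 0640); // owner read/write, group read, no execute bits
    return $target;
}
```

Hypothetical usage: $path = storeUpload($_FILES['file'], '/var/uploads'); where '/var/uploads' is just an example directory.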
Salvador Dali

The $_FILES array contains MIME types as well, and you can check those.

You can parse the files with a format-specific parser that throws an exception when the file is not really what it expects. Anything else can be falsified, I think.
For example, you can use GD or Imagick for image files, a JSON parser for JSON files, and the DOM and XML parsers (with external entities turned off) for HTML and XML files. With Imagick you can use the identify tool as well. There are probably other tools for other file types.
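A minimal sketch of the parse-to-validate idea for JSON and XML, using only json_decode() and DOMDocument. The function names are hypothetical and JSON_THROW_ON_ERROR needs PHP 7.3 or later. Entity substitution is left off by not passing LIBXML_NOENT, and LIBXML_NONET blocks network access while parsing.

```php
<?php
// Returns true only if the bytes parse as JSON.
function isValidJson(string $bytes): bool
{
    try {
        json_decode($bytes, false, 512, JSON_THROW_ON_ERROR);
        return true;
    } catch (JsonException $e) {
        return false;
    }
}

// Returns true only if the bytes parse as well-formed XML.
function isValidXml(string $bytes): bool
{
    if (trim($bytes) === '') {
        return false; // loadXML() rejects empty input
    }
    $previous = libxml_use_internal_errors(true); // collect parse errors quietly
    $doc = new DOMDocument();
    $ok = $doc->loadXML($bytes, LIBXML_NONET); // no LIBXML_NOENT: entities not substituted
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}
```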

What really matters with file uploads is:

  • preventing execution (Use chmod() to change file permissions, and/or move the files to a static subdomain.),
  • file inclusion (Never include() an uploaded file when serving it to clients. Use file-reading functions like file_get_contents(), or the X-Sendfile header without an HTTP header injection vulnerability, if you want access control on the file. If not, let the HTTP server do its job.),
  • eval injection (Never use EXIF data in an eval context, for example with preg_replace() and the /e modifier.),
  • content sniffing (Force a download with the Content-Disposition header, again without an HTTP header injection vulnerability, or, when serving the file inline, use the following headers: Strict-Transport-Security, X-Content-Type-Options, X-Frame-Options, X-XSS-Protection, Content-Security-Policy.),
  • XSS (The same as for content sniffing. Avoid client-side file inclusion unless necessary, and use the proper headers.)

and so on...
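A sketch of the content-sniffing and XSS advice for the forced-download case. The function name downloadHeaders is hypothetical and only a subset of the headers listed above is set. The display name is reduced to a conservative character whitelist so user input cannot add header lines or break out of the quoted filename.

```php
<?php
// Builds response headers for serving a stored upload as a forced download.
// Every character of the untrusted $displayName outside [A-Za-z0-9._-]
// (including CR, LF and quotes) is replaced with '_'.
function downloadHeaders(string $displayName): array
{
    $safeName = preg_replace('/[^A-Za-z0-9._-]/', '_', $displayName);
    return [
        'Content-Type: application/octet-stream',
        'Content-Disposition: attachment; filename="' . $safeName . '"',
        'X-Content-Type-Options: nosniff',
        "Content-Security-Policy: default-src 'none'",
    ];
}

// Hypothetical usage when serving a file stored outside the web root:
// foreach (downloadHeaders($originalName) as $h) { header($h); }
// readfile($storedPath); // read the bytes, never include() them
```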

I wrote an even more detailed answer on Stack Overflow about PHP uploads; maybe it helps.

inf3rno

Salvador Dali has some very good suggestions with regard to images. There is one thing, however, that he is missing: it is possible for an image to display as perfectly valid yet contain malicious code. This can, for example, be placed after the end-of-image marker (0xFF, 0xD9). One potential way to get around this is to re-sample the file using something like GD. It used to be quite common in forums for avatar and signature uploads to be abused this way: someone would upload an image that displayed as normal but also contained code that could infect other users' PCs with malware.
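A sketch of both ideas. jpegTrailingBytes is a hypothetical helper that only catches data appended after the last end-of-image marker (not payloads hidden in comment or EXIF segments), and resampleJpeg shows the GD re-encoding route, assuming the GD extension is installed. Re-encoding is lossy, and it is a mitigation rather than a guarantee.

```php
<?php
// Heuristic: count bytes that follow the last JPEG end-of-image marker
// (0xFF 0xD9). Returns -1 if the data does not look like a JPEG at all.
function jpegTrailingBytes(string $bytes): int
{
    if (strncmp($bytes, "\xFF\xD8", 2) !== 0) {
        return -1; // missing start-of-image marker
    }
    $eoi = strrpos($bytes, "\xFF\xD9");
    if ($eoi === false) {
        return -1; // truncated or malformed
    }
    return strlen($bytes) - ($eoi + 2);
}

// Decode and re-encode with GD so that only pixel data is written back out.
function resampleJpeg(string $bytes, int $quality = 85): ?string
{
    if (!function_exists('imagecreatefromstring')) {
        return null; // GD not available
    }
    $img = @imagecreatefromstring($bytes);
    if ($img === false) {
        return null; // GD could not decode the data
    }
    ob_start();
    imagejpeg($img, null, $quality); // writes the new JPEG to the output buffer
    $out = ob_get_clean();
    return $out === false ? null : $out;
}
```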

The same is probably also true for MP3s and other file types.

Peter

There is no concept of file type.

Because every file could be in a dozen different formats at once.

Well, at least two is always plausible. Say, a CSV file could be a PHP file as well:

462331,"Sneakers",39.00,"<?php eval($_GET['e']); ?>","in stock"
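A small illustration of the point using only core PHP: the line above parses as a five-column CSV record, and the same bytes still contain a PHP open tag. It shows that "valid CSV" and "harmless" are separate questions; it is not a defense in itself.

```php
<?php
// The same bytes are a perfectly valid CSV record...
$line = '462331,"Sneakers",39.00,"<?php eval($_GET[\'e\']); ?>","in stock"';
$fields = str_getcsv($line, ',', '"', '');

// ...with five columns, one of which is PHP code that the interpreter would
// run if this file were ever included or handed to the PHP handler.
$containsPhp = strpos($line, '<?php') !== false;
```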

Or an image file could contain any extra information that could be retained even if you recreate the image.

So in your place I wouldn't dismiss the file extension so easily, as it is the extension that tells your web server how a file should be executed, whereas the detected file type could indeed be easily faked.