9

A common vulnerability is for web applications to accept a filesystem path as a request parameter, and then perform some action on the specified path. For example, retrieving a file and returning it to the user, or perhaps even writing/deleting a file. This can allow malicious users to access files they're not supposed to be able to access, such as source code, password files, etc.

This can be a problem even if you prepend a base path, because an attacker can use directory traversal sequences like ../ (*nix) and ..\ (Windows) to traverse up out of the base path and into restricted directories and files.

The general idea then, is to either outright ban any such traversal sequences, or to "normalize" paths containing these sequences and then check that they still fall within the base path.

From what I understand, trying to blacklist traversal sequences in the request parameter is difficult, because there are alternate URL and Unicode encodings (e.g. %2e%2e%2f, and many, many others) that may end up being interpreted by the web server as ../ or ..\.

However, are alternate encodings still a problem if we normalize the path after it has been received by the web server (e.g., Apache) and passed to my application? For example, using these normalization rules?

My reasoning is that, whether the web server receives %2e%2e%2f and decodes it to ../, or the web server directly receives ../, the normalization algorithm will see ../ and resolve/filter it before checking against the base directory. Even if an attacker uses a double-encoding like %252e%252e%252f, it will be decoded as %2e%2e%2f, which will be treated literally by the OS/file system.
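This double-decoding behaviour is easy to confirm directly. A minimal, standalone PHP sketch (not tied to any web server):

```php
<?php
// Illustrative only: a doubly-encoded traversal sequence survives one
// round of decoding as a harmless literal string.
$doubleEncoded = "%252e%252e%252f";

// First decode (what the web server typically performs): %25 -> '%'
$once = urldecode($doubleEncoded);   // "%2e%2e%2f"

// Only if the application decodes a *second* time does it become "../"
$twice = urldecode($once);           // "../"
```

So as long as decoding happens exactly once before normalization, the double-encoded form stays literal.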

However, this question suggests that:

However, if a web server is serving files and decoding the unicode is done after the check that prevents directory traversal or done slightly differently by the operating system, this attack may get past the filter allowing the attack to work.

So in other words, are there any known sequences of characters that could make it past the web server's decoding AND the aforementioned normalization routine (which handles ../ or ..\), that could be interpreted by the OS or lower-level file handling functions as directory traversal?

For example, in PHP:

$base = "/path/to/allowed/directory/";
$path = $_GET['path'];

$normalizedPath = normalize($path);   // normalize will remove or resolve ../ and ..\
$fullPath = $base . $normalizedPath;

echo file_get_contents($fullPath);
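As a defence-in-depth alternative to a hand-written normalize() (which is hypothetical in the example above), one common pattern is to resolve the candidate path with realpath() and then verify the result still lies under the base directory. A sketch, with safeReadFile being an illustrative name:

```php
<?php
// Sketch of a defence-in-depth check (not the question's normalize()):
// resolve the candidate path on disk and verify it is still inside $base.
function safeReadFile(string $base, string $userPath): ?string {
    $base = realpath($base);
    $resolved = realpath($base . '/' . $userPath);

    // realpath() returns false for nonexistent paths; the prefix check
    // rejects anything that escaped the base directory.
    if ($base === false || $resolved === false) {
        return null;
    }
    if (strncmp($resolved, $base . DIRECTORY_SEPARATOR, strlen($base) + 1) !== 0) {
        return null;
    }
    return file_get_contents($resolved);
}
```

Because realpath() also resolves symlinks, a link inside the base directory that points outside of it will fail the prefix check as well.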
alexw
  • What is the code you are using for your normalize function? I can't really find vulns without seeing exactly how you are doing it. – Daisetsu Apr 25 '16 at 19:15
  • @Daisetsu I think this question assumes a perfect normalization of `../` and `..\ ` – Alexander O'Mara Apr 25 '16 at 19:16
  • The method can be found here: https://github.com/rockettheme/toolbox/blob/master/ResourceLocator/src/UniformResourceLocator.php#L217-L274. But yes, it appears to be perfect w.r.t. explicitly `../` and `..\ `. – alexw Apr 25 '16 at 19:36
  • One somewhat unrelated suggestion is to always check that the file type signature matches the file type you're expecting. If your script is serving images and the file type signature is for ASCII text, there's a problem. – Daisetsu Apr 25 '16 at 21:19
  • For sufficiently old versions of Windows, yes. But [`....` no longer means "up 3 directories"](https://devblogs.microsoft.com/oldnewthing/20160202-00/?p=92953) in modern versions. – Ben Voigt Apr 16 '19 at 03:49

2 Answers

6

Using Unicode, it is possible to encode \ and / as multi-byte characters. If the string-comparison functions are not Unicode-aware, there could be a bug which allows these characters through.

Wikipedia has a section on this in relation to an old attack on Windows servers:

When Microsoft added Unicode support to their Web server, a new way of encoding ../ was introduced into their code, causing their attempts at directory traversal prevention to be circumvented.

Multiple percent encodings, such as

  • %c1%1c
  • %c0%af

translated into / or \ characters.

Technically these still decode to slash characters as far as the directory traversal is concerned; they are just not the true single-byte characters, which may confuse some code.

I believe the best advice for avoiding such characters is to disallow all characters in file system paths except a safe subset of ASCII. This also sidesteps other issues around which characters are allowed by different operating systems and file systems.
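The allowlist approach recommended above can be sketched in PHP; isSafeFileName is an illustrative name, and the exact character set should be adapted to your application:

```php
<?php
// Sketch of the allowlist approach: accept only a conservative subset of
// ASCII, rejecting anything that could smuggle in a separator or encoding.
function isSafeFileName(string $name): bool {
    // Letters, digits, dot, dash, underscore; nothing else (so no slashes,
    // no '%', no multi-byte sequences), and no ".." anywhere.
    return preg_match('/\A[A-Za-z0-9._-]+\z/', $name) === 1
        && strpos($name, '..') === false;
}
```

With this in place, inputs like `../etc/passwd` or `..%c1%1c` are rejected outright, before any normalization question even arises.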

Alexander O'Mara
  • So, let's say this type of encoding *does* survive decoding and normalization, and makes it through to a filesystem command as `path/to/allow/..%c1%1cRESTRICTED.txt`. Are there any known, still-in-common-use filesystem commands which will interpret that as a forward-slash? Running it through my PHP example, on OSX+Apache, it appears to be interpreted as `..Áhi.txt`, which is safe. Is it safe to assume that other modern OS file systems won't try to interpret these as `/`? – alexw Apr 25 '16 at 20:19
  • @alexw I'm not *aware* of any, but I think this has to do with the implementation of whatever receives it, which is a potentially infinite number of programs. I think the best advice would be to disallow anything but a subset of safe ASCII characters entirely. – Alexander O'Mara Apr 25 '16 at 20:24
6

On Windows and Unix - no. There may be obscure operating systems that use different path separators.

To handle encoding securely there is a simple rule: fully decode before doing sanitisation. If you fail to do this, your sanitisation can be circumvented. Imagine an application that does open(urldecode(normalize(path))). If the path contains ../ then normalize will remove it. But if it contains %2e%2e%2f then normalize will do nothing, and urldecode will convert that to ../. This error has led to a number of real-world vulnerabilities, including the well-known IIS Unicode bug.
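The ordering rule can be demonstrated with a small PHP sketch; stripTraversal here is a deliberately naive, hypothetical sanitiser:

```php
<?php
// A naive sanitiser that removes literal traversal sequences.
function stripTraversal(string $p): string {
    return str_replace(['../', '..\\'], '', $p);
}

$attack = "%2e%2e%2fetc/passwd";

// Wrong order: sanitise first, decode later -> the traversal survives,
// because the sanitiser never sees a literal "../".
$wrong = urldecode(stripTraversal($attack));   // "../etc/passwd"

// Right order: fully decode first, then sanitise.
$right = stripTraversal(urldecode($attack));   // "etc/passwd"
```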

Another issue that sometimes appears is nested sequences. Suppose your normalize function does path.replaceAll('../', ''). This can be circumvented by trying ....// - the inner ../ is removed, leaving ../. The solution is either to completely reject strings that contain forbidden sequences, or to recursively apply the normalisation function.
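The nested-sequence bypass, and the repeat-until-stable fix, sketched in PHP:

```php
<?php
$attack = "....//etc/passwd";

// Single pass: removing the inner "../" reassembles an outer "../".
$onePass = str_replace('../', '', $attack);    // "../etc/passwd"

// Fix: repeat the removal until the string stops changing
// (or, more simply, reject any input containing the sequence at all).
$p = $attack;
do {
    $prev = $p;
    $p = str_replace('../', '', $p);
} while ($p !== $prev);                         // $p is now "etc/passwd"
```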

There are other characters that can have surprising results in paths. A null byte is generally allowed within strings in high-level languages, but when the string is passed to the C library, the null byte acts as a terminator. File names like evil.php%00.jpg can therefore bypass file extension checks. There was also the IIS semi-colon bug.
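A minimal PHP check for the null-byte case (recent PHP versions also reject paths containing null bytes at the filesystem-function level, but rejecting them explicitly keeps the intent clear):

```php
<?php
// After decoding, "evil.php%00.jpg" contains an embedded null byte, which
// C-level APIs would treat as the end of the file name ("evil.php").
$decoded = urldecode("evil.php%00.jpg");

$hasNull = strpos($decoded, "\0") !== false;   // true -> reject the request
```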

In general, file names are not a good place for untrusted data. There is the potential for second-order attacks, where other processes that read the directory listing have vulnerabilities. There might be cross-site scripting in a web page that lists files; there have been Windows Explorer vulnerabilities that malicious file names can exploit; and attacking a Unix shell through escape sequences is a recent concern. Instead, I recommend storing the user-supplied file name in a database, and naming the file on disk after the primary key.
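The database recommendation can be sketched without any framework; the in-memory table and storeUpload helper below are hypothetical stand-ins for a real table with an auto-increment primary key:

```php
<?php
// Sketch: keep the untrusted, user-supplied name as data keyed by a
// generated id, and name the file on disk after the id only.
$nextId = 1;        // in practice: an auto-increment primary key
$fileTable = [];    // in practice: a database table

function storeUpload(array &$table, int &$nextId, string $userName): int {
    $id = $nextId++;
    $table[$id] = $userName;   // untrusted name never touches the disk
    return $id;
}

$id = storeUpload($fileTable, $nextId, '../../evil.php');
$diskPath = 'uploads/' . $id;  // fully generated, e.g. "uploads/1"
```

The original name is only ever displayed as data (with appropriate output encoding), never used as a path component.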

paj28
  • Thank you, this is a really thorough answer. But as for accepting user-specified file names in general, at some point even the web server itself has to do this. For example, an image or Javascript file that is requested via a path. How does something like Apache handle this problem? – alexw Apr 26 '16 at 15:14
  • @alexw - good point, Apache can't avoid that. Although a web server is (usually) reading files, not writing them, so at least second-order attacks are not a concern. – paj28 Apr 26 '16 at 16:40
  • Cool! My specific application is read-only so, it sounds like I'll be fine with my approach. – alexw Apr 26 '16 at 16:48
  • In Apache, it checks that you haven't traversed outside of your document root. It does follow symlinks, and is generally very permissive, though, so be paranoid. This is why web servers are typically run with their own user accounts. They just barely follow the security principle of whitelist-only, where their whitelist happens to be everything they can traverse to under their document root. – Ghedipunk Apr 26 '16 at 16:51