A common vulnerability is for web applications to accept a filesystem path as a request parameter and then perform some action on the specified path, for example retrieving a file and returning it to the user, or perhaps even writing or deleting a file. This can allow malicious users to access files they're not supposed to be able to access, such as source code or password files.
This can be a problem even if you prepend a base path, because an attacker can use directory traversal sequences like ../ (*nix) and ..\ (Windows) to traverse up out of the base path and into restricted directories and files.
The general idea, then, is either to ban any such traversal sequences outright or to "normalize" paths containing these sequences and then check that they still fall within the base path.
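To make the "normalize, then check" idea concrete, here is a minimal sketch of what I have in mind. The function name normalize_relative() is hypothetical, not a PHP built-in. It only resolves segments lexically, so it assumes the input has already been decoded and it does not account for symlinks inside the base directory.

```php
<?php
// Hypothetical helper (not a PHP built-in): lexically resolve "." and ".."
// segments, treating both "/" and "\" as separators.
// Returns null if the path contains a NUL byte or would climb above its root.
function normalize_relative(string $path): ?string
{
    if (strpos($path, "\0") !== false) {
        return null;
    }

    // Split on runs of forward slashes or backslashes.
    $segments = preg_split('#[/\\\\]+#', $path);
    $resolved = [];

    foreach ($segments as $segment) {
        if ($segment === '' || $segment === '.') {
            continue; // empty and "current directory" segments are dropped
        }
        if ($segment === '..') {
            if (count($resolved) === 0) {
                return null; // would escape the base directory
            }
            array_pop($resolved);
            continue;
        }
        $resolved[] = $segment;
    }

    return implode('/', $resolved);
}

var_dump(normalize_relative('docs/../img/logo.png')); // "img/logo.png"
var_dump(normalize_relative('..\\..\\etc\\passwd'));  // NULL
```

Because every ".." that would climb above the root is rejected rather than silently dropped, the result can be appended to the base path without a separate prefix check, at least under the assumptions above.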
From what I understand, trying to blacklist traversal sequences in the request parameter is difficult, because there are alternate URL and Unicode encodings (e.g. %2e%2e%2f, and many, many others) that may end up being interpreted by the web server as ../ or ..\.
However, are alternate encodings still a problem if we normalize the path after it has been received by the web server (e.g., Apache) and passed to my application? For example, using these normalization rules?
My reasoning is that, whether the web server receives %2e%2e%2f and decodes it to ../, or the web server directly receives ../, the normalization algorithm will see ../ and resolve/filter it before checking against the base directory. Even if an attacker uses a double-encoding like %252e%252e%252f, it will be decoded as %2e%2e%2f, which will be treated literally by the OS/file system.
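To illustrate the double-encoding case, here is a small sketch using PHP's rawurldecode(). It assumes exactly one decoding pass happens before my code sees the value; the second call is there only to show what would happen if something decoded the value again after the check.

```php
<?php
// One decoding pass, standing in for what happens before the application
// receives the parameter.
$once = rawurldecode('%252e%252e%252f');
var_dump($once);  // "%2e%2e%2f" (no literal "../" yet)

// A second pass, e.g. an extra decode call somewhere after the check.
$twice = rawurldecode($once);
var_dump($twice); // "../"
```

So my assumption that the double-encoded form is harmless seems to hold only if nothing between the normalization step and the filesystem call decodes the string a second time.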
However, this question suggests that:
However, if a web server is serving files and decoding the unicode is done after the check that prevents directory traversal or done slightly differently by the operating system, this attack may get past the filter allowing the attack to work.
So in other words, are there any known sequences of characters that could make it past the web server's decoding AND the aforementioned normalization routine (which handles ../ or ..\), that could be interpreted by the OS or lower-level file handling functions as directory traversal?
For example, in PHP:
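$base = "/path/to/allowed/directory/";
$path = $_GET['path']; // PHP has already URL-decoded this value once
$normalizedPath = normalize($path); // normalize() is a placeholder; it will remove or resolve ../ and ..\
$fullPath = $base . $normalizedPath;
echo file_get_contents($fullPath); // send the file contents back to the client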