I would like to know if it would be an issue to use this code in a production environment:

RewriteCond %{HTTP_HOST} ^(?:www\.)?([a-zA-Z0-9_-]+)\.(?:com|co\.uk|es|de)$
RewriteCond $1 ^sitemap([0-9]+)?\.xml(\.gz)?$
RewriteRule ^(.*)$ /files/%{HTTP_HOST}/$1 [L,QSA]

Basically, I have hundreds of domains pointing to the same home directory on a server. I would like to move each site's sitemaps into its own folder (so example1.com cannot access example2.com's sitemap!)

First of all, I cannot hard-code all the domains in a whitelist, as there are hundreds of them and more are added weekly.

The plan is to internally rewrite any request for sitemap.xml, sitemap2.xml, sitemap.xml.gz and so on to the requesting domain's folder.

So for instance:

example1.com will have its real sitemap.xml file in /files/example1.com/sitemap.xml
example2.com will have its real sitemap.xml file in /files/example2.com/sitemap.xml

My question is whether using HTTP_HOST in a RewriteRule is a potential issue. I know it can be a problem if, for example, you do a redirect in PHP using HTTP_HOST without filtering it, since the user can manipulate it.

Thank you!

JDW

1 Answer

From what I can see so far, it looks like you've covered your bases fairly well. However, it never hurts to run through what is protecting you, so you can make sure you don't accidentally break it.

  1. As you mention, the HOST header can be set by the user, so it should be treated as unsafe data.
  2. The host is used to determine the final file to load (RewriteRule ^(.*)$ /files/%{HTTP_HOST}/$1 [L,QSA]). This carries a risk of directory traversal.
  3. In particular, the issue would be with hosts like ../../secret/key.secret. Now, it's quite possible that Apache would automatically reject something like that. I'm not 100% sure because I haven't used Apache and .htaccess in quite a while, but regardless I wouldn't rely on Apache as my sole security measure.
  4. However, you're not. Right now the regular expression that must match the host to trigger this rewrite rule is very restrictive, and a directory-traversal host wouldn't match anyway (which is good!).
  5. However, that could change in the future. In particular, your current setup is safe because you haven't left any room for subdomains. I.e. your rewrite rule will match www.example.com but it won't match api.site.example.com.
  6. Future needs may change, and you may quickly discover that you have to update your rule to support subdomains. When that happens you'll want to be careful to make sure you don't also add support for directory traversal hosts.
  7. For instance, one wrong way to do it would be to change the regular expression in your first RewriteCond to: ^(.*)\.(?:com|co\.uk|es|de)$ That might be silly, but it is also a quick and easy way to support subdomain matching in your host, and a developer who just wants it done quickly might opt for something like this. It also happens to match directory traversal payloads in the HOST header.
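
To make points 4 to 7 concrete, here is a minimal sketch that runs both patterns against a few host values. It uses Python's `re` module as a stand-in for Apache's regex engine, which is an assumption: the two should agree for patterns this simple, but this is not a substitute for testing the real server. The candidate hosts are made up for illustration, and the traversal example ends in `.com` because the loose pattern still requires one of the listed TLDs.

```python
import re

# Host pattern copied from the question's first RewriteCond.
STRICT_HOST = re.compile(r"^(?:www\.)?([a-zA-Z0-9_-]+)\.(?:com|co\.uk|es|de)$")
# The "quick fix" pattern from point 7.
LOOSE_HOST = re.compile(r"^(.*)\.(?:com|co\.uk|es|de)$")

# Hypothetical Host header values, chosen only for illustration.
CANDIDATES = [
    "example1.com",
    "www.example1.co.uk",
    "api.site.example.com",
    "../../secret/key.com",
]


def accepted(pattern, host):
    """Return True if the whole host value matches the pattern."""
    return pattern.match(host) is not None


for host in CANDIDATES:
    strict = accepted(STRICT_HOST, host)
    loose = accepted(LOOSE_HOST, host)
    print(f"{host!r}: strict={strict} loose={loose}")
```

The strict pattern accepts only the first two hosts. The loose pattern accepts all four, including the traversal-style value, which is exactly the regression described above.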

So you are safe for now, but it is also important to keep in mind that this is something that might have to legitimately change in the future, and someone may accidentally introduce trouble when it changes. I'd handle that with a few steps:

  1. Confirm that Apache wouldn't allow a directory traversal attack anyway
  2. Leave a comment about the importance of avoiding directory traversal attacks here
  3. Put in an integration test that tries a directory traversal and confirms that it doesn't work. Unfortunately, since this is implemented in Apache, you can't do that... so I guess it's just 1 & 2!
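
For step 1, one way to see how Apache reacts is to send it a request with a hand-crafted Host header and look at the status line that comes back. The sketch below writes the request straight to a socket so the header goes out exactly as given. The function names, the default path, and the idea of pointing it at a test server you control are all assumptions for illustration, not part of the original setup.

```python
import socket


def build_request(host_header, path="/sitemap.xml"):
    """Build a raw HTTP/1.1 GET request with an arbitrary Host header."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host_header}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def probe(server, port, host_header, path="/sitemap.xml", timeout=5.0):
    """Send the request to server:port and return the response status line."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_request(host_header, path))
        data = b""
        # Read until the first line of the response is complete.
        while b"\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.split(b"\r\n", 1)[0].decode("latin-1")


# Show the bytes that would be sent; no network access happens here.
print(build_request("../../secret/key.secret").decode("ascii"))
```

Calling something like `probe("127.0.0.1", 8080, "../../secret/key.secret")` against a local test instance (the address is a placeholder) would then show whether the request is rejected outright or reaches the rewrite rules.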
Conor Mancone
  • Hello Conor, thank you so much for taking the time to write a detailed response. I very much appreciate it. Absolutely great tips and I will definitely be doing some more research into directory traversal attacks! – JDW Sep 19 '19 at 21:20
  • Just to add a bit of info, there won't be any need to add support for subdomains, so that's something I can leave out and the reason why I added the first rule. Will be doing some testing to see how Apache reacts to various inputs – JDW Sep 19 '19 at 21:22
  • @JDW that helps - input validation is much simpler without subdomains. If it helps, the quintessential directory traversal payload is something like: `../../../../../../../../etc/passwd` The goal is obviously to read the `/etc/passwd` file, which is often world-readable. You include large numbers of `../` because you have to make sure you get all the way up to the root directory, and usually extras will be ignored after you get there, so more is better than less. – Conor Mancone Sep 19 '19 at 21:52
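
The "more is better than less" point in the last comment can be illustrated with a small sketch using Python's `posixpath.normpath`. This is a purely lexical approximation of how a path is resolved, and the base directory `/var/www/files` is a hypothetical location chosen for the example.

```python
import posixpath


def resolve(base, user_part):
    """Join a trusted base directory with untrusted input and normalise it."""
    return posixpath.normpath(posixpath.join(base, user_part))


BASE = "/var/www/files"  # hypothetical document folder, three levels deep

# Too few "../" segments: the path only climbs part of the way up.
print(resolve(BASE, "../" * 2 + "etc/passwd"))

# More segments than needed: the extras are dropped once the root is reached.
print(resolve(BASE, "../" * 8 + "etc/passwd"))
```

With two segments the result stays under `/var`, while eight segments land on `/etc/passwd`, because climbing above the root has no further effect.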