
Do I need to escape the slash '/' in RewriteCond?

Currently I have the following rule in .htaccess:

RewriteCond %{QUERY_STRING} rp=/knowledgebase/
RewriteRule ^index\.php$ https://www.datanumen.com/knowledgebase/ [QSD,R=301,L,NC]

However, this only works for URLs like https://www.datanumen.com/fi/customer/index.php?rp=/knowledgebase/7/How-to-order-the-full-version-of-DataNumen-Access-Repair.html&language=swedish, but not for URLs like https://www.datanumen.com/fi/customer/index.php?rp=%2Fknowledgebase%2F7%2FHow-to-order-the-full-version-of-DataNumen-Access-Repair.html&language=swedish

So I had to modify the rule as follows:

RewriteCond %{QUERY_STRING} rp=/knowledgebase/ [OR]
RewriteCond %{QUERY_STRING} rp=%2Fknowledgebase%2F
RewriteRule ^index\.php$ https://www.datanumen.com/knowledgebase/ [QSD,R=301,L,NC]

But I checked https://serverfault.com/a/968916/280923, which says "The slash (/) does not need to be escaped". So I am confused.

If I need to take every case into account, i.e. both the escaped and the unescaped version of '/', then there are 4 combinations in total. Should I add all of them as RewriteCond conditions?

rp=/knowledgebase/
rp=%2Fknowledgebase%2F
rp=%2Fknowledgebase/
rp=/knowledgebase%2F
alancc

1 Answer

Should I escape the slash / in RewriteCond?

By "escape the slash", you really mean "should I match a URL encoded slash or not?". This depends entirely on the HTTP request being made to your server.

But I checked https://serverfault.com/a/968916/280923, which says "The slash (/) does not need to be escaped". So I am confused.

The linked question/answer is unrelated to the current issue. That question deals with backslash-escapes in Apache directives/regex, not the URL-encoded (or %-encoded) URLs that you are dealing with here. These are two very different types of "escaping" methods for different purposes.

What you are dealing with are %-encoded URLs, i.e. how the URL appears in the HTTP request. Different parts of a URL (notably the "path" and "query string") have different encoding requirements. Whether a particular character needs to be %-encoded depends on whether it would otherwise have special meaning in that context.

As defined in RFC3986, the slash (/) does not strictly need to be %-encoded in the query string part of the URL. However, URL encoding functions (such as in PHP and JavaScript) will often %-encode this character. (I think this is largely historical as some old implementations reportedly did not handle an unencoded slash correctly - reference RFC3986.)

However, regardless of whether a character needs to be URL encoded (to negate its special meaning), any character can be %-encoded, and this should be treated the same as the literal (unencoded) character.

Whether or not you need to match / (decoded) or %2F (encoded) depends on whether or not that character is %-encoded in the request.

Your problem is that the QUERY_STRING server variable is not %-decoded, unlike the URL-path that is matched by the RewriteRule pattern.
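
As a rough illustration of that difference, here is how the original rule would see a made-up request. It assumes the .htaccess file sits in the /fi/customer/ directory (a guess based on the URLs in the question), and the request itself is hypothetical, with the dot in the path %-encoded as well:

```apache
# Hypothetical request:
#   GET /fi/customer/index%2Ephp?rp=%2Fknowledgebase%2F
#
# URL-path seen by the RewriteRule pattern (already %-decoded): index.php
# Value of %{QUERY_STRING} (left exactly as sent):              rp=%2Fknowledgebase%2F

# Does NOT match: the raw query string contains %2F, not a literal slash
RewriteCond %{QUERY_STRING} rp=/knowledgebase/
# Matches: the pattern is compared against the decoded path "index.php"
RewriteRule ^index\.php$ https://www.datanumen.com/knowledgebase/ [QSD,R=301,L,NC]
```

So the rule as a whole is skipped for this request, even though the RewriteRule pattern itself matches.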

But... do you need to check for both the %-decoded / and %-encoded %2F? Presumably you are consistently linking to only one or the other (canonical) URL. So, any requests to the non-canonical URL would have to be manually typed or linked to erroneously by a third party. Are you receiving requests to both? What are the consequences of not redirecting the non-canonical URL?

Otherwise, yes, you would need to check for both (and potentially every variation and letter case). In practice this will most likely be just /knowledgebase/ or %2Fknowledgebase%2F. But note that it could be %2F (uppercase) or %2f (lowercase); uppercase is just a convention. Having to check for a mixed encoding, such as %2Fknowledgebase/, should be very rare. Taken to an extreme, the same value could also be sent as %2f%6b%6e%6f%77%6c%65%64%67%65%62%61%73%65%2f. Whether you need to handle all these variations depends on the likelihood of getting such a request and the severity of the rule not matching.

So, to match both /knowledgebase/ and %2Fknowledgebase%2F (case-insensitive) you could use something like:

RewriteCond %{QUERY_STRING} ^rp=(/|%2[Ff])knowledgebase(/|%2[Ff])

You could avoid the character class [Ff] and use the NC flag instead to make the whole comparison case-insensitive. For example:

RewriteCond %{QUERY_STRING} ^rp=(/|%2F)knowledgebase(/|%2F) [NC]
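
Putting this condition together with the RewriteRule from the question gives a complete rule along the following lines. This is only a sketch: it assumes mod_rewrite is enabled, that rp is the first parameter in the query string (as in the example URLs), and that the .htaccess file is in the same directory as index.php.

```apache
RewriteEngine On

# Each (/|%2F) group accepts a literal or a %-encoded slash, so all four
# combinations listed in the question are covered by this one condition.
# NC makes %2F and %2f (and the word "knowledgebase") match case-insensitively.
RewriteCond %{QUERY_STRING} ^rp=(/|%2F)knowledgebase(/|%2F) [NC]
RewriteRule ^index\.php$ https://www.datanumen.com/knowledgebase/ [QSD,R=301,L,NC]
```

Because of the ^ anchor, the condition only matches when rp comes first. If it could appear after other parameters, using (^|&)rp= in place of ^rp= is one way to allow for that.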

On Apache 2.4 you can use the unescape() function in an Apache expression with the RewriteCond directive to URL decode the QUERY_STRING before making the comparison. However, this only partly helps, since unescape() does not %-decode slashes, i.e. %2F or %2f remains as in the request (every other character is %-decoded), so the pattern still has to match both forms of the slash. For example:

RewriteCond expr "unescape(%{QUERY_STRING}) =~ m#^rp=(/|%2[Ff])knowledgebase(/|%2[Ff])#"

This would additionally allow you to match a request in which the letters are %-encoded as well, such as rp=%2f%6b%6e%6f%77%6c%65%64%67%65%62%61%73%65%2f.


Or, if you are not expecting any URL encoded characters in the query string then you could simply block any request that sends any! For example, the following would need to go at the top of your config:

# Block any request that includes a %-encoded character in the query string
RewriteCond %{QUERY_STRING} %[\da-f]{2} [NC]
RewriteRule ^ - [R=400]
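
To show where such a block would sit relative to the redirect, here is one possible ordering, assuming both rules live in the same .htaccess file. Note that this would also reject any legitimate request whose other parameters contain %-encoded characters, so it only suits a site where that never happens.

```apache
RewriteEngine On

# 1. Reject (400 Bad Request) any query string containing a %-encoded character.
#    With a non-3xx status, rewriting stops here as if L had been given.
RewriteCond %{QUERY_STRING} %[\da-f]{2} [NC]
RewriteRule ^ - [R=400]

# 2. Only requests with no %-encoding in the query string reach this rule,
#    so matching the literal slash is sufficient.
RewriteCond %{QUERY_STRING} ^rp=/knowledgebase/
RewriteRule ^index\.php$ https://www.datanumen.com/knowledgebase/ [QSD,R=301,L,NC]
```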
MrWhite