
I am aware of the canonical question and have read it, yet I can't seem to find what I'm looking for there.

Here are my conditions and rules to drop www and force https:

RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ https://%1/$1 [R=301,L,NE]

RewriteCond %{HTTPS} off
RewriteCond %{HTTP:X-Forwarded-Proto} !https
RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L,NE]

I understand what I am trying to match. However, the substitutions are a bit unclear to me. Specifically:

  1. How did my hostname (without www.) end up in %1?
  2. Why isn't the query string lost when the second rule is applied?

The reason behind the second question is that the manual explicitly states (highlighted by me):

REQUEST_URI

The path component of the requested URI, such as "/index.html". This notably excludes the query string which is available as its own variable named QUERY_STRING.

Džuris

1 Answer


I assume these directives are working OK and you are just after an explanation as to why?

  1. How did my hostname (without www.) end up in %1?

%1 is a backreference to the first captured group in the last matched CondPattern. So, given the following condition:

RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]

The regex (i.e. the CondPattern) ^www\.(.*)$ is matched against the HTTP_HOST server variable. The match succeeds when HTTP_HOST is www. followed by anything. That "anything" is inside a captured group (a parenthesised subpattern), i.e. (.*) rather than simply .*. Whatever matches the (.*) group is saved in the %1 backreference and can be used later in the RewriteRule substitution. For example, given a request for www.example.com/something, the condition effectively becomes:

RewriteCond www.example.com ^www\.(.*)$ [NC]

%1 will therefore contain example.com.
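Groups are numbered from left to right, so a CondPattern with two groups fills both %1 and %2. A minimal sketch, assuming an .htaccess in the document root; the extra (www) group exists only to illustrate the numbering:

```apache
# %1 = first group of the last matched CondPattern  -> "www" (as sent by the client)
# %2 = second group of that CondPattern              -> e.g. "example.com"
# $1 = first group of the RewriteRule pattern        -> the URL-path
#      (in .htaccess it has no leading slash, hence the "/" before $1)
RewriteCond %{HTTP_HOST} ^(www)\.(.+)$ [NC]
RewriteRule ^(.*)$ https://%2/$1 [R=301,L,NE]
```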

  2. Why isn't the query string lost when the second rule is applied?

Because if you don't explicitly include a query string in the RewriteRule substitution, the query string from the request is automatically appended to the end of the resulting substitution.
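To make this concrete, here is the second rule traced for a hypothetical request http://example.com/page.php?id=5&sort=asc (host, path and parameters are made up):

```apache
RewriteCond %{HTTPS} off
RewriteCond %{HTTP:X-Forwarded-Proto} !https
RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L,NE]
# Substitution built from the variables: https://example.com/page.php
# It contains no "?", so mod_rewrite appends the original query string:
# Location: https://example.com/page.php?id=5&sort=asc
```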

However, if you include a query string at the end of the substitution, even just an empty one (a ? followed by nothing), the query string from the request is not appended. For example:

RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI}? [R=301,L,NE]

This will result in the query string being stripped from the request (note the trailing ?). Alternatively, on Apache 2.4+ you can use the QSD (Query String Discard) flag to prevent the query string being appended.
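A sketch of both query-string flags, assuming Apache 2.4+ and an .htaccess in the document root. QSD does the same job as the trailing ?; QSA covers the opposite case, where the substitution has its own query string (which by default replaces the original) and you want the original merged in. The /product/5 to /product.php mapping is purely hypothetical:

```apache
# Redirect to HTTPS and drop the original query string (QSD = Query String Discard)
RewriteCond %{HTTPS} off
RewriteCond %{HTTP:X-Forwarded-Proto} !https
RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI} [QSD,R=301,L,NE]

# Hypothetical internal rewrite: /product/5?ref=mail -> /product.php?id=5&ref=mail
# Without QSA the new "?id=5" would replace "?ref=mail"; QSA merges them.
RewriteRule ^product/(\d+)$ /product.php?id=$1 [QSA,L]
```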

Aside: I also removed the parentheses from the RewriteRule pattern. You don't need a captured group here, since you are using the REQUEST_URI server variable instead. (The group would otherwise be available in the $1 backreference; note the $ prefix.) Storing backreferences you don't need is just a waste of resources and hampers readability.
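For comparison, here are the two ways of carrying the path across, assuming the .htaccess sits in the document root (in a subdirectory, $1 would also lack the directory prefix). Only one RewriteRule should be active, so the capturing variant is commented out:

```apache
RewriteCond %{HTTPS} off
RewriteCond %{HTTP:X-Forwarded-Proto} !https
# Alternative with a capture: in .htaccess $1 has no leading slash, so add one
#RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,L,NE]
# No capture needed: REQUEST_URI already holds e.g. "/path/to/file"
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L,NE]
```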

RewriteCond %{HTTP:X-Forwarded-Proto} !https

I assume your server is behind a proxy server that is setting the X-Forwarded-Proto header?
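If that assumption holds, the two conditions read as follows. RewriteCond directives are ANDed by default, so the redirect only fires when both match. Comments are on their own lines because Apache does not allow trailing comments after a directive:

```apache
# 1. This server received the request over plain HTTP...
RewriteCond %{HTTPS} off
# 2. ...and the (assumed) proxy did not report that the client used HTTPS.
#    If the header is absent, its value is empty and "!https" also matches.
RewriteCond %{HTTP:X-Forwarded-Proto} !https
RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L,NE]
```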

MrWhite
  • Thank you for your explanations! Yes, I patched the directives together from examples and they are working. I became suspicious because one rule used %1 and $1 while the other included the variables directly. It turned out useful, as you revealed to me that I store a redundant reference in the last rule. Do I understand correctly that `%{REQUEST_URI}` and `$1` would behave the same in both rules and differ only because I got the parts from different examples? If so, how should one choose which to use? And to expand on the query string: is anything else "automatically appended", or only that? – Džuris Apr 22 '17 at 15:23
  • As to your last question - that document root is served over two domain names, let's say `example.org` and `assets.example.com`. The assets on `example.org` are included using the `assets.example.com` domain name, which points to a proxy that caches the assets. That's why I had to put two `RewriteCond`s there. – Džuris Apr 22 '17 at 15:29
  • `%1` is a backreference to a group captured in the last matched `RewriteCond` directive, and `$1` is a backreference to the `RewriteRule` _pattern_. Using `%{REQUEST_URI}` or `$1` in this instance is largely a matter of preference. However, they are not necessarily the same; it depends on _context_. In a directory (incl. `.htaccess`) context they are slightly different, whereas in a server config/virtual host context they are probably the same. – MrWhite Apr 22 '17 at 16:12
  • E.g. in `.htaccess`, a request for `example.com/path/to/file` would result in `REQUEST_URI` containing `/path/to/file`, but `$1` would contain `path/to/file` (note the missing slash prefix). This is consistent with your code example, so I would assume you are in a directory (or `.htaccess`) context? – MrWhite Apr 22 '17 at 16:13
  • Yes, these are directives in a `.htaccess`. – Džuris Apr 22 '17 at 16:16
  • Nothing else is "automatically appended". By the way, the same applies to the `Redirect` and `RedirectMatch` (mod_alias) directives. As regards which to use, `REQUEST_URI` or `$1`: `REQUEST_URI` is always the same, regardless of whether you are using server config or `.htaccess`. But it's not always possible to use `$1` in place of `REQUEST_URI`, for example: `RewriteRule !^foo$ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L,NE]` - this only redirects when the request is _not_ `/foo`. (A _negated_ pattern can't capture anything, so there is no `$1` to use.) – MrWhite Apr 22 '17 at 16:29
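The negated example from the last comment, written out as a block. This assumes an .htaccess in the document root (so the URL-path is matched as "foo", without a leading slash), and /foo is only a placeholder:

```apache
RewriteCond %{HTTPS} off
# "!" negates the pattern: the rule applies to every URL-path EXCEPT "foo".
# When a negated pattern "matches", nothing was captured, so $1 is unusable here;
# REQUEST_URI still supplies the requested path.
RewriteRule !^foo$ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L,NE]
```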