
Hoping someone better at htaccess RegEx than I am is out there!

I am trying to replace (well, remove) a particular path from a URL, but my rules only seem to remove the first folder, not the entire path.

Here's a rule I am testing with; note that I write the backreference into the query string just for debugging:

RewriteRule ^folder1/folder2/folder3/(.*)$ http://domain.com/?one=$1&two=$2 [L,NC,R=301]

So, in essence, I need domain.com/folder1/folder2/folder3/?query=string to redirect to domain.com/?query=string.

If I can retain the rest of the path, that's a bonus, but I'm not too bothered; retaining the query string is a must.

The problem is that with the above rule I get the result:
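A minimal sketch of the simple case, assuming the rule lives in an .htaccess file in the document root and no earlier rules touch the path. A RewriteRule pattern is matched against the URL path only, never the query string; when the substitution contains no `?`, mod_rewrite passes the original query string through unchanged. Note also that the debugging rule above has only one capture group, so `$2` is always empty, and because its substitution contains `?`, the original query string is replaced unless the QSA flag is added.

```apache
RewriteEngine On
RewriteBase /

# Redirect /folder1/folder2/folder3/<rest> to /<rest>.
# The pattern only sees the path; "?query=string" is carried over
# automatically because the substitution contains no "?".
RewriteRule ^folder1/folder2/folder3/(.*)$ /$1 [L,NC,R=301]

# Debugging variant (use instead of the rule above): QSA appends the
# original query string after one=..., so neither is lost.
# RewriteRule ^folder1/folder2/folder3/(.*)$ /?one=$1 [L,NC,R=301,QSA]
```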

domain.com/?one=folder2/folder3/folder2/folder3/&two=

As you can see, it seems to accept the first directory match (starting with folder1), but then it takes the rest of the path I want to remove, duplicates it and sets it as the first matched group!

This is not what I want. I want to match all 3 folders, remove them and append the rest of the URL, whatever that is (query string et al.).

Anyone care to correct me here? An idea of what I've done wrong would also be useful but is not necessary (I'm taking the chance to improve my RegEx understanding).

FYI: The folders do not exist; they are merely paths for routing.

Thanks in advance!

Edit: response to duplicate

This question was marked as a duplicate of this common (and most valuable) post: Redirect, Change URLs or Redirect HTTP to HTTPS in Apache - Everything You Ever Wanted to Know About Mod_Rewrite Rules but Were Afraid to Ask

I have reviewed this post, however I cannot see which part of it deals with path rewriting as I have asked specifically, or any part that explains the result I am getting. Therefore I will elaborate...

Indeed, a number of the examples do look at paths deeper than one level, like these from sysadmin2218:

RewriteRule ^/blog/([0-9]{4})/([-0-9a-zA-Z]*)\.html   /newblog/$1/$2.shtml

RewriteRule ^/blog/([0-9]{4})/([-a-z]*)\.html  /newblog/$1/$2.shtml

However, I am not getting this correct matching. This is the essence of my question: despite following the examples and the man page, I seem to be getting unexpected results, so I asked this question to ascertain where my mistake is. I could read another 20 RewriteRule posts (on top of the 30 or so I read before posting), but I'm not getting any closer to seeing what is wrong with my rule.

If anyone can point out my mistake or improve my understanding, I will learn from it and not ask this question again, but I cannot seem to fathom the answer from the linked post or others on the internet.

My concise case: here is an exact replica of my current situation. First, here are the rules I need to apply:
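One difference worth keeping in mind when borrowing those examples (a general mod_rewrite rule, not something the quoted post says about these particular lines): in server or virtual-host context the pattern is matched against the full URL-path, leading slash included, whereas in a per-directory .htaccess file the directory prefix, including that leading slash, is stripped before matching. A sketch of the same example in both contexts, assuming `RewriteEngine On` is already set in each:

```apache
# In httpd.conf or a <VirtualHost>: the pattern sees "/blog/2016/post.html"
RewriteRule ^/blog/([0-9]{4})/([-a-z]*)\.html  /newblog/$1/$2.shtml

# In the document-root .htaccess: the pattern sees "blog/2016/post.html"
RewriteRule ^blog/([0-9]{4})/([-a-z]*)\.html  /newblog/$1/$2.shtml
```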

  • All rules are applied to a common domain (www.example.com)
  • Remove parameters from the query string if the key is key1 or key2
  • Redirect www.example.com/oldfolder/* to www.example.com/*
  • Redirect www.example.com/folder1/folder2/folder3/* to www.example.com/newfolder/*

and here's my current set of rules

# Activate Rewrite and set the base to the web path
RewriteEngine On
RewriteBase /

# Remove 'key1' from the Querystring, and remove any resulting double &'s
RewriteCond %{QUERY_STRING} (.*)(?:^|&)key1=(?:[^&]*)((?:&|$).*)
# %1 = any previous query string, %2 = any following query string
RewriteCond %1%2 (^|&)([^&].*|$)
# %1 = matched leading & (discard), %2 = the new query string
# $1 = the URI up to, not including, the trailing / (the rule only matches URIs ending in /)
RewriteRule ^(.*)/$ $1?%2

#Remove 'key2' from the resulting URI, and remove double & again (as above)
RewriteCond %{QUERY_STRING} (.*)(?:^|&)key2=(?:[^&]*)((?:&|$).*)
RewriteCond %1%2 (^|&)([^&].*|$)
RewriteRule ^(.*)/$ $1?%2

#Catch and handle requests beginning with 'oldfolder'
#$1 = greedy match of everything following oldfolder
RewriteRule ^oldfolder(.*)$ $1 [L,NC,R=301]

#Catch the folder1/folder2/folder3 path and rewrite
RewriteRule ^folder1/folder2/folder3(.*)$ newfolder/$1 [L,NC,R=301]

OK, so to break it down: the query string part works 100%, and the first RewriteRule is also fine, in that 'oldfolder' is successfully removed from the URI, which includes the cleaned query string.

The problem is the last rule. Consider the following test URL:

http://www.example.com/folder1/folder2/folder3/?key0=keep&key1=drop&key2=drop

This should be rewritten to

http://www.example.com/newfolder/?key0=keep

However, the result I get is

http://www.example.com/newfolder//folder2/folder3/folder2/folder3/?key0=keep

Not what I expected. Here's the rule in question, and my breakdown of what I expect:

RewriteRule ^folder1/folder2/folder3(.*)$ newfolder/$1 [L,NC,R=301]

So, assuming the query string has been dealt with, to me the pattern says: match the web path (remember RewriteBase / is in effect) that starts with folder1/folder2/folder3 and assign the remainder of the URI to the first group ($1), which would be /?key0=keep. The substitution would then be newfolder/?key0=keep.

This is not my experience, however... while the query string in my result is correct, the match seems to pick up folder2/folder3/, duplicate it, add it before the query string and go with that...

I am confused.com here... (please help!!)
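When a rule behaves unexpectedly, it can help to watch mod_rewrite step through the rules. A sketch assuming Apache 2.4 (the version mentioned in the answer) and access to the main server or virtual-host configuration, since LogLevel cannot be set from an .htaccess file. The trace lines go to the error log and show each pattern, the URI it was applied to and the result of each rewrite; as far as I know, re-appended path info also shows up there as an "add path info postfix" line.

```apache
# Apache 2.4; server config or <VirtualHost>, not .htaccess.
# Other modules stay at "alert"; only mod_rewrite logs at trace level 3.
LogLevel alert rewrite:trace3
```

Trace logging is verbose and costs performance, so it is worth lowering again once the rules behave.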

Blatant
  • Thanks for marking this as a duplicate question, though I have read through the linked question and all its answers thoroughly and cannot see the solution. Would you care to elaborate on my specific question, please? (see question edit) – Blatant Sep 07 '16 at 09:17
  • I am waiting for a moderator to re-open this so I can answer my own question fully; in brief, I got stung by an Apache 2.4 bug explained here: https://bz.apache.org/bugzilla/show_bug.cgi?id=38642. Concise answer to follow (if reopened) – Blatant Sep 15 '16 at 16:23

1 Answer

Ok, so I tracked down my issue and in the spirit of Stack Exchange I wanted to share my answer.

I have to start with an admission: I did not mention the Apache version I was using, which is Apache 2.4.

It turns out that I was being bitten by a reported bug in Apache (2.2 -> 2.4), namely Bug 38642 - mod_rewrite adds path info postfix after a substitution occured.

The bug causes the path info to be re-appended whenever a substitution is made, so after each of my RewriteRules affecting the query string, the path was indeed appended again. In my case /folder1 does not exist on disk, so Apache treats the remainder of the request path (/folder2/folder3/) as path info; both query-string rules fired and each re-appended it, which is why folder2/folder3/ shows up twice in my results.

Reportedly the bug is fixed in Apache 2.5, and there is a workaround: add the DPI (discardpath) flag to the rewrite rules.

With that, my example becomes:

# Activate Rewrite and set the base to the web path
RewriteEngine On
RewriteBase /

# Remove 'key1' from the Querystring, and remove any resulting double &'s
RewriteCond %{QUERY_STRING} (.*)(?:^|&)key1=(?:[^&]*)((?:&|$).*)     
RewriteCond %1%2 (^|&)([^&].*|$)    
RewriteRule ^(.*)/$ $1?%2  [DPI,E=querycleaned:1]

#Remove 'key2' from the resulting URI, and remove double & again
RewriteCond %{QUERY_STRING} (.*)(?:^|&)key2=(?:[^&]*)((?:&|$).*)    
RewriteCond %1%2 (^|&)([^&].*|$)    
RewriteRule ^(.*)/$ $1?%2 [DPI,E=querycleaned:1]

#Catch and handle requests beginning with 'oldfolder'
#$1 = greedy match of everything following oldfolder
RewriteRule ^oldfolder(.*)$ $1 [L,NC,R=301]

#Catch the folder1/folder2/folder3 path and rewrite
RewriteRule ^folder1/folder2/folder3(.*)$ newfolder$1 [L,NC,R=301]

#Catchall, if query string cleaned but not previously matched, then redirect to clean string
RewriteCond %{ENV:querycleaned} 1
RewriteRule ^(.*)$ $1 [L,R=301]

As you can see, the DPI flag is added to every rule that might make a change but is not flagged as last (L).

You may also notice the E=querycleaned:1 flag. This simply sets an environment variable for the catch-all. The original example would only apply the cleaned query string if one of the following RewriteRules matched (starting with oldfolder or folder1/folder2/folder3), but I wanted the query string cleaned regardless. So I set a variable to indicate whether the query was cleaned; if it was, the new catch-all rule at the bottom redirects to the existing path with the clean query string.

There you go, solved. In the interest of fairness, once I knew the solution I was able to find a couple of duplicate questions here, so I wasn't the first. I can't find the question I originally came across in my history, but a quick search for DPI yields a few results.
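A possible refinement, not part of the fix above: `^oldfolder(.*)$` and `^folder1/folder2/folder3(.*)$` also match paths such as `oldfolderX/...` or `folder1/folder2/folder3abc`. Requiring a slash before the optional remainder limits each redirect to the folder itself and anything beneath it. A sketch keeping the same flags; it relies on an unmatched optional group leaving `$1` empty.

```apache
# /oldfolder, /oldfolder/ and /oldfolder/<rest>  ->  /<rest>
# (?:/(.*))? is optional; when it does not participate, $1 is empty.
RewriteRule ^oldfolder(?:/(.*))?$ /$1 [L,NC,R=301]

# /folder1/folder2/folder3 and /folder1/folder2/folder3/<rest>  ->  /newfolder/<rest>
RewriteRule ^folder1/folder2/folder3(?:/(.*))?$ /newfolder/$1 [L,NC,R=301]
```

Using a leading `/` in the substitution makes the target independent of RewriteBase; with `RewriteBase /` as above, the relative form behaves the same.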

Blatant