
I want to add a directive in my .htaccess, such that if the browser is pointed at a URI containing %E4 (ä) - or any other special character - the .htaccess automatically rewrites %E4 (ä) as %C3%A4 (ä).

In summary, I want .htaccess to convert Win-1252 percent-encoding to UTF-8 percent-encoding.

I know I can do this by adding a series of mod_rewrite rules - but I was wondering if there is a native .htaccess directive that will take care of this.

====

This is as far as I've got on my own (my .htaccess has many more entries than the list below; I've abridged it to äéîöü for example purposes):

RewriteRule ([^\ä]*)\ä([^\ä]*\ä[^\/]*)(\/[.*])? $1%C3%A4$2$3 [N]
RewriteRule ([^\ä]*)\ä([^\ä]*)$ http://www.domain.com/$1%C3%A4$2 [NE,R=301]

RewriteRule ([^\é]*)\é([^\é]*\é[^\/]*)(\/[.*])? $1%C3%A9$2$3 [N]
RewriteRule ([^\é]*)\é([^\é]*)$ http://www.domain.com/$1%C3%A9$2 [NE,R=301]

RewriteRule ([^\î]*)\î([^\î]*\î[^\/]*)(\/[.*])? $1%C3%AE$2$3 [N]
RewriteRule ([^\î]*)\î([^\î]*)$ http://www.domain.com/$1%C3%AE$2 [NE,R=301]

RewriteRule ([^\ö]*)\ö([^\ö]*\ö[^\/]*)(\/[.*])? $1%C3%B6$2$3 [N]
RewriteRule ([^\ö]*)\ö([^\ö]*)$ http://www.domain.com/$1%C3%B6$2 [NE,R=301]

RewriteRule ([^\ü]*)\ü([^\ü]*\ü[^\/]*)(\/[.*])? $1%C3%BC$2$3 [N]
RewriteRule ([^\ü]*)\ü([^\ü]*)$ http://www.domain.com/$1%C3%BC$2 [NE,R=301]

Does anyone know:

1) How I can tidy this up?

2) It won't work if there is more than one special character (single occurrence or multiple occurrences) in the URI. Is there any straightforward way to ensure the rules will handle multiple special characters?

  • Just to explain, I'm new to StackExchange and I'm still learning my way around - I did originally post this question at http://webmasters.stackexchange.com/questions/72367/can-apache-htaccess-convert-the-percent-encoding-in-encoded-uris-from-win-1252 but after 1 week and 79 views, there has not been a single comment or response, so I thought, perhaps, I ought to re-post the same question here instead. Any assistance gratefully received - thanks! – Rounin - Standing with Ukraine Dec 01 '14 at 18:10

1 Answer


The only way I can think of to make this cleaner while just being within Apache would be to use a RewriteMap.

Pointing the map at a txt file would force you to do some terrible things, because RewriteRule replaces the entire string and you would need a separate RewriteRule pass for each character in the string (replaced or not).

So instead, I'd say write an external script in whichever language you're comfortable with (ideally one that knows how to convert from Windows-1252 to UTF-8 without you needing to hardcode the conversions; Python comes to mind). The script takes in the full string, makes the needed replacements directly (in real code instead of a huge number of mod_rewrite runs), then passes the fixed-up string back for replacement.

# RewriteMap must be declared in the server or virtual host config; it is not allowed in .htaccess
RewriteMap win1252-to-utf8 prg:/path/to/executable
RewriteRule ^ ${win1252-to-utf8:%{REQUEST_URI}}
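As a rough idea of what the executable could look like, here is a minimal Python sketch. It is only an illustration under a few assumptions: the lookup key reaches the script still percent-encoded (if Apache hands over an already-decoded path, the raw bytes would need handling instead), any run of escapes that already decodes as valid UTF-8 should be left alone, and the names `win1252_to_utf8`, `_fix_run` and `HIGH_RUN` are made up for this example. A prg: map program reads one key per line on stdin and must answer each with exactly one line on stdout, unbuffered.

```python
#!/usr/bin/env python3
"""Sketch of a prg: RewriteMap helper: Windows-1252 percent-escapes -> UTF-8."""
import re
import sys

# One or more consecutive percent-escapes whose byte value is 0x80 or above.
HIGH_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def _fix_run(match):
    """Re-encode one run of high-byte escapes, if it is not UTF-8 already."""
    raw = bytes.fromhex(match.group(0).replace("%", ""))
    try:
        raw.decode("utf-8")
        return match.group(0)  # already valid UTF-8: leave untouched
    except UnicodeDecodeError:
        pass
    try:
        text = raw.decode("cp1252")
    except UnicodeDecodeError:
        return match.group(0)  # byte undefined in Windows-1252: leave untouched
    return "".join("%{:02X}".format(b) for b in text.encode("utf-8"))


def win1252_to_utf8(uri):
    """Convert Windows-1252 percent-escapes in uri to UTF-8 percent-escapes."""
    return HIGH_RUN.sub(_fix_run, uri)


def main():
    # Apache sends one lookup key per line and waits for one line in reply,
    # so the answer has to be flushed after every request.
    for line in sys.stdin.buffer:
        key = line.decode("latin-1").rstrip("\r\n")
        sys.stdout.buffer.write(win1252_to_utf8(key).encode("latin-1") + b"\n")
        sys.stdout.buffer.flush()


if __name__ == "__main__":
    main()
```

Because each run of escapes is converted in one pass, several special characters in the same URI are handled without extra rules. One caveat of this heuristic: a Windows-1252 sequence that happens to be valid UTF-8 (for example %C3%A4 meant as "Ã¤") is treated as UTF-8 and passed through unchanged.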
Shane Madden
  • This is great, @Shane - thanks very much. I've been waiting to see if there are any other responses forthcoming and while I was waiting I didn't want you to think I was unappreciative! If nobody else can suggest anything, I will try your RewriteMap suggestion and if I can get it to work, I will accept your answer. Thanks again. (And thanks for your patience.) – Rounin - Standing with Ukraine Dec 07 '14 at 08:29