
I'm running Ubuntu 10.04 LTS with Apache 2.2.14.

In httpd.conf I have a rewrite rule that looks like this:

RewriteRule (*UTF8)^/users/([^/])([^/]+)/(.*)$ /users/$1/$2/$1$2/$3 [L]    

The idea is to map internationalized domain names (IDNs) to directories on my server.

I keep getting this error:

RewriteRule: cannot compile regular expression

Any idea whether it's the daemon version or something else?
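To see what a UTF-8 mode would change for this particular rule, here is a minimal Python sketch. Python's `re` stands in for PCRE (an assumption: the engines differ in details, but a bytes pattern here treats each byte as one character, much as PCRE does without UTF-8 mode). The sample path `/users/über/index.html` is made up for illustration.

```python
import re

# The question's rule without the (*UTF8) prefix.
PATTERN = r"^/users/([^/])([^/]+)/(.*)$"
REPLACEMENT = r"/users/\1/\2/\1\2/\3"

def rewrite_bytes(path: bytes) -> bytes:
    """Apply the rule byte by byte (like matching without UTF-8 mode)."""
    return re.sub(PATTERN.encode("ascii"), REPLACEMENT.encode("ascii"), path)

def rewrite_text(path: str) -> str:
    """Apply the rule to decoded characters (like matching in UTF-8 mode)."""
    return re.sub(PATTERN, REPLACEMENT, path)

path = "/users/über/index.html"
print(rewrite_text(path))                   # /users/ü/ber/über/index.html
print(rewrite_bytes(path.encode("utf-8")))  # first directory is the lone byte \xc3
```

On bytes, `([^/])` captures only the first of the two UTF-8 bytes of `ü`, so the first generated directory is not valid UTF-8; on decoded text it captures the whole character.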

Bart De Vos
koby
  • Have you tried running it through a regexp validator? It returns as invalid for me. What is the purpose of `(*UTF8)` at the start, before the opening `^`? I've never seen anything like that before and can't find it documented, yet when I remove it the regexp becomes valid. – Smudge Nov 23 '11 at 09:29
  • Well Sam, I need the IDNs to be in UTF-8, not ASCII. Do you know a way I can do that? I saw a working example that does it like that, and the only difference I can think of is the daemon version... – koby Nov 23 '11 at 09:36
  • Any chance you could post the example? As I understand it: 1) nothing should come before the `^` within the rewrite rule, 2) if you want to match UTF-8 characters, the code you have (`([^/])`) should work (as would `.*` or `(.+)`), and 3) browsers encode UTF-8 characters outside the standard ASCII range (so `á` becomes `%C3%A1` when sent to the server). Try removing `(*UTF8)` from the start of the regex and see if it works. – Smudge Nov 23 '11 at 09:42
  • Sam, the example is similar to my use. As I understand it, IDNs are translated to an "xn--" form (Bücher.ch becomes xn--bcher-kva.ch, per the Wikipedia example). So removing the UTF8 might remove the error, but the rule still won't work... – koby Nov 23 '11 at 10:00
  • @koby Any URL to show where this "working example" is? – Olivier Pons Nov 23 '11 at 11:21
  • I don't have an online example (it's on a friend's server), but you can see other people asking about it (e.g. http://www.gossamer-threads.com/lists/apache/users/397951) – koby Nov 23 '11 at 12:11
  • I left an answer, but I'm a bit confused as to what issue you're actually trying to solve. It would help if you told us what paths you're getting, and what paths you're expecting to be converted to. – Andrew M. Nov 24 '11 at 23:01
  • I'm trying to host some IDN domains on my server and redirect them to static folders on the server. – koby Nov 25 '11 at 07:12
  • There's something being lost in translation; can you please post an example of what you're trying to convert, and what isn't happening? – Andrew M. Nov 25 '11 at 15:38
  • A request for Bücher.ch is converted to xn--bcher-kva.ch by browsers and to %X29%XD7... by wget. I need Apache to behave the same in both cases... – koby Nov 25 '11 at 19:30
  • But it sounds like this isn't Apache; it's the clients. I.e., wget is converting it to hex-encoded characters, while a browser is doing something different. So there's nothing Apache can do: it's taking exactly what the client sends. – Andrew M. Nov 26 '11 at 05:04
  • Can it convert the encoding so it can handle any client? – koby Nov 26 '11 at 08:42
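The distinction these comments circle around can be shown in a few lines of Python. This is a sketch using only the standard library; the sample path `/users/café` is illustrative, and Python's built-in `idna` codec implements IDNA 2003, which gives the same result as modern browsers for this particular name. The hostname becomes its ASCII `xn--` form, while non-ASCII characters in the path travel as percent-encoded UTF-8 bytes. Either way, what reaches the server is plain ASCII.

```python
from urllib.parse import quote, unquote

# Hostnames: the "idna" codec produces the ASCII "xn--" form
# that ends up in the Host header.
host = "Bücher.ch"
ascii_host = host.encode("idna").decode("ascii")
print(ascii_host)          # xn--bcher-kva.ch

# Paths: non-ASCII characters are sent as percent-encoded UTF-8 bytes.
path = "/users/café"
wire_path = quote(path)    # "/" is left alone by default
print(wire_path)           # /users/caf%C3%A9

# Decoding the escapes restores the original characters.
print(unquote(wire_path))  # /users/café
```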

1 Answer


(*UTF8) is not a valid regular expression, and I'm not sure why you're requesting it: things like .* in your regular expression will match any character, UTF-8 encoded or not. What you're thinking of is Perl, not mod_rewrite; Perl requires UTF-8 support to be enabled explicitly.

With mod_rewrite, you're trying to treat a particular encoding in a special way, and it's just not needed in this case.

For example,

RewriteRule ^/users/(.*)$ /newusers/$1 [L]

will match:

/users/café

and so on. However, keep in mind that character classes like [a-zA-Z] will NOT match UTF-8 characters such as é.
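The rule above can be emulated in Python to check both claims. This is a sketch, assuming (as the answer states) that Apache decodes `%XX` escapes before applying the pattern; here the path is also decoded to text as UTF-8, and the function name `rewrite` is hypothetical.

```python
import re
from typing import Optional
from urllib.parse import unquote

def rewrite(wire_path: str) -> Optional[str]:
    """Emulate RewriteRule ^/users/(.*)$ /newusers/$1 [L] on a
    percent-encoded request path (assumes UTF-8 escapes)."""
    path = unquote(wire_path)              # %C3%A9 -> é
    match = re.match(r"^/users/(.*)$", path)
    if match is None:
        return None                        # rule does not apply
    return "/newusers/" + match.group(1)

print(rewrite("/users/caf%C3%A9"))         # /newusers/café
print(rewrite("/about"))                   # None

# An ASCII-only character class does not cover é.
print(re.fullmatch(r"[a-zA-Z]+", "café"))  # None
```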

Andrew M.
  • I know that (*UTF8) is not a valid expression; all I'm trying to do is make sure that the string I'm matching is UTF-8 and not ASCII. If I make a wget call for Bücher.ch (with my server IP), Apache receives ASCII and my regex doesn't match. How can I convert it to UTF-8? – koby Nov 25 '11 at 07:15
  • The encoding isn't done by Apache; it's done by your browser. There's no such thing as UTF-8 encoding in a URL. When you type in `/users/café`, it gets percent-encoded to `/users/caf%C3%A9`. Your browser may do this transparently, but Apache will convert it back to UTF-8 automatically. So if you wanted to match anything that ends in `é`, you could use `/users/(.*\xC3\xA9)` (note the hex escapes for the two UTF-8 bytes of `é` in the rule). However, to match ANYTHING, regardless of Unicode, `.` will match any character, be it UTF-8, Latin-1, ASCII, etc. – Andrew M. Nov 25 '11 at 15:36
  • I'm trying to redirect a request. All browsers encode it as UTF-8, but other applications (wget, bots, etc.) send it ASCII-encoded, so the regex won't work for both, and I need it to work for both... – koby Nov 25 '11 at 19:28
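Since the two kinds of client send different spellings of the same name, one option is to normalize before comparing. A minimal Python sketch follows; `normalize_host` is a hypothetical helper, the escaped input is assumed to be UTF-8, and wiring it into Apache (for example through an external RewriteMap program) is not shown and would need testing on your setup.

```python
from urllib.parse import unquote

def normalize_host(raw_host: str) -> str:
    """Hypothetical helper: reduce the hostname forms a client might send
    (Punycode "xn--" labels, raw Unicode, or %XX-escaped UTF-8) to one
    lowercase Unicode hostname."""
    host = unquote(raw_host).lower()   # undo %XX escapes, fold case
    # Round-trip through Python's built-in IDNA 2003 codec: encoding turns
    # Unicode labels into "xn--" labels, decoding turns them back.
    return host.encode("idna").decode("idna")

for sample in ("xn--bcher-kva.ch", "Bücher.ch", "B%C3%BCcher.ch"):
    print(sample, "->", normalize_host(sample))
```

All three samples reduce to the same name, so a single comparison (or a single directory lookup) could serve every client.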