
We have a case where some Adobe PDF files encode the hyphen character in links as %E2%80%90 (see http://forums.adobe.com/message/2807241). I guess this is caused by the Calibri font.

These PDF files have already been released and their links don't work, so I thought mod_rewrite would come to the rescue.

I followed the post "mod_ReWrite to remove part of a URL", but according to this question I can't seem to search for the % characters.

Is there anything else I can do?

Here is the rewrite rule I want to use:

RewriteRule ^foo%(.+)bar  /foo-bar [L,R=301]

I also tried this, and it doesn't work either:

RewriteRule ^foo%E2%80%90bar  /foo-bar [L,R=301]

Any ideas?

ChickenFur

2 Answers

1

From the docs:

... it is applied to the (%-decoded) URL-path of the request ...

So put the actual character (U+2010 HYPHEN, which is what %E2%80%90 decodes to) in the pattern, in a UTF-8-encoded file, instead of the %-escapes.
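To illustrate the quoted sentence outside Apache, here is a small Python sketch (not mod_rewrite itself; the variable names are made up for illustration). It percent-decodes the path from the question and checks two patterns against the decoded text: one spelled with % signs, and one containing the real U+2010 hyphen.

```python
import re
from urllib.parse import unquote

# The path as it appears in the broken PDF links (from the question).
raw_path = "foo%E2%80%90bar"

# Percent-decoding turns the three UTF-8 bytes E2 80 90 into U+2010 HYPHEN.
decoded = unquote(raw_path)

# A pattern written with literal '%' characters, like the rule in the question.
percent_pattern = re.compile(r"^foo%E2%80%90bar$")

# A pattern containing the actual U+2010 character.
hyphen_pattern = re.compile("^foo\u2010bar$")

print(decoded == "foo\u2010bar")                      # True
print(percent_pattern.match(decoded) is not None)     # False
print(hyphen_pattern.match(decoded) is not None)      # True
```

Only the pattern with the real character matches once the path has been decoded, which is why the %-based rules never fire.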

Ignacio Vazquez-Abrams
  • Is there any way for it to see the difference between foo%E2%80%90bar and foo-bar? If not I might be out of luck. – ChickenFur May 30 '12 at 16:43
  • ... But you're looking for foo‐bar... – Ignacio Vazquez-Abrams May 30 '12 at 16:53
  • http://www.example.com/foo%E2%80%90bar comes up with a page-not-found error, while http://www.example.com/foo-bar works fine. I guess I could just move the page to foo-bar-v1 and then redirect all foo-bar to foo-bar-v1. – ChickenFur May 30 '12 at 17:09
1

Using the answer from this question, I was able to come up with this .htaccess rule which fixed my own unicode-hyphen-links-in-pdfs problem:

# for janky pdfs with links using unicode hyphens
RewriteRule ^([^_]*)\x25E2\x2580\x2590([^_]*_.*) $1-$2 [N]
RewriteRule ^([^_]*)\x25E2\x2580\x2590([^_]*)$ /$1-$2 [L,R=301]
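For anyone puzzled by the \x25 escapes: \x25 is the hex escape for %, so the pattern spells out the literal text %E2%80%90 rather than the decoded hyphen. A quick check in Python, whose re module reads these two-digit \xhh escapes the same way (an illustration only, not Apache's regex engine):

```python
import re

# \x25 is '%', so this pattern is the literal text "%E2%80%90", not U+2010.
pattern = re.compile(r"\x25E2\x2580\x2590")

matches_literal = pattern.fullmatch("%E2%80%90") is not None
matches_hyphen = pattern.fullmatch("\u2010") is not None

print(matches_literal, matches_hyphen)  # True False
```

So these rules match when the text being tested still contains the literal percent sequence.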
cowbellemoo