Edit: Summary
Apparently the intended character to display in this case is an "en-dash".
This page has a table halfway down showing that, for the en-dash (–), some software converts the correct hex code 2013 to 0096 (see the first row of the table).
This answer on Stack Overflow explains that this is somehow a mix-up between Windows-1252 and UTF-8.
This blog article reinforces this:
Character 150 (0x96) is the Unicode character "START OF GUARDED AREA" in the non-displayed C1 control character range, but in the Windows-1252 encoding it's mapped to the displayable character 0x2013 "en-dash" (a short dash).
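The mapping described in that quote can be checked with Python's standard codecs. This is a minimal sketch assuming Python 3; it only illustrates how the single byte 0x96 is interpreted under the two encodings, and what U+0096 looks like once it is encoded as UTF-8 (which matches the 0xc296 reported in the comments).

```python
import unicodedata

raw = b"\x96"

# Windows-1252 maps byte 0x96 to the displayable en-dash, U+2013.
as_cp1252 = raw.decode("cp1252")
# ISO-8859-1 maps every byte to the code point with the same value,
# so 0x96 becomes the C1 control character U+0096.
as_latin1 = raw.decode("latin-1")

print(f"cp1252 : U+{ord(as_cp1252):04X} {unicodedata.name(as_cp1252)}")
# cp1252 : U+2013 EN DASH
print(f"latin-1: U+{ord(as_latin1):04X} category {unicodedata.category(as_latin1)}")
# latin-1: U+0096 category Cc

# If a page contains U+0096 (instead of U+2013) and is served as UTF-8,
# the bytes on the wire are C2 96.
print(as_latin1.encode("utf-8").hex())
# c296
```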
Others have struggled with this when producing content; this answer on Stack Overflow shows how to replace 0x0096 with 0x2013.
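A sketch of that clean-up in Python, generalized from the single 0x0096 to 0x2013 replacement to the whole C1 range (U+0080 to U+009F): each such character is reinterpreted as the Windows-1252 byte with the same value. The helper name `fix_c1_as_cp1252` is hypothetical, and the approach assumes the C1 characters arose from Windows-1252 bytes being treated as code points. Five bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) have no mapping in Python's cp1252 codec, so those characters are left untouched.

```python
import re

# Matches any character in the C1 control range U+0080..U+009F.
C1_CONTROLS = re.compile("[\u0080-\u009f]")


def fix_c1_as_cp1252(text: str) -> str:
    """Replace C1 control characters with their Windows-1252 equivalents.

    Hypothetical helper: assumes the C1 characters were produced by
    treating Windows-1252 bytes as ISO-8859-1 / Unicode code points.
    """

    def remap(match: re.Match) -> str:
        ch = match.group(0)
        try:
            # e.g. U+0096 -> byte 0x96 -> decoded as cp1252 -> U+2013 (en-dash)
            return bytes([ord(ch)]).decode("cp1252")
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in Python's cp1252.
            return ch

    return C1_CONTROLS.sub(remap, text)


# Sample text (not taken from the Amazon page) containing U+0096.
print(fix_c1_as_cp1252("Elastic \u0096 sample text"))
```

The same function also repairs the other common Windows-1252 punctuation that ends up in the C1 range, such as curly quotes at 0x93 and 0x94.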
Google must realize this, because, as stated in my original question below, Google's cached version of the Amazon page has an en-dash (–), so it seems they automatically correct these mistakes on pages they cache.
I have tried setting my encoding to Windows-1252 but that does not help.
So now I guess my question is, how can I tell Firefox to ignore unprintable characters like these?
Original content below:
(Firefox 3.6.13 on Windows XP)
Every once in a while I notice an odd character on certain web pages. It is the outline of a box with a 4-digit number inside.
An example of a page that has these characters is: http://aws.amazon.com/ec2/#highlights
After each section heading (Elastic, Completely Controlled, ...) I see a box with the number "0096" inside. I looked at the cached version on Google, and Google has an en-dash (–) in its place, so I'm guessing I should be seeing a dash there instead of the box with the numbers in it.
I have tried changing the character encoding in Firefox but haven't been able to find one that shows these characters correctly.
Is there a way to allow Firefox to view these characters?
Thanks in advance!
Edit - adding a screen shot of the "special" characters:
Edit #2 - tried in Ubuntu - new screenshots
I logged into my Ubuntu desktop and browsed to the Amazon page in Chrome and Firefox. Chrome completely ignores the character, even when I inspect the element or view the page source. Firefox on Ubuntu displays the character exactly like Firefox on my Windows XP box. I copied the character and played around with it at the command line; here is a screenshot of the results:
It looks like I can paste the character into this post as well: ``
It is definitely not isolated to Windows XP. I tried setting the character encoding for my terminal to Windows-1252 (from Dennis' comment below), but then it just displays this character as a question mark.
I pulled the webpage down with wget and with curl, and both outputs show this character as: <96>
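One way to tell which situation a downloaded copy is in is to inspect its raw bytes. This is a minimal Python sketch; the helper name `classify_0x96` is hypothetical. If the bytes decode as valid UTF-8, any U+0096 characters in the result were sent as the pair C2 96, meaning the page really contains the control character. If they do not decode as UTF-8, a bare 0x96 byte is most likely a Windows-1252 en-dash (an assumption: invalid UTF-8 can have other causes).

```python
from typing import Tuple


def classify_0x96(data: bytes) -> Tuple[str, int]:
    """Report how the byte 0x96 shows up in raw page bytes (hypothetical helper).

    Returns ("utf-8", n) if the data is valid UTF-8, where n counts U+0096
    control characters (encoded on the wire as C2 96), or ("not utf-8", n),
    where n counts raw 0x96 bytes, most likely Windows-1252 en-dashes.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return ("not utf-8", data.count(0x96))
    return ("utf-8", text.count("\u0096"))


# In-memory samples standing in for a page fetched with wget or curl.
print(classify_0x96("Elastic \u0096 sample".encode("utf-8")))  # ('utf-8', 1)
print(classify_0x96(b"Elastic \x96 sample"))                    # ('not utf-8', 1)
print(classify_0x96("Elastic \u2013 sample".encode("utf-8")))  # ('utf-8', 0)
```

For a real download, the bytes can be read with `pathlib.Path("page.html").read_bytes()`, where `page.html` stands in for whatever file wget or curl wrote.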
It makes me wonder whether this character renders correctly for anyone. WebKit appears to just ignore it, my IE6 ignores it, and Firefox displays the box with the numbers in it. I have to imagine the design team at Amazon can see it correctly?
It's not a huge deal to get these characters displaying correctly, but it would be nice to know if there is a solution to this.
When I look at the page and its source, I see no en-dashes (–). Do you only see the boxed character in the cached copy you're not linking to, or on the live page as well? – Daniel Beck – 2011-01-03T19:01:56.597
On the live page, I see the boxed character. I wanted to know what the character was, so I looked at the cached version on Google, which had an en-dash (–). – BrianH – 2011-01-03T19:16:16.270
The page looks funny in IE7, too, so it's not a Firefox issue, IMHO. – martineau – 2011-01-03T19:26:52.697
True, Safari ignored it, but when I copy and paste it into a text editor and save, it's odd there as well. The hex code might be 0xc296, but I might have made a mistake during copying. – Daniel Beck – 2011-01-03T19:29:13.600
I found this page on mozilla.com: http://support.mozilla.com/fr/questions/752866; one answer suggests that Firefox cannot map the character to a font. I've never changed the default font in Firefox, so I wonder what font Amazon intends? – BrianH – 2011-01-03T20:02:11.817
@BrianH The font should be coming from the CSS. – Aaron McIver – 2011-01-03T22:21:08.127
@Aaron Sure, but there are multiple versions of fonts. See the accepted answer here, about halfway through the post. – Daniel Beck – 2011-01-04T08:51:58.717
"96c2" confirms my findings above. – Daniel Beck – 2011-01-04T15:17:15.087
Regarding your latest edit: It really looks like the Amazon folks messed this up. It's time you rephrased your question ("How can I tell firefox to ignore unprintable/unused characters?" or something like that), so that you might get something useful from it. – Daniel Beck – 2011-01-04T15:24:09.087