How can I tell Firefox to ignore unprintable characters?

5

1

Edit: Summary

Apparently the intended character to display in this case is an "en-dash".

This page has a table halfway down showing that, for the en dash (–), some software converts the correct hex code 2013 to 0096 (see the first row of the table).

This answer on Stack Overflow explains that this is a mix-up between Windows-1252 and UTF-8.

This blog article reinforces this:

Character 150 (0x96) is the Unicode character "START OF GUARDED AREA" in the non-displayed C1 control character range, but in the Windows-1252 encoding it's mapped to the displayable character 0x2013 "en-dash" (a short dash).
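The difference described in that quote can be checked with a short Python sketch. It assumes Python 3 and its standard `cp1252` and `latin-1` codecs; the variable names are only illustrative and none of this comes from the linked article.

```python
import unicodedata

raw = b"\x96"  # the single byte 150 (0x96)

# Decoded as Windows-1252, the byte becomes U+2013 EN DASH.
as_cp1252 = raw.decode("cp1252")

# Decoded as ISO-8859-1, the byte maps straight to U+0096,
# which sits in the non-printing C1 control range (category "Cc").
as_latin1 = raw.decode("latin-1")

print(hex(ord(as_cp1252)), unicodedata.name(as_cp1252))      # 0x2013 EN DASH
print(hex(ord(as_latin1)), unicodedata.category(as_latin1))  # 0x96 Cc
```

A browser that ends up with U+0096 in the text has nothing printable to draw, which is consistent with Firefox showing a box containing the code point.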

Others have struggled with this when producing content; this answer on Stack Overflow shows how to replace 0x0096 with 0x2013.
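For someone producing content, that kind of replacement could look roughly like the sketch below. It is not the code from the linked answer; `repair_c1_controls` is a hypothetical helper that maps every C1 control character to whatever Windows-1252 assigns to the same byte value, on the assumption that such characters in the text are always mis-decoded Windows-1252 bytes.

```python
def repair_c1_controls(text: str) -> str:
    """Replace C1 control characters (U+0080-U+009F) with the character
    that Windows-1252 assigns to the same byte value, where one exists."""
    repaired = []
    for ch in text:
        code = ord(ch)
        if 0x80 <= code <= 0x9F:
            try:
                ch = bytes([code]).decode("cp1252")
            except UnicodeDecodeError:
                # Five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) are
                # unassigned in Windows-1252; leave those untouched.
                pass
        repaired.append(ch)
    return "".join(repaired)


# U+0096 becomes U+2013 (en dash); ordinary text is unchanged.
print(repair_c1_controls("Elastic \u0096 Amazon EC2"))
```

This only helps on the publishing side. It does not change how a browser renders a page that already contains U+0096.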

Google must realize this, because as stated in my original question below, Google's cached version of the Amazon page has – so it seems they are automatically correcting these mistakes on pages they cache.

I have tried setting my encoding to Windows-1252 but that does not help.

So now I guess my question is, how can I tell Firefox to ignore unprintable characters like these?


Original content below:


(Firefox 3.6.13 on Windows XP)

Every once in a while I notice an odd character on certain web pages when browsing the web. It is an outline of a box with a 4-digit number inside.

An example of a page that has these characters is: http://aws.amazon.com/ec2/#highlights

After each section heading (Elastic, Completely Controlled, ...) I see a box with the number "0096" inside. I looked at the cached version on Google, and Google has – in its place, so I'm guessing I should be seeing a dash there instead of the box with the numbers in it.

I have tried changing the character encoding in Firefox but haven't been able to find one that shows these characters correctly.

Is there a way to allow Firefox to view these characters?

Thanks in advance!

Edit - adding a screen shot of the "special" characters:

[screenshot]

Edit #2 - tried in Ubuntu - new screenshots

I logged into my Ubuntu desktop and browsed to the Amazon page in Chrome and Firefox. Chrome completely ignores the character, even if I inspect or view the page source. Firefox in Ubuntu displays the character exactly like Firefox on my Windows XP box. I copied the character and played around with it at the command line - here is a screenshot of the results:

[screenshot]

It looks like I can paste the character into this post as well: ``

It is definitely not isolated to Windows XP. I tried setting the character encoding for my terminal to Windows-1252 (from Dennis' comment below), but then it just displays this character as a question mark.

I pulled the webpage down with wget and with curl, and both outputs show this character as: <96>

It makes me wonder if this character renders correctly for anyone. It appears WebKit just ignores it, my IE6 ignores it, and Firefox displays the box with the numbers in it. I would have to imagine the design team at Amazon can see it correctly?

It's not a huge deal to get these characters displaying correctly, but it would be nice to know if there is a solution to this.

BrianH

Posted 2011-01-03T18:52:32.563

Reputation: 713

When I look at the page and its source, I see no &ndash;es. Do you only see the boxed character in the cached copy you're not linking to, or on the live page as well? – Daniel Beck – 2011-01-03T19:01:56.597

On the live page, I see the boxed character. I wanted to know what the character was, so I looked at the cached version on Google, which had &ndash; – BrianH – 2011-01-03T19:16:16.270

The page looks funny in IE7, too, so it's not a Firefox issue, IMHO. – martineau – 2011-01-03T19:26:52.697

True, Safari ignored it, but when I copy&paste and save in a text editor, it's odd there as well. Hex code might be 0xc296, but I might have made a mistake during copying. – Daniel Beck – 2011-01-03T19:29:13.600

I found this page on mozilla.com: http://support.mozilla.com/fr/questions/752866 - one answer suggests that Firefox cannot map a character to a font. I've never changed the default font in Firefox - I wonder what font Amazon is intending?

– BrianH – 2011-01-03T20:02:11.817

@BrianH Font should be coming from css – Aaron McIver – 2011-01-03T22:21:08.127

@Aaron Sure, but there are multiple versions of fonts. See the accepted answer here, about half way through the post.

– Daniel Beck – 2011-01-04T08:51:58.717

"96c2" confirms my findings above. – Daniel Beck – 2011-01-04T15:17:15.087
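Both hex values in these comments are consistent with the page containing U+0096 encoded as UTF-8. The sketch below illustrates that. It assumes the "96c2" in the screenshot came from a tool that prints bytes as 16-bit little-endian words (as `hexdump` does by default on x86); the screenshot is not available here, so that part is a guess.

```python
ch = "\u0096"  # the C1 control character Firefox shows as a box

# UTF-8 encodes U+0096 as the two bytes C2 96.
utf8 = ch.encode("utf-8")
print(utf8.hex())  # c296

# Reading the same two bytes as one little-endian 16-bit word
# swaps their order in the printed value.
word = int.from_bytes(utf8, "little")
print(format(word, "04x"))  # 96c2
```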

Regarding your latest edit: It really looks like the Amazon folks messed this up. It's time you rephrased your question ("How can I tell firefox to ignore unprintable/unused characters?" or something like that), so that you might get something useful from it. – Daniel Beck – 2011-01-04T15:24:09.087

Answers

0

0096 is most likely an ASCII reference to the ` char (grave accent), which can be displayed within HTML as &#96;

Looking at your link however the HTML looks normal and there is no reference to &ndash;

...

<p><span class="product_highlights">Elastic</span>  Amazon <span class="caps">EC2</span> enables you to increase or decrease capacity within minutes, not hours or days. You can commission one, hundreds or even thousands of server instances simultaneously. Of course, because this is all controlled with web service APIs, your application can automatically scale itself up and down depending on its needs.</p> 


    <p><span class="product_highlights">Completely Controlled</span>  You have complete control of your instances. You have root access to each one, and you can interact with them as you would any machine. You can stop your instance while retaining the data on your boot partition and then subsequently restart the same instance using web service APIs. Instances can be rebooted remotely using web service APIs. You also have access to console output of your instances.</p> 


    <p><span class="product_highlights">Flexible</span>  You have the choice of multiple instance types, operating systems, and software packages.  Amazon <span class="caps">EC2</span> allows you to select a configuration of memory, <span class="caps">CPU</span>, instance storage, and the boot partition size that is optimal for your choice of operating system and application.  For example, your choice of operating systems includes numerous Linux distributions, Microsoft Windows Server and OpenSolaris.</p> 

...

Firefox should have no issues displaying the dash glyph as I just tested on 3.6.*...

<html>
    <head>
    </head>
    <body>
        My dash is &ndash;
    </body>
</html>

...copy and paste the above code in a test document and name it test.html and open it up in Firefox. It should display your dash without any problems.

EDIT: As pointed out by Dave, 0x96 is the ANSI equivalent of the en dash. With this understanding, it appears that this is a parsing issue related to the doctype specification within the page itself. Check out this thread.

You could extract the HTML and modify the doctype to see if this is indeed where the issue is stemming from. It is most likely a mix-up between encodings, i.e. ANSI -> Unicode; as Unicode, the value is a non-printable character.

Aaron McIver

Posted 2011-01-03T18:52:32.563

Reputation: 1 405

I think Google converted it to &ndash; when it cached it. Because if I view the source on the live page on Amazon, I get an unprintable character. - it looks something like this: Â~V But you are correct, Firefox has no problem displaying the &ndash; – BrianH – 2011-01-03T19:35:52.510

See my comment to the question above. Just copy&paste the HTML in a text editor and the character will show up. – Daniel Beck – 2011-01-03T19:36:09.193

0x96 (decimal 150) is an en dash in Windows code page 1252. – Paused until further notice. – 2011-01-03T22:02:33.680

0

The error seems like it's with the page. Try changing the character encoding to Windows-1252 in Firefox to see if that helps.

A lot of badly-configured webpages will say they're ISO-8859-1 or UTF-8 and they're really Windows-1252.

If it's a page you control, try re-saving it and specifying a different encoding.
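When processing such a page yourself, one way to work around the mislabeling is to treat a declared ISO-8859-1 as Windows-1252, which is also what the WHATWG Encoding Standard tells browsers to do for that label. The sketch below is a minimal illustration in Python 3; `decode_page` is a hypothetical helper, not part of any library.

```python
import codecs


def decode_page(raw: bytes, declared: str) -> str:
    """Decode page bytes, treating an ISO-8859-1 label as Windows-1252."""
    # codecs.lookup() normalises aliases such as "latin-1" or "ISO-8859-1".
    name = codecs.lookup(declared).name
    if name == "iso8859-1":
        name = "cp1252"
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    return raw.decode(name, errors="replace")


# A raw 0x96 byte in a page labeled ISO-8859-1 comes out as an en dash.
print(decode_page(b"Elastic \x96 Amazon", "ISO-8859-1"))
```

This would not help if the page already contains U+0096 encoded as UTF-8, as the hex values reported in the comments on the question suggest.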

Broam

Posted 2011-01-03T18:52:32.563

Reputation: 3 831

Yep, I had tried setting Firefox to use Windows-1252. It changes the character to  instead. I saved a copy of the page on my own server, changed the encoding to Windows-1252, and it also displays the character as  – BrianH – 2011-01-04T15:29:20.810