Character sets needed to properly display "zalgo"

3

1

The following answer on Stack Overflow is famous for its convincing argument against using regex to parse HTML: https://stackoverflow.com/a/1732454/505154

The content of the post becomes increasingly corrupted, and the end should render something like this:

[Image: zalgo properly rendered]

However, on my Windows XP laptop, I see the following:

[Image: zalgo replaced by boxes]

How can I get these "characters" to display properly?

Andrew Clark

Posted 2012-10-31T21:38:27.140

Reputation: 178

On all browsers? – Karan – 2012-10-31T21:51:00.523

I see essentially the same thing on both Chrome and IE; I haven't tried any others. – Andrew Clark – 2012-10-31T21:53:02.800

Answers

6

The simplest way is to download and install a font with a sufficiently large character repertoire, such as Symbola, and use Firefox or Chrome.

The problem is twofold. First, the text contains combining diacritic marks that are not supported by the fonts shipped with Windows XP. There are some free fonts that you can use to fix this. Second, IE is poor at rendering characters when no font listed in the applicable font-family list covers a character in an HTML document. Chrome and Firefox do a much better job; even Firefox 3, which I tested in a virtual Windows XP system, seems to handle the situation OK: it picks up the missing characters from other fonts in the system.

In addition to pages that play with combining diacritic marks in a childish way, there are real pages that make use of such marks. So it’s good to be prepared. There is no single font that covers all characters, so just install additional fonts as needed; Alan Wood has a nice page for downloading fonts with large character repertoires.
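To find out which combining marks a piece of text actually uses, and therefore which code points an installed font has to cover, something like the following Python sketch may help. It is only an illustration: the function name `combining_marks` and the sample string are made up for this example, and it treats any character whose Unicode general category starts with "M" as a combining mark.

```python
import unicodedata


def combining_marks(text):
    """Return (code point, name) pairs for each distinct combining mark in text.

    Hypothetical helper: a character counts as a combining mark here if its
    Unicode general category is Mn, Mc or Me.
    """
    marks = {ch for ch in text if unicodedata.category(ch).startswith("M")}
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in sorted(marks)
    ]


# Made-up sample: plain letters with a few stacked combining marks.
sample = "Z\u0351\u036ba\u0321l\u0352g\u0301o\u031a"
for code_point, name in combining_marks(sample):
    print(code_point, name)
```

The printed code points can then be compared with a font's character coverage (for example in a character map tool) to judge whether that font is enough.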

Jukka K. Korpela

Posted 2012-10-31T21:38:27.140

Reputation: 4 475

Thanks, was able to get this to work in Firefox, still seeing boxes in Chrome though. – Andrew Clark – 2012-10-31T23:56:48.040

DejaVu comes pretty close – Cole Johnson – 2013-04-09T20:15:47.633

3

It's more than just the character set.

To display that properly, the client rendering the text, and any libraries it uses to do so, need to support Unicode combining marks. The necessary fonts must also be installed, along with support for font stitching to combine fonts as needed.

I think the main font is Microsoft Sans Serif or Arial Unicode MS Regular, which come standard with each version of Windows and have doubtlessly been updated extensively since their initial release for XP, probably to include all of the Unicode diacritical marks that you see missing here. There might be other fonts at play here as well, used to fill in gaps when the primary font is missing a specific diacritic. I'm not sure of the legal ramifications of downloading a copy of the updated fonts without paying for them (that is, without buying Windows Vista, Windows 7, or Windows 8 for the laptop).

If the issue is a lack of support in the rendering library for combining diacritics, then no number of additional fonts will help you display the text correctly. Your only option will be to update the application and/or the libraries that it uses for rendering text.

Darth Android

Posted 2012-10-31T21:38:27.140

Reputation: 35 133