What happens if a presentation layer receives Unicode but can't present it?

1

I am currently a university student studying computer science, and while studying for our networking test I came across an interesting question in our book.

Let's say a computer was created before Unicode was invented. Now its presentation layer receives Unicode data that is outside the range it can interpret. What will happen to the data sent to this computer? How will it present this information?

Gordy

Posted 2017-03-28T06:38:37.173

Reputation: 33

"a computer was created before Unicode was invented" That would be in the 1980s. The original ideas behind Unicode date to 1987, and in turn draw from work done as early as 1980, and the first volume of the Unicode standard was published in 1991 (over a quarter of a century ago). – a CVn – 2017-03-29T08:35:01.893

Answers

5

It depends on the specific program. Most will do the best they can.

There is no single "presentation layer" in a computer – it's just a rather vaguely defined part of the OS and/or the individual programs. Each program differs in how you'd separate it into layers (if at all). And all software in a computer can be updated for new features as necessary.

(Personally I wouldn't pay too much attention to layers 6–7 at all, other than them being "the software which makes use of session layer".)


Also, realize that Unicode is an abstract standard and not transmitted over the network – programs usually send and receive specific encodings, such as UTF-8 or UTF-16. So a pre-Unicode program wouldn't have an "out-of-range" problem out of nowhere, because it wouldn't interpret the received bytes this way in the first place.
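To make the point about encodings concrete, here is a minimal Python sketch (the variable names are only illustrative). It shows one abstract code point becoming different byte sequences depending on the encoding, and what a program that assumes ISO 8859-1 would make of the UTF-8 bytes: it doesn't see an "out-of-range" value, it just sees two ordinary single-byte characters.

```python
# One abstract code point, several possible byte sequences on the wire.
text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

utf8_bytes = text.encode("utf-8")       # two bytes
utf16_bytes = text.encode("utf-16-le")  # two bytes, but different ones
latin1_bytes = text.encode("latin-1")   # one byte

# A pre-Unicode program only ever sees bytes. If it assumes ISO 8859-1,
# every byte maps to some character, so nothing is "out of range";
# the result is simply the wrong text (mojibake).
seen_by_old_program = utf8_bytes.decode("latin-1")

print(utf8_bytes, utf16_bytes, latin1_bytes)
print(seen_by_old_program)
```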


Presentation usually consists of several individual tasks, therefore when I say 'software' below, it might refer to a different component every time. (For example, decoding of UTF-8 into the program's internal representation might be handled by libc, layout by Pango, font rendering by FreeType.)

  • First comes decoding. So what does a program do if it receives a UTF-8 message that it doesn't understand? Usually, if it knows that the data is text, it'll use some sort of fallback encoding to decode it. For example, if an old Internet email program sees MIME type text/plain; charset=utf-8, it'll know the message is textual, and will try to interpret its bytes as ISO 8859-1 or Windows-1252, even if it results in garbage.

    (As it happens, both UTF-8 and ISO 8859 are based on ASCII, so many European texts actually result in a halfway-readable decoding regardless. See the Wikipedia Mojibake article for examples.)

    That said, this doesn't always work – certain formats are stricter than others. For example, if an ASN.1 document uses a Unicode string type such as UTF8String instead of IA5String, old programs won't know whether it's still text or some other kind of data. So if a Korean company buys an SSL certificate, some old browsers will show their name as "Organisation: [unrecognized]".

  • Then there's interpretation. If the software supports an older version of Unicode and receives text with codepoints outside its known range, that's not a problem until they're shown on screen – at which point you'll see the "�" replacement symbol in their place.

    Of course, if you e.g. tell the text editor to uppercase everything, it won't be able to do that with out-of-range characters. But everything else will still work.

    (Recently, many "Unicode-compatible" programs and websites were found to use UCS-2 internally (which only goes up to U+FFFF). If they were given UTF-16-encoded text with codepoints above that (e.g. emoji), they would interpret each UTF-16 surrogate pair as two unrecognized codepoints and show "��" instead of the single intended character.)

  • Finally there's displaying. If the software decoded and recognized the codepoints, but doesn't have the required fonts for the characters, then it will usually draw a placeholder as well. Linux software usually draws a rectangle with tiny hex numbers in it (the codepoint number); macOS uses a special fallback font instead; on Windows you might get question marks in a box.
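The three stages above can be sketched roughly in Python. This is only an illustration under stated assumptions, not how any particular mail client, editor, or font stack works: `KNOWN_CHARSETS`, `decode_like_old_program`, `ucs2_view`, and `render` are hypothetical names, and the "font" is just a set of characters.

```python
# Hypothetical list of charsets an old, pre-Unicode program understands.
KNOWN_CHARSETS = {"us-ascii", "iso-8859-1", "windows-1252"}


def decode_like_old_program(data, declared, fallback="cp1252"):
    """Stage 1, decoding: use the declared charset if it is known,
    otherwise fall back to a legacy single-byte encoding."""
    if declared.lower() in KNOWN_CHARSETS:
        return data.decode(declared, errors="replace")
    # Unknown charset: guess, even if the result is garbage.
    return data.decode(fallback, errors="replace")


def ucs2_view(text):
    """Stage 2, interpretation: return the 16-bit code units that a
    UCS-2-only program would treat as separate codepoints."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def render(text, font_coverage):
    """Stage 3, display: draw a placeholder holding the hex codepoint
    for any character the (assumed) font does not cover."""
    return "".join(
        ch if ch in font_coverage else "[U+{:04X}]".format(ord(ch))
        for ch in text
    )


# Stage 1: UTF-8 bytes read as Windows-1252 give halfway-readable mojibake.
garbled = decode_like_old_program("caf\u00e9".encode("utf-8"), "utf-8")

# Stage 2: one emoji (U+1F600) becomes two surrogate code units.
units = ucs2_view("\U0001F600")

# Stage 3: a Hangul syllable with no glyph available becomes a hex box.
shown = render("a\ud55c", font_coverage={"a"})

print(garbled, [hex(u) for u in units], shown)
```

Each function stands in for a different component, which matches the point that "software" may mean a different piece at every stage.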

Finally, when it comes to non-text media (images, video, audio), each format is very different from the others, so if a format isn't recognized, programs usually just give up and show a placeholder.

user1686

Posted 2017-03-28T06:38:37.173

Reputation: 283 655

This is a great answer. Thank you very much. It's really very helpful. This kind of question is quite likely to come up in my test (i.e. how would characters be represented if it can't handle a certain encoding format) and you have answered it for me. – Gordy – 2017-03-29T08:13:50.263

@Gordy for an extreme example of a program not being able to interpret what it is reading into text, try opening an .exe file in notepad. – barlop – 2017-03-29T09:07:06.947

@barlop yeah thanks. I've done that before in Sublime Text not sure if it is the same in notepad. – Gordy – 2017-03-29T09:32:10.983

@Gordy You don't need to be not sure. It is very quick to check. – barlop – 2017-03-29T11:47:53.120

@barlop I'm using Linux. – Gordy – 2017-03-29T13:03:06.127