Saving "Bush hid the facts" in Notepad

54

13

When you save the text "Bush hid the facts" in Notepad under Windows XP, why does it show squares instead of the text when you reopen the file?

I saw it in this video, if you need an example:

http://www.youtube.com/watch?v=9bK9-sc_uus&feature=related

Mohammed

Posted 2009-07-31T00:38:45.993

28By the way, it's the same for any sequence of 4-3-3-5 letters, not just this one. – user1686 – 2009-07-31T08:42:07.613

15Example: "John ate the bacon" – Troggy – 2009-08-18T18:24:25.427

Answers

93

This is due to a problem with the Win32 API function IsTextUnicode, dating back to Windows NT 3.5. The function guesses the encoding heuristically, and for some short ANSI-encoded files it guesses wrong: the file is interpreted as UTF-16LE, resulting in unreadable characters.
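
To see the effect outside Notepad, here is a minimal Python sketch. It does not reproduce what IsTextUnicode or Notepad does internally; it only performs the reinterpretation described above, taking the single-byte encoding of the sentence and decoding those same bytes as UTF-16LE. The function name misread_as_utf16le is made up for illustration.

```python
def misread_as_utf16le(text: str) -> str:
    """Encode text as single-byte ASCII, then decode the same bytes as UTF-16LE."""
    raw = text.encode("ascii")        # one byte per character
    # UTF-16LE takes two bytes per code unit, low byte first,
    # so each pair of letters collapses into one unrelated character.
    return raw.decode("utf-16-le")


sentence = "Bush hid the facts"
garbled = misread_as_utf16le(sentence)

print(len(sentence.encode("ascii")), "bytes ->", len(garbled), "characters")
print([hex(ord(ch)) for ch in garbled])
print(garbled)
```

The 18 bytes turn into 9 code units, all of which land in the CJK ideograph range. On a system without a font covering those characters, they would presumably be drawn as empty squares, which matches what the question describes.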

This fascinated me too back when I discovered it. Since I was kind of young and naive, I thought it was an actual conspiracy :)

There is actually a Wikipedia article on this, which you can find here.

John T

Posted 2009-07-31T00:38:45.993

Reputation: 149 037

17Interesting. +1 for the Wiki article that taught me the word "mojibake" and its particularly meta warning that "without proper rendering support, you may see question marks, boxes, or other symbols..." :-) – jtb – 2009-07-31T01:33:42.947

3+1 because, despite using Windows for as long as I can remember, I somehow never came across this! – Jared Harley – 2009-07-31T02:56:12.757

It's not actually a bug, as Raymond Chen's article argues (follow the external link in the Wikipedia article). The documentation of IsTextUnicode clearly states that the function's tests are statistical and "are not foolproof". Given a short string such as the one here, it is not surprising that something is detected wrongly. – KTC – 2009-07-31T03:44:20.370

7Well, it's clearly a bug, because the software behaves incorrectly. The best you can argue is that bugs like this are impossible to eliminate without losing other functionality. And, heck, Microsoft fixed it in Vista [according to Wiki], so someone there obviously thought it was a bug too. – John Fouhy – 2009-07-31T04:30:03.880

11It's not a bug if it does exactly what it is advertised (i.e. documented) to do. It's specified precisely that it's a statistical test and not foolproof, and the shorter the input, the higher the error rate. It just so happens that in this case, it fails on a sentence that makes sense to humans. This particular sentence doesn't trigger the problem on Vista & 7 because the implementation of IsTextUnicode has been changed, presumably improved, and it now reports correctly for this sentence. What we have is a better or worse false positive / negative rate, not bugs. – KTC – 2009-07-31T06:00:51.167

@John It was obviously a feature. – Mateen Ulhaq – 2011-01-28T06:06:43.197

1-1 for an eighteen-year-old saying "back when ... I was kind of young". (kidding, I didn't really downvote you) – Graeme Perrow – 2009-09-02T17:48:06.187

Somehow I got here, two years later. There is the obligatory Old New Thing blog post and the time machine reference: http://blogs.msdn.com/b/oldnewthing/archive/2007/04/17/2158334.aspx – surfasb – 2011-12-20T10:26:28.947

5"It's not a bug if it does what it's supposed to." Yeah maybe the technical term is 'design flaw' or something, but I think most people would still say it's ok to call it a bug. – davr – 2009-11-18T00:37:26.137