Why does the Unicode character ⊕ (U+2295) come out as ≈ (U+2248)?

4

1

I open Notepad, hold down the Alt key, type +2295, and then release the Alt key. I save the file with Unicode encoding. However, the output is not http://www.fileformat.info/info/unicode/char/2295/index.htm as expected, but http://www.fileformat.info/info/unicode/char/2248/index.htm instead. What am I doing wrong? Looking for some pointers.

For anyone else stumbling over this: please note that EnableHexNumpad needs to be created as a new String value (REG_SZ), not as a new key (see the Wikipedia page linked in the answer).

user1720897

Posted 2015-12-10T08:42:12.443

Reputation: 277

What font are you using? – Burgi – 2015-12-10T08:48:06.913

The font I am using is Consolas – user1720897 – 2015-12-10T08:51:43.940

OK that is crazy, I get something completely different. – Burgi – 2015-12-10T09:03:09.993

Related: How do you type Unicode characters using hexadecimal codes?

– Arjan – 2015-12-11T16:20:48.850

Answers

3

The Wikipedia entry on Unicode input methods lists a necessary prerequisite for this to work:

A prerequisite for this input method is that the registry key HKEY_CURRENT_USER\Control Panel\Input Method contains a string type (REG_SZ) value called EnableHexNumpad, which has the value data 1. Users need to log off/in on Windows 8.1/8.0, Windows 7, and Vista or reboot on earlier systems after editing the registry for this input method to start working.

After I added this registry value on my machine and rebooted, the input works just as advertised.
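As a rough sketch of the prerequisite quoted above, the value can also be created from Python with the standard `winreg` module. The key path, value name, type (REG_SZ) and data come from the quoted text; the function name `enable_hex_numpad` is made up for illustration, and this has only been reasoned through, not verified on every Windows version.

```python
import sys

# Taken from the quoted prerequisite: a REG_SZ *value* named EnableHexNumpad
# under HKEY_CURRENT_USER\Control Panel\Input Method, with data "1".
KEY_PATH = r"Control Panel\Input Method"
VALUE_NAME = "EnableHexNumpad"


def enable_hex_numpad():
    """Create EnableHexNumpad as a string value (not as a subkey)."""
    import winreg  # Windows-only standard library module

    # Open (or create) the existing "Input Method" key with write access.
    with winreg.CreateKeyEx(
        winreg.HKEY_CURRENT_USER, KEY_PATH, 0, winreg.KEY_SET_VALUE
    ) as key:
        # SetValueEx writes a named value inside the key; REG_SZ is "String".
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_SZ, "1")


if __name__ == "__main__":
    if sys.platform == "win32":
        enable_hex_numpad()
    else:
        print("This sketch only applies to Windows.")
```

Note that `SetValueEx` adds a value inside the `Input Method` key; it does not create a subkey called `EnableHexNumpad`. A log off/on or reboot is still needed afterwards, as the quoted text says.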

Boldewyn

Posted 2015-12-10T08:42:12.443

Reputation: 3 835

The answer in it says use Notepad++. But I see the same behavior with Notepad++ as well. – user1720897 – 2015-12-10T09:32:19.723

OK, that's interesting. Np++ is definitely Unicode-aware. What locale do you have set? – Boldewyn – 2015-12-10T09:49:23.220

The System and Input Locale is set to en-us – user1720897 – 2015-12-10T10:11:25.647

OK... I'm curious what actually gets saved. May I ask you to get a hex dump of a file where this appears? Basically, open Notepad, type Alt+2295 once, then save the file and, in cmd.exe, run it once through a bin2hex tool like the one from fileformat.info. Then we can see whether it's really ≈ (in any of its representations) or something else.

– Boldewyn – 2015-12-10T10:20:57.083

1

I had already done this. Used HxD to see what the binary representation looks like. When I type Alt+2295 it gets saved as 2248. (Note: I save the file with Unicode Big Endian encoding). However, when I type Alt+2248, the binary looks like 255A

– user1720897 – 2015-12-10T10:25:33.097

Thanks. That's indeed a strange problem. I'll head over to Twitter, maybe someone there has an idea. brb – Boldewyn – 2015-12-10T10:58:33.537

Neither Twitter nor a night's worth of sleep helped. But I'm still at it... – Boldewyn – 2015-12-11T08:56:50.597

By the way, Alt+2295 gives me a Cedilla, U+00B8. And apparently @Burgi has a similar problem. – Boldewyn – 2015-12-11T08:58:47.310

Found it! Wikipedia to the rescue! :-) (So, at least, it works for me.) I'm still curious as to what Windows actually does when that registry value is not set... – Boldewyn – 2015-12-11T09:11:20.080

I'd say it's not related to encoding or problems in Notepad, but related to decimal vs hexadecimal. – Arjan – 2015-12-11T09:19:27.087

@Arjan nope, that doesn't check out. 2295 has nothing to do with 2248, no matter how often you drag it through hexdec or dechex. – Boldewyn – 2015-12-11T09:50:19.493

@Boldewyn Thanks. It worked. I was actually aware of the registry entry. But the guides I was following (linked here and here) lacked an important detail which you found in the Wikipedia article: EnableHexNumpad needs to be a new String value. I was adding it as a new Key!

– user1720897 – 2015-12-11T10:44:43.520

But, Boldewyn, I think that "2295 has nothing to do with 2248" is related to Windows not expecting Unicode codes at all when using Alt+numpad (when the registry hack is not applied). I guess it's then expecting some Windows codes, and I'm sure it's expecting decimal, without the registry hack. I really do not see how Notepad would be the culprit. That second part of your answer is just a wild guess, which I feel is wrong. – Arjan – 2015-12-11T11:28:58.153

@Arjan the second part of my answer is below a fine line, where the first part reads a loud "Edit:" :-P If you read the comments, it became clear quite quickly that the first version of the answer is not correct. Anyway, I'll remove the old part. – Boldewyn – 2015-12-11T11:55:07.643

Thanks for removing the old parts; see also When is “EDIT”/“UPDATE” appropriate in a post? Cheers.

– Arjan – 2015-12-11T16:04:25.957

2

To answer the question of why this specific value is present:

With the standard input method, decimal numbers are taken mod 256 and then interpreted as the OEM code page* if there is no leading zero, or the ANSI code page if there is a leading zero. So, the steps are:

  • 2295 mod 256 = 247
  • 247 [0xF7] is ≈ (U+2248) in the OEM code page

Character sets that have U+2248 at this position are Codepages 437, 737, 770, 772, 774, 860, 861, 862, 863, 864, 865, CWI, and MIK.
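The arithmetic can be checked with a few lines of Python. This is only a model of the rule described here (decimal value mod 256, decoded in the OEM or ANSI code page), not Windows' actual implementation; the function name and the default codecs `cp437` and `cp1252` are assumptions chosen to match a typical en-US system.

```python
def alt_code_without_hex_numpad(typed, oem_codec="cp437", ansi_codec="cp1252"):
    """Model an Alt+digits entry when EnableHexNumpad is not set.

    `typed` is the digit string entered on the numeric keypad.
    """
    value = int(typed) % 256  # only the low byte survives
    # A leading zero selects the ANSI code page, otherwise the OEM code page.
    codec = ansi_codec if typed.startswith("0") else oem_codec
    # Note: a few bytes are undefined in cp1252 and would raise an error here.
    return bytes([value]).decode(codec)


for digits in ("2295", "2248"):
    ch = alt_code_without_hex_numpad(digits)
    print(f"Alt+{digits} -> byte 0x{int(digits) % 256:02X} -> U+{ord(ch):04X}")
# Alt+2295 -> byte 0xF7 -> U+2248
# Alt+2248 -> byte 0xC8 -> U+255A
```

The second line matches the 255A seen in the hex editor after typing Alt+2248. Passing `oem_codec="cp850"` for Alt+2295 yields U+00B8, the cedilla reported in the comments, which suggests that machine uses a different OEM code page.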

(The fact that "2295" and "2248" both start with 22 is an interesting coincidence, nothing more)

* Note: "ANSI Code Page" has little to do with ANSI, except that code page 1252 was based on a draft of what later became ISO 8859-1 [and some of the others had similar origins]. It is the 8-bit character set associated with the current locale, and "OEM Code Page" is another character set associated with the locale, typically the one that was used in MS-DOS in that country.

Random832

Posted 2015-12-10T08:42:12.443

Reputation: 601