Questions tagged [unicode]

Unicode is intended to be a universal character set covering the characters needed for written text in all writing systems, along with technical symbols and punctuation.

Unicode

Unicode assigns each character a code point to act as a unique reference:

  • U+0041 A
  • U+0042 B
  • U+0043 C
  • ...
  • U+039B Λ
  • U+039C Μ
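As a minimal sketch of the mapping listed above, Python's built-in ord() and chr() convert between a character and its code point. The helper name code_point_label is hypothetical and exists only for this illustration.

```python
def code_point_label(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character."""
    return f"U+{ord(ch):04X}"


# The Greek letters are written as escapes so they cannot be confused
# with the look-alike Latin letters.
for ch in ["A", "B", "C", "\u039B", "\u039C"]:
    print(code_point_label(ch), ch)

# chr() goes the other way: from a code point back to the character.
print(chr(0x039B))
```

Note that U+039C (Greek capital Mu) looks like the Latin letter M (U+004D) but is a different character with its own code point.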

Unicode Transformation Formats

Unicode Transformation Formats (UTFs) describe how to encode code points as bytes. The most common forms are UTF-8 (which encodes each code point as a sequence of one to four bytes) and UTF-16 (which encodes each code point as one or two 16-bit units, that is, two or four bytes).

Code Point          UTF-8           UTF-16 (big-endian)
U+0041              41              00 41
U+0042              42              00 42
U+0043              43              00 43
...
U+039B              CE 9B           03 9B
U+039C              CE 9C           03 9C
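The table above can be reproduced with Python's str.encode(), shown here as a sketch rather than a reference implementation. The helper hex_bytes is a hypothetical name, and the last two characters (the euro sign U+20AC and U+1F600) are additions not in the table, included to show the three-byte and four-byte cases.

```python
def hex_bytes(data: bytes) -> str:
    """Format a byte string as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


# Characters from the table, plus two extra examples (assumptions added
# for illustration): U+20AC needs three UTF-8 bytes, and U+1F600 needs
# four UTF-8 bytes and a surrogate pair in UTF-16.
samples = ["A", "B", "C", "\u039B", "\u039C", "\u20AC", "\U0001F600"]

print(f"{'Code Point':<12}{'UTF-8':<14}UTF-16 (big-endian)")
for ch in samples:
    utf8 = hex_bytes(ch.encode("utf-8"))
    utf16 = hex_bytes(ch.encode("utf-16-be"))
    print(f"U+{ord(ch):04X}".ljust(12) + utf8.ljust(14) + utf16)
```

The "utf-16-be" codec is used instead of plain "utf-16" because the latter prepends a byte order mark, which would not match the table.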

Specification

The Unicode Consortium also defines standards for sorting algorithms, rules for capitalization, character normalization and other locale-sensitive character operations.
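Two of these operations, normalization and case folding, can be sketched with Python's standard library. This is only an illustration under stated assumptions: the function same_text is a hypothetical helper, and locale-sensitive sorting is not covered here.

```python
import unicodedata

# The same visible character "é" written two ways.
composed = "\u00E9"       # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # two code points: "e" + COMBINING ACUTE ACCENT


def same_text(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization and case folding."""
    def fold(s: str) -> str:
        return unicodedata.normalize("NFC", s).casefold()
    return fold(a) == fold(b)


# A plain comparison sees different code point sequences.
print(composed == decomposed)

# After normalization the two spellings compare equal.
print(same_text(composed, decomposed))

# Case folding is more than lowercasing: "ß" folds to "ss".
print(same_text("Stra\u00DFe", "STRASSE"))
```

Normalizing before comparing or storing text avoids treating visually identical strings as different, which is a common source of filename and database lookup mismatches.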

45 questions
39 votes, 5 answers

How to make the 'less' command handle UTF-8?

On my Mac terminal, printing UTF-8 works in general, but less doesn't display it correctly. So this works correctly: $ echo -e '\xe2\x82\xac' € but piping it into less gives something like this: $ echo -e '\xe2\x82\xac' | less …
user9474
13 votes, 2 answers

Is there a MySQL performance benchmark to measure the impact of utf8_unicode_ci versus utf8_general_ci?

I read here and there that using the utf8_unicode_ci collation ensures a better treatment of Unicode text (for example, it knows how to expand characters such as 'œ' into 'oe' for searching and ordering) compared to the default utf8_general_ci…
MiniQuark
10 votes, 2 answers

How to add non-latin entries in hosts file

Is there a way to add non-latin entries in /etc/hosts on Windows? Something like 127.0.0.1 локалхост I tried the code above and also Punycode, with no luck. Yes, I know that this would break almost any app and wouldn't pass any validation. I only…
Drath Vedro
10 votes, 1 answer

Can PuTTY be configured to display the following UTF-8 characters?

I'd like to be able to render the characters as seen in this tweet: I saved the tweet's JSON data and wrote a one-liner Python script for testing. python -c 'import json,urllib; print…
sente
10 votes, 4 answers

How can I check if PHP was compiled with the UNICODE version of the Win32 API?

This is related to this Stack Overflow post: glob() can't find file names with multibyte characters on Windows? I'm having issues with PHP and files that have multibyte characters on Windows. Here's my test case: print_r(scandir('./uploads/'));…
Wesley Murch
8 votes, 1 answer

Cisco FWSM -> ASA upgrade broke our mail server

We send mail with Unicode Asian characters to our mail server on the other side of our WAN... immediately after upgrading from a FWSM running 2.3(2) to an ASA5550 running 8.2(5), we saw failures on mail jobs that contained Unicode and other text…
Mike Pennington
8 votes, 5 answers

Best way to make sure a MySQL database is fully in UTF8

After some problems with UTF8 and non-UTF8 strings, we're standardising on UTF8. One thing I need to do is check that everything in the MySQL database is in UTF8. What do I need to check? Server default characterset Default character set of each…
Amandasaurus
5 votes, 1 answer

Special characters in ssh usernames

I've got a few users configured on LDAP, and would like them to be able to connect to a Linux machine via SSH using the usernames and passwords from the LDAP directory. However, there are 2 issues I do not know how to handle: The usernames and passwords…
sshuser
5 votes, 1 answer

Powershell 2: How to strip a specific character from a body of ASCII text

I am trying to strip odd characters from strings using PowerShell. I used the following output to attempt to learn on my own: get-help about_regular_expressions I am trying to take a string that is mostly ASCII, but that has one anomalous character…
Larold
4 votes, 0 answers

Nginx doesn't resolve files with special characters

My WordPress site has a couple of uploaded images with special characters in their file names. For example: wp-content/uploads/2015/06/cambios-antes-y-después-de-hacer-ejercicio.jpg I see in my access.log that nginx is looking for the file like…
Snowball
3 votes, 1 answer

How to view a remote Unicode file via ssh?

I have a Unicode file that contains Chinese characters. I have a local and a remote copy of it. When I use less on the local file the characters are shown properly: 奥尔德林 However, when I ssh to the remote machine and look at the remote version of the…
user9474
3 votes, 1 answer

How can I get Active Directory Users and Computers to display Unicode characters in user names?

In a test environment, we are seeing how you can manage Japanese usernames and English usernames on a Windows 2012 R2 domain controller in a Windows 2012 domain. AdsiEdit.msc displays Japanese usernames correctly but Active Directory Users and…
simon
3 votes, 1 answer

using curl against my IDN doesn't work right; browsers are OK

I've registered www.❺➠.ws, which goes to the same IP as www.naildrivin5.com. curl www.❺➠.ws returns the homepage of www.naildrivin5.com. No problem. I modify apache to use name-based virtual hosts as follows:
davetron5000
3 votes, 3 answers

UnicodeEncodeError when uploading files in Django admin

Note: I asked this question on Stack Overflow, but I realize this might be a more proper place to ask this kind of question. I'm trying to upload a file called 'Testaråäö.txt' via the Django admin app. I'm running Django 1.3.1 with Gunicorn 0.13.4…
Samuel Linde
3 votes, 1 answer

Linux support for Unicode filenames

I have a couple of Linux file servers running Samba. What do I need to do to support filenames with Unicode characters? Do particular filesystems have better support for Unicode? Would I get better support by using something other than ext3? What do I…
Zoredache