Questions tagged [unicode]

Unicode is a universal character set that aims to cover every character needed for written text, across all writing systems, technical symbols and punctuation.

Unicode

Unicode assigns each character a code point to act as a unique reference:

U+0041 A
U+0042 B
U+0043 C
...
U+039B Λ
U+039C Μ
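As a rough illustration of the mapping above, the following Python sketch uses the built-in ord() and chr() to move between characters and code points. The helper name code_point_label is hypothetical and chosen only for this example.

```python
# Sketch: map characters to their Unicode code points and back.
# code_point_label is a hypothetical helper name, not a standard API.

def code_point_label(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character."""
    return f"U+{ord(ch):04X}"

# "\u039b" is GREEK CAPITAL LETTER LAMDA, "\u039c" is GREEK CAPITAL LETTER MU.
chars = ["A", "B", "C", "\u039b", "\u039c"]
labels = [code_point_label(ch) for ch in chars]

for ch, label in zip(chars, labels):
    print(label, ch)

# chr() goes the other way: from a code point to the character.
round_trip = chr(0x039B)
```

The code point is an abstract number; it says nothing yet about how the character is stored as bytes.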

Unicode Transformation Formats

Unicode Transformation Formats (UTFs) describe how to encode code points as sequences of bytes. The most common are UTF-8 (which encodes each code point as one, two, three or four bytes) and UTF-16 (which encodes each code point as two or four bytes).

Code Point          UTF-8           UTF-16 (big-endian)
U+0041              41              00 41
U+0042              42              00 42
U+0043              43              00 43
...
U+039B              CE 9B           03 9B
U+039C              CE 9C           03 9C
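The table above can be reproduced with Python's built-in codecs. This is a minimal sketch; hex_bytes and encode_row are hypothetical helper names, and the last two characters (the euro sign and an emoji) are extra examples not in the table, added to show the three-byte and four-byte cases.

```python
# Sketch: encode code points as UTF-8 and big-endian UTF-16 byte sequences.
# hex_bytes and encode_row are hypothetical helper names for this example.

def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)

def encode_row(ch: str) -> tuple[str, str, str]:
    """Return (code point label, UTF-8 bytes, UTF-16 big-endian bytes)."""
    return (
        f"U+{ord(ch):04X}",
        hex_bytes(ch.encode("utf-8")),
        hex_bytes(ch.encode("utf-16-be")),
    )

# A, B, C, Greek capital Lamda and Mu, then the euro sign (three bytes in
# UTF-8) and U+1F600 (four bytes in UTF-8, a surrogate pair in UTF-16).
samples = ["A", "B", "C", "\u039b", "\u039c", "\u20ac", "\U0001F600"]
rows = [encode_row(ch) for ch in samples]

for label, utf8, utf16 in rows:
    print(f"{label:<12}{utf8:<16}{utf16}")
```

The "utf-16-be" codec is used instead of "utf-16" because the latter prepends a byte order mark, which would not match the table.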

Specification

The Unicode Consortium also defines standards for sorting algorithms, rules for capitalization, character normalization and other locale-sensitive character operations.
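Two of these operations, normalization and case folding, are available in Python's standard library. The sketch below assumes the unicodedata module and str.casefold(); caseless_equal is a hypothetical helper and a simplification of the full caseless-matching rules in the standard.

```python
# Sketch: normalization and case folding with the Python standard library.
import unicodedata

composed = "\u00e9"      # e with acute accent as a single code point
decomposed = "e\u0301"   # plain e followed by COMBINING ACUTE ACCENT

# The two strings render the same but compare unequal code point by code point.
differ_raw = composed != decomposed

# Normalizing to a common form makes them comparable.
nfc_equal = unicodedata.normalize("NFC", decomposed) == composed
nfd_equal = unicodedata.normalize("NFD", composed) == decomposed

def caseless_equal(a: str, b: str) -> bool:
    """Simplified caseless comparison: normalize to NFC, then case-fold.

    Hypothetical helper for illustration only; the standard's canonical
    caseless matching involves additional normalization steps.
    """
    fold = lambda s: unicodedata.normalize("NFC", s).casefold()
    return fold(a) == fold(b)

print(differ_raw, nfc_equal, nfd_equal)
print(caseless_equal("Stra\u00dfe", "STRASSE"))
```

Case folding is used rather than lower() because some characters, such as the German sharp s, fold to more than one character.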

Related tags

utf-8

45 questions

5 answers

How to make the 'less' command handle UTF-8?

On my Mac terminal, printing UTF-8 works in general, but less doesn't work correctly. So this works correctly: $ echo -e '\xe2\x82\xac' € but piping it into less gives something like this: $ echo -e '\xe2\x82\xac' | less …

asked Aug 06 '12 at 16:49

user9474

2 answers

Is there a MySQL performance benchmark to measure the impact of utf8_unicode_ci versus utf8_general_ci?

I read here and there that using the utf8_unicode_ci collation ensures a better treatment of Unicode text (for example, it knows how to expand characters such as 'œ' into 'oe' for searching and ordering) compared to the default utf8_general_ci…

mysql database-performance sql unicode utf-8

asked Jul 05 '10 at 10:05

MiniQuark

2 answers

How to add non-latin entries in hosts file

Is there a way to add non-Latin entries in /etc/hosts on Windows? Something like 127.0.0.1 локалхост. I tried the entry above and also Punycode, with no luck. Yes, I know that this would break almost any app and wouldn't pass any validation. I only…

windows hosts unicode

asked Oct 23 '16 at 20:51

Drath Vedro

1 answer

Can PuTTY be configured to display the following UTF-8 characters?

I'd like to be able to render the characters as seen in this tweet: I saved the tweet's JSON data and wrote a one-liner python script for testing. python -c 'import json,urllib; print…

putty encoding utf-8 unicode

asked Apr 11 '12 at 04:35

sente

4 answers

How can I check if PHP was compiled with the UNICODE version of the Win32 API?

This is related to this Stack Overflow post: glob() can't find file names with multibyte characters on Windows? I'm having issues with PHP and files that have multibyte characters on Windows. Here's my test case: print_r(scandir('./uploads/'));…

apache-2.2 windows php unicode

asked Mar 30 '12 at 20:23

Wesley Murch

1 answer

Cisco FWSM -> ASA upgrade broke our mail server

We send mail with Unicode Asian characters to our mail server on the other side of our WAN. Immediately after upgrading from a FWSM running 2.3(2) to an ASA5550 running 8.2(5), we saw failures on mail jobs that contained Unicode and other text…

firewall cisco smtp unicode

asked Nov 09 '12 at 05:21

Mike Pennington

5 answers

Best way to make sure a MySQL database is fully in UTF8

After some problems with UTF8 and non-UTF8 strings, we're standardising on UTF8. One thing I need to do is check that everything is in UTF8 in the MySQL database. What do I need to check? Server default character set Default character set of each…

mysql database unicode utf-8 charset

asked Jun 26 '09 at 14:57

Amandasaurus

1 answer

Special characters in ssh usernames

I've got a few users configured on LDAP, and would like them to be able to connect to a Linux machine via SSH using those users and password on the LDAP directory. However, there are 2 issues I do not know how to handle: The usernames and passwords…

ssh ldap unicode

asked Jan 04 '10 at 11:29

sshuser

1 answer

Powershell 2: How to strip a specific character from a body of ASCII text

I am trying to strip odd characters from strings using PowerShell. I used the following output to attempt to learn on my own: get-help about_regular_expressions I am trying to take a string that is mostly ASCII, but that has one anomalous character…

powershell regular-expressions unicode ascii

asked Sep 21 '11 at 00:02

Larold

0 answers

Nginx doesn't resolve files with special characters

My WordPress site has a couple of uploaded images that have special characters in their file names. For example: wp-content/uploads/2015/06/cambios-antes-y-después-de-hacer-ejercicio.jpg I see in my access.log that nginx is looking for the file like…

nginx unicode

asked Dec 08 '17 at 10:17

Snowball

1 answer

How to view a remote unicode file via ssh?

I have a unicode file that contains Chinese characters. I have a local and a remote copy of it. When I use less on the local file the characters are shown properly: 奥尔德林 However, when I ssh to the remote machine and look at the remote version of the…

ssh terminal unicode

asked Nov 20 '09 at 20:13

user9474

1 answer

How can I get active directory users and computers to display unicode characters in user names?

In a test environment, we are looking at how to manage Japanese usernames and English usernames on a Windows 2012 R2 domain controller in a Windows 2012 domain. AdsiEdit.msc displays Japanese usernames correctly but Active Directory Users and…

active-directory windows-server-2012-r2 unicode

asked Oct 06 '15 at 10:29

simon

1 answer

using curl against my IDN doesn't work right; browsers are OK

I've registered www.❺➠.ws, which goes to the same IP as www.naildrivin5.com. curl www.❺➠.ws returns the homepage of www.naildrivin5.com. No problem. I modify apache to use name-based virtual hosts as follows:

apache-2.2 curl unicode idn

asked Sep 20 '09 at 23:19

davetron5000

3 answers

UnicodeEncodeError when uploading files in Django admin

Note: I asked this question on StackOverflow, but I realize this might be a more proper place to ask this kind of question. I'm trying to upload a file called 'Testaråäö.txt' via the Django admin app. I'm running Django 1.3.1 with Gunicorn 0.13.4…

nginx django unicode gunicorn

asked Feb 14 '12 at 11:56

Samuel Linde

1 answer

Linux support for unicode filenames

I have a couple of Linux file servers running Samba. What do I need to do to support filenames with Unicode characters? Do particular filesystems have better support for Unicode? Would I get better support by using something other than ext3? What do I…

linux filesystems samba file-sharing unicode

asked Jun 17 '09 at 06:42

Zoredache
