password complexity policy for non "English" passwords

Question

In an internationalized application, what is the best practice for a policy on complexity of passwords? I am not having luck searching for the answer. Wikipedia lists these items for password policy:

the use of both upper- and lower-case letters (case sensitivity)

inclusion of one or more numerical digits

inclusion of special characters, e.g. @, #, $ etc.

prohibition of words found in a dictionary or the user's personal information

prohibition of passwords that match the format of calendar dates, license plate numbers, telephone numbers, or other common numbers

prohibition of use of company name or an abbreviation

If I am a speaker of a non-Latin-based language, how do rules about upper and lower case work? It's not as simple as a-z and A-Z. What about dictionary lookups and prohibiting certain formats? This problem seems well known for English, but what about other languages and cultures?

Excellent question. I'd never considered this before, beyond just mentally assuming that "nobody cracks Cyrillic / Arabic characters". – Polynomial, Jul 16 '12 at 19:38
I live in an Arabic country, and the majority of the time the passwords are in English, while all the rest of the software or system is in Arabic. – george_h, Jul 17 '12 at 11:55
@george_h The passwords are really in English, and not just Latinized Arabic words? – msanford, Jul 25 '12 at 17:40
AFAIK, non-ASCII password encoding is more complicated than it sounds. Combine the variance in input encoding with the operating system's language encoding, and it can sometimes wreak havoc for the user. In addition, some password encoding algorithms cannot handle Unicode characters. – mootmoot, May 31 '19 at 11:35
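A minimal Python sketch of the input-encoding problem mootmoot describes: the same visible password "café" can arrive as a precomposed é or as e followed by a combining accent, which encode to different bytes and therefore hash differently. One common mitigation, assumed here rather than taken from the thread, is to apply Unicode normalization (NFC in this sketch) and a fixed encoding (UTF-8) before hashing. The function name `hash_password` and the default iteration count are illustrative choices only.

```python
import hashlib
import unicodedata

def hash_password(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Hypothetical helper: normalize, encode as UTF-8, then derive a key with PBKDF2."""
    # NFC folds "e" + COMBINING ACUTE ACCENT into the single code point "é",
    # so both keyboard input paths yield identical bytes.
    normalized = unicodedata.normalize("NFC", password)
    return hashlib.pbkdf2_hmac("sha256", normalized.encode("utf-8"), salt, iterations)

composed = "caf\u00e9"      # é as one code point
decomposed = "cafe\u0301"   # e + combining acute accent
salt = b"example-salt-16b"  # illustrative only; use os.urandom(16) per user in practice
```

Without the normalization step, `composed` and `decomposed` would produce different hashes even though the user typed what looks like the same password.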

score 6 · Answer 1 · answered Jul 16 '12 at 19:52

Most drive-by password cracking attempts are going to assume that the password is a subset of ASCII characters. However, targeted attacks (under the Advanced Persistent Threat model) are likely to discover that your users aren't using ASCII passwords and change tack.

As such, I suggest the following rules:

Identify which languages are likely to be in use, based on your user base.
Perform dictionary lookups for those languages, and deny passwords based on them.
Ask native speakers to suggest common non-dictionary words and phrases that might be used as passwords.
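The dictionary step can be sketched in Python as follows, assuming you already have one word list per language (the tiny `WORDLISTS` below is a hypothetical placeholder for real dictionaries and native-speaker suggestions). Normalizing and case-folding both the word lists and the candidate password makes the lookup ignore case for scripts that have it, such as Cyrillic and Greek, and is harmless for scripts without it, such as Arabic.

```python
import unicodedata

# Hypothetical placeholder word lists; load real dictionaries for each language in use.
WORDLISTS = {
    "en": ["password", "dragon", "monkey"],
    "ru": ["пароль", "лошадь", "солнце"],
    "ar": ["مرحبا", "حب"],
}

def canonical(text: str) -> str:
    """Normalize to NFKC and case-fold so lookups ignore case and compatibility forms."""
    return unicodedata.normalize("NFKC", text).casefold()

# Pre-compute one canonical set per language.
DICTIONARY = {lang: {canonical(w) for w in words} for lang, words in WORDLISTS.items()}

def dictionary_matches(password: str) -> list[str]:
    """Return the languages whose word list contains the whole password."""
    key = canonical(password)
    return [lang for lang, words in DICTIONARY.items() if key in words]
```

This only rejects passwords that are exactly a dictionary word; a production check would also look for dictionary words embedded in longer passwords.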

If you can identify character subsets for each alphabet, you can apply individual entropy scores to them. This allows you to require a minimum security for such a password.
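A rough Python sketch of per-subset entropy scoring. Everything here is an assumption for illustration: the pool sizes in `POOL_SIZES` are hypothetical and should be tuned for your user base, and the script of a character is guessed from the first word of its Unicode name (e.g. "CYRILLIC SMALL LETTER PE"), because Python's standard library has no direct Unicode Script property lookup.

```python
import math
import unicodedata

# Hypothetical pool sizes per character subset; tune these for your own user base.
POOL_SIZES = {
    "LATIN": 52,     # a-z plus A-Z (accented Latin letters also land here)
    "CYRILLIC": 66,  # Russian alphabet, upper and lower case
    "GREEK": 48,     # modern Greek, upper and lower case
    "ARABIC": 28,    # basic Arabic letters (no case)
    "DIGIT": 10,     # ASCII digits 0-9
    "SYMBOL": 32,    # punctuation, symbols and spaces
}
DEFAULT_POOL = 100  # assumption for any subset not listed above

def subset_of(ch: str) -> str:
    """Heuristic: classify a character by its Unicode general category and name."""
    if unicodedata.category(ch)[0] in "PSZ":
        return "SYMBOL"
    # e.g. "CYRILLIC SMALL LETTER PE" -> "CYRILLIC", "DIGIT ONE" -> "DIGIT"
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def estimated_bits(password: str) -> float:
    """Naive upper bound: length * log2(combined pool size of the subsets used)."""
    if not password:
        return 0.0
    subsets = {subset_of(ch) for ch in password}
    pool = sum(POOL_SIZES.get(s, DEFAULT_POOL) for s in subsets)
    return len(password) * math.log2(pool)
```

For example, `estimated_bits("пароль1")` scores 7 characters over a 66 + 10 pool. Note how badly this overrates that password, which is simply the Russian word for "password" plus a digit; that is why the dictionary step above still matters.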

Probably the most important part is user feedback. Ask your users to report problems they find with the password system on your site, and suggest ways to improve the system. Only a native speaker can really identify weak passwords ahead of time.

score 6 · Answer 2 · answered Jul 25 '12 at 17:48

All joking aside, xkcd's Password Strength comic may have particular relevance here since it's language-independent: верный лошадь батарейка штапель (the comic's "correct horse battery staple" example, rendered in Russian) is full of entropy.
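The comic's arithmetic is easy to reproduce. It assumes 4 words picked at random from a 2048-word list, i.e. 11 bits per word and about 44 bits in total. Nothing in that calculation depends on the language of the words, only on the list size and on the words being chosen randomly rather than by the user. A minimal Python sketch, with hypothetical helper names:

```python
import math
import secrets

def passphrase_bits(wordlist_size: int, num_words: int) -> float:
    """Entropy of num_words drawn uniformly and independently from a word list."""
    return num_words * math.log2(wordlist_size)

def make_passphrase(words: list[str], num_words: int = 4) -> str:
    """Pick words with a CSPRNG; the script or language of the words does not matter."""
    return " ".join(secrets.choice(words) for _ in range(num_words))
```

A real generator would draw from a list of thousands of words in the user's language; the entropy estimate only holds if the words are picked by `secrets.choice` (or similar), not by the user.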

xkcd comic 936

answered Jul 25 '12 at 17:48

msanford

score 2 · Answer 3 · answered Jul 25 '12 at 17:16

I did find a Microsoft TechNet article, Passwords must meet complexity requirements, that seems to apply to their Windows Server based products. That adds another character group for "letters" that are neither upper nor lower case.

Passwords must contain characters from three of the following five categories:

Uppercase characters of European languages (A through Z, with diacritic marks, Greek and Cyrillic characters)

Lowercase characters of European languages (a through z, sharp-s, with diacritic marks, Greek and Cyrillic characters)

Base 10 digits (0 through 9)

Nonalphanumeric characters: ~!@#$%^&*_-+=`|(){}[]:;"'<>,.?/

Any Unicode character that is categorized as an alphabetic character but is not uppercase or lowercase. This includes Unicode characters from Asian languages.
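The "three of five categories" rule can be approximated with Unicode general categories. This Python sketch is not Microsoft's implementation: it maps "uppercase" and "lowercase" to the categories Lu and Ll for any script (not only European ones), and treats any other letter category (Lo, Lm, Lt) as the fifth, "alphabetic but neither upper nor lower case" group. In C#, the analogous test for the last category would be `char.GetUnicodeCategory(c)` returning `UnicodeCategory.OtherLetter` (or another letter category that is not upper or lower case).

```python
import unicodedata

# ASCII punctuation as listed in the TechNet article.
MS_SPECIALS = set("~!@#$%^&*_-+=`|(){}[]:;\"'<>,.?/")

def categories_present(password: str) -> set[str]:
    """Which of the five categories the password draws characters from."""
    found = set()
    for ch in password:
        cat = unicodedata.category(ch)
        if cat == "Lu":
            found.add("upper")
        elif cat == "Ll":
            found.add("lower")
        elif ch in "0123456789":
            found.add("digit")
        elif ch in MS_SPECIALS:
            found.add("special")
        elif cat.startswith("L"):  # Lo, Lm, Lt: alphabetic, but not upper or lower case
            found.add("other_letter")
    return found

def meets_three_of_five(password: str) -> bool:
    return len(categories_present(password)) >= 3
```

Under this reading, an all-Arabic or all-Chinese password can still qualify by combining "other" letters with digits and special characters.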

answered Jul 25 '12 at 17:16

Kevin Hakanson

"Passwords must contain characters from three of the following five categories" how is that useful? – curiousguy Jul 25 '12 at 17:49
@curiousguy uhh.. because it tells you what it has to have to be a password? If you don't meet the requirements, you can't set it. – cutrightjm Jul 26 '12 at 14:31
Yes, of course. But what is **the point** of the rule? Why was the rule invented? What makes it useful? – curiousguy Jul 26 '12 at 14:40
I wonder what C# test is used for the last category. I'm implementing password rules at a client's request, but they deal with several languages, including Arabic, Korean, Chinese, Japanese and Cyrillic, for which some of these rules are difficult to apply (or I need to get samples of passwords in those languages and run tests against them to make sure they pass and fail appropriately). – Reuben Aug 12 '14 at 04:20

score 0 · Answer 4 · answered May 31 '19 at 09:55

I am dealing with a system to be deployed in Arabic-speaking countries. As I understand it, there is no concept of uppercase and lowercase in Arabic, just 28 alphabetic characters.

Our current minimum password complexity requirement is 1 uppercase letter, 1 lowercase letter, 1 digit, and 8 characters. This gives (26*2+10)^8 ≈ 2.2E+14 combinations.

I offered the team two choices:

Allow Arabic characters, and require an 8-character password that contains at least 1 alphabetic character (Arabic Unicode), 1 number, and 1 special character from the set commonly found on these keyboards: !"#$%^&*()_-+=:;@/<>,. (20). This gives (28+10+20)^8 ≈ 1.3E14.
Require a minimum length of 10 characters, with at least 1 alphabetic and 1 numeric character. This gives (28+10)^10 ≈ 6.3E15.
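The arithmetic above can be reproduced, and slightly refined, in Python. `keyspace` is the simple alphabet^length count used in this answer. `keyspace_with_required_classes` is an addition not in the answer: it uses inclusion-exclusion to count only the strings that actually contain at least one character from every required class, which is a little smaller. The class sizes are the answer's own (including its count of 20 special characters), lengths are taken at exactly the minimum, and both functions assume every character is chosen uniformly at random, so they are upper bounds on what human-chosen passwords achieve.

```python
import math
from itertools import combinations

def keyspace(alphabet_size: int, length: int) -> int:
    """Number of strings of exactly `length` symbols over the alphabet."""
    return alphabet_size ** length

def keyspace_with_required_classes(class_sizes: list[int], length: int) -> int:
    """Strings of `length` that use every class at least once (inclusion-exclusion)."""
    total = sum(class_sizes)
    count = 0
    for k in range(len(class_sizes) + 1):
        # Subtract/add strings that avoid each combination of k classes entirely.
        for excluded in combinations(class_sizes, k):
            count += (-1) ** k * (total - sum(excluded)) ** length
    return count

policies = {
    "current (upper, lower, digit; 8 chars)": ([26, 26, 10], 8),
    "choice 1 (Arabic, digit, special; 8 chars)": ([28, 10, 20], 8),
    "choice 2 (Arabic, digit; 10 chars)": ([28, 10], 10),
}

for name, (classes, length) in policies.items():
    upper_bound = keyspace(sum(classes), length)
    required = keyspace_with_required_classes(classes, length)
    print(f"{name}: {upper_bound:.1e} total, {required:.1e} meeting the rule, "
          f"{math.log2(required):.1f} bits")
```

The comparison between the choices holds either way: the extra two characters of length in choice 2 buy more keyspace than the special-character requirement in choice 1.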

The possible characters are shown here: https://en.wikipedia.org/wiki/Arabic_alphabet#Keyboards

score 0 · Answer 5 · answered Jan 02 '15 at 19:30

Entropy analysis assumes that the password is composed of a patterned set of words or phrases derived from a known language. What if passwords are not composed of phrases found in any language? The frequency with which letters occur in a typed page of text can be calculated; in English, the most frequent letters are roughly ETAOIN SHRDLU (thanks to Carl Sagan and his book Contact). So passwords made up of strings not based on any language would be the hardest to crack (and, consequently, the hardest to remember). There are many ways to formulate people-friendly passwords without dipping into language as a source for your password strings. Some letters and numbers rhyme phonetically, and entropy analysis does not take that into consideration (unless I missed something in the section on Soundex libraries). It is ironic that the very properties that protect passwords from entropy and statistical analysis are also the ones that make passwords the hardest to remember. Password policies are designed to ensure a minimum level of protection regardless of the language the password is created in.
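The letter-frequency idea can be illustrated with a small Python sketch that counts how often each letter appears in a sample text. Because `str.isalpha` and `str.casefold` are Unicode-aware, the same function works for Latin, Cyrillic or Arabic text. The sample corpus you feed it is an assumption; a meaningful table needs a large amount of real text in the target language.

```python
from collections import Counter

def letter_frequencies(text: str) -> list[tuple[str, float]]:
    """Relative frequency of each letter in `text`, most common first (case-insensitive)."""
    letters = [ch.casefold() for ch in text if ch.isalpha()]
    if not letters:
        return []
    counts = Counter(letters)
    return [(letter, n / len(letters)) for letter, n in counts.most_common()]
```

Run over a large English corpus, the top of the list should come out roughly as ETAOIN SHRDLU; run over Russian or Arabic text, it gives the corresponding table for that language, which is the kind of statistic a targeted attacker could exploit.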

The question of which policies to apply, and which characters from non-Latin scripts to allow, is a great one for those of us who work primarily with Latin-based character sets. For instance, do other scripts such as Thai or Arabic have characters that play the same roles as Latin ones? And what about logographic languages like Chinese? In English, intonation commonly distinguishes a statement from a question; in Chinese, tones (written as diacritics in pinyin romanization) actually change the meaning of a word. So are there special characters in those languages that can be used like the special characters in Latin-based languages? That's what the question seems to be getting at.
