I am currently designing a login system for a web service. I will use a PBKDF2 implementation to hash the passwords.
However, I intend to allow Unicode in passwords, as I will have international users who might want to use, for example, Cyrillic characters. To avoid any issues with Unicode ambiguity, I thought of applying NFC Unicode normalization before encoding the password as UTF-8 and passing it on to the hash.
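To make the intended flow concrete, here is a minimal sketch using Python's standard library (the function name, salt size, and iteration count are just illustrative choices, not a fixed design):

```python
import hashlib
import os
import unicodedata

def hash_password(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Normalize to NFC, encode as UTF-8, then derive a PBKDF2-HMAC-SHA256 digest."""
    normalized = unicodedata.normalize("NFC", password)
    return hashlib.pbkdf2_hmac("sha256", normalized.encode("utf-8"), salt, iterations)

# "a\u0308" (a + combining diaeresis) and "\u00e4" (precomposed ä)
# produce the same digest once NFC has run:
salt = os.urandom(16)
assert hash_password("pa\u0308ss", salt) == hash_password("p\u00e4ss", salt)
```

The same normalize-then-encode step would of course have to run identically at registration and at every later login, otherwise a user who types the decomposed form would be locked out.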
The question now is: Is that safe, or does it introduce any unwanted ambiguity into the password validation? It is clear that "a\u0308" (a + combining diaeresis) and "ä" should be treated as the same, but does NFC fold any further differences that users could be relying on?
Edit:
I found that there is a stringprep (RFC 3454) profile called SASLprep (RFC 4013) which is apparently used for passwords and usernames in some protocols. It specifies normalization form KC (NFKC), which I consider a bad idea: it will fold differences like ² and 2, two characters commonly found on keyboards in the Western world at least, which could be used to enrich the password entropy. Unfortunately, no rationale is given for that choice.
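A quick check with Python's `unicodedata` module illustrates the difference between the two forms (this only demonstrates the folding behavior, not the full SASLprep profile):

```python
import unicodedata

# NFC preserves compatibility characters; NFKC (as used by SASLprep) folds them:
assert unicodedata.normalize("NFC", "x\u00b2") == "x\u00b2"   # superscript two kept
assert unicodedata.normalize("NFKC", "x\u00b2") == "x2"       # folded to plain '2'

# Both forms compose combining sequences, so the diaeresis case behaves the same:
assert unicodedata.normalize("NFC", "a\u0308") == "\u00e4"
assert unicodedata.normalize("NFKC", "a\u0308") == "\u00e4"
```

So a password containing ² would survive NFC unchanged but would collide with its plain-digit counterpart under NFKC.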