20

Using emails as usernames for webapps is a convenient way to avoid the "yet another online username" problem. With this approach, the emails need to be readily available in the backend for user/password checks.

However, under GDPR emails are considered personal information, so this data should be protected while it sits in the database or another storage medium.

It would be wonderful to have your opinion on the following approach to handle it with pseudonymization:

  1. Store a pseudonym (hash) of the email instead of plaintext email;
  2. Every time a login is attempted, search for the hash of the email and do the credential checks;
  3. When the email itself is really needed, for display in the frontend or for other uses, keep a "pseudonym table" with a key/value structure, where the key is the hash and the value is the encrypted email. This value column could be encrypted with any of the column-encryption strategies available in most relational DBs;
  4. The key used to decrypt the column would be held only in memory, while the data itself stays stored in encrypted form;
  5. Do this for all personal data that the webapp needs to store;
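
Steps 1 and 2 above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a keyed hash (HMAC-SHA256) rather than a plain hash, because a plain hash of an email address can be reversed by hashing guessed addresses; the key `PSEUDONYM_KEY`, the `credentials` table, and the helper names are all hypothetical, and lowercasing the whole address is a simplification. The column encryption from steps 3 and 4 is not shown.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical secret key; in practice it would be loaded from a secrets
# manager or an environment variable, never hard-coded like this.
PSEUDONYM_KEY = b"example-key-not-for-production"


def email_pseudonym(email: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Return a keyed hash of the normalized email, used as the lookup key."""
    normalized = email.strip().lower()  # assumption: addresses are case-insensitive
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE credentials ("
    " email_hash TEXT PRIMARY KEY,"  # the primary key makes this an indexed lookup
    " password_hash TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO credentials VALUES (?, ?)",
    (email_pseudonym("Alice@Example.com"), "<output of a password hashing function>"),
)


def find_credentials(email: str):
    """Step 2: find the stored row by pseudonym instead of by plaintext email."""
    return conn.execute(
        "SELECT password_hash FROM credentials WHERE email_hash = ?",
        (email_pseudonym(email),),
    ).fetchone()
```

With this layout a login attempt costs one hash computation plus one equality lookup on an indexed column, which is the same kind of query as a lookup on a plaintext email column.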

What do you guys think of this approach?

Do you think this will have a big performance penalty, even with an indexed key column?

Is there any other simple approach that still lets us use emails as usernames while complying with GDPR?

Limit
João Dias Amaro
  • Compliance does not necessarily mean record-level encryption. As long as you keep it in a restricted-access database, you are good. And you would restrict access to your credentials database anyway. So nothing new under GDPR. – Geir Emblemsvag Apr 25 '18 at 04:09
  • Hi Geir, thank you for your comment. Actually, I thought that according to GDPR we should do everything possible to protect data at rest, hence the encryption of sensitive personal data columns. Every case is different, and with GDPR I sometimes find it hard to understand how to proceed. Cheers. – João Dias Amaro Apr 25 '18 at 08:37

3 Answers

8

IANAL, but GDPR does not make encryption mandatory for personal data. Read this article to understand the complexity around encryption and GDPR better.

In the GDPR encryption is explicitly mentioned as one of the security and personal data protection measures in a few Articles. Although under the GDPR encryption is not mandatory, it is certainly important to see where and why encryption is advised.

...

GDPR encryption: the what you should know part Before doing so let’s be clear: GDPR compliance, as we wrote before is a business strategy challenge and encrypting personal data STRICTLY SPEAKING is not mandatory.

Read more at https://www.i-scoop.eu/gdpr-encryption/

Preferably, start with a Privacy Impact Assessment, then decide how you will handle the personal data. If you conclude that encrypting email usernames is a good decision, do it. For example, if your web application is called peoplewhoarechristians.com, the usernames would be classified as sensitive data because they create a link between the user and their religion. How sensitive the personal data is depends on the case, and so do the actions needed to mitigate the risks. The law also asks you to "implement appropriate technical and organisational measures to ensure a level of security appropriate to the risk"; what is appropriate will differ per case.

Your approach feels good for sensitive personal data. I think it will be overkill for low risk personal data. Still I would document your decision. Email addresses are currently not sensitive data by default. Read this article about the difference between sensitive and non-sensitive.

We could argue that giving an email address as a username means that the user consents to the processing and is aware that it could leak in combination with your app name. But better safe than sorry: I would ask for clear consent for this usage. In this example the purpose of collecting and processing the personal data is authentication and probably access control, but don't forget analytics and things like error logging. If you also plan to use the email for marketing purposes, be sure to gather extra consent.

  • Thank you so much for the really clear explanations. Makes sense. I was afraid that web applications would need to start doing really weird things just to go to the extra mile in terms of protecting personal data. Thanks. – João Dias Amaro Apr 25 '18 at 13:32
  • In the article you link to they say "Hashed email addresses" aren't sensitive data, which implies that non-hashed emails are, and the law also says that any data that can be used to trace back to the person is sensitive data. But otherwise nice answer and good sources :) – TheCrazyProfessor Aug 23 '19 at 21:57
0

A term used frequently in the GDPR legislation is that you must take "appropriate technical and organisational measures ... according to risk". Risk in this case refers to the risk posed to the "Data Subject" (person).

So what is appropriate would vary greatly from a gaming platform to a pharmacy (as linking sales to identifiable persons could reveal information about terminal medical conditions).

GDPR does not set any technical requirements, but leaves it up to you to determine what is appropriate.

Pete
-1

I would recommend a different approach. The GDPR, among other things, expects certain sensitive information to be kept separate from operational data, for instance on separate servers. This way, JOINs will be difficult or impossible without proper permissions. This is especially true in health and financial systems.

In your case, the problem is that the email address is considered personal data. Without discussing whether your approach is overkill, I can suggest a pseudonymization approach that gets you to the separation described above more easily than redesigning and migrating the entire database.

You may want to associate every user with a primary identifier that is agnostic to the email address or username, e.g. a database auto-increment key or a GUID. Then, everything related to that user can be referenced by using that key.

Now you can easily store login credentials in a separate table. In the future, that table can be moved to a different database if the business requires it.

CREATE TABLE users(
    userid VARCHAR(64) PRIMARY KEY, -- generated UUID
    language VARCHAR(2) NULL,
    theme VARCHAR(30) NULL
    -- ... any other information that is not strictly protected by GDPR
);

CREATE TABLE accounts(
    userid VARCHAR(64) NOT NULL PRIMARY KEY, -- same value as users.userid
    email VARCHAR(256) NULL,
    password_hash VARCHAR(64) NULL,
    failed_logins INT NOT NULL DEFAULT 0,
    last_login DATETIME NULL
    -- ...
);

When a user asks for removal of personal information, or when the data retention period expires, the agnostic ID helps you keep non-personal information in other tables (e.g. a historical travel list that now becomes anonymous).
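
As a rough sketch of that erasure step, the snippet below deletes only the row holding personal data and leaves the rows keyed by the agnostic ID in place. The `trips` table, the sample data, and `erase_personal_data` are hypothetical names invented for this illustration, and whether the remaining rows really count as anonymous depends on what else they contain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users(userid TEXT PRIMARY KEY, language TEXT);
    CREATE TABLE accounts(userid TEXT PRIMARY KEY, email TEXT, password_hash TEXT);
    -- hypothetical table holding non-personal history, keyed by the agnostic ID
    CREATE TABLE trips(tripid INTEGER PRIMARY KEY, userid TEXT NOT NULL, destination TEXT);

    INSERT INTO users VALUES ('u-123', 'en');
    INSERT INTO accounts VALUES ('u-123', 'alice@example.com', '<password hash>');
    INSERT INTO trips(userid, destination) VALUES ('u-123', 'Lisbon'), ('u-123', 'Oslo');
""")


def erase_personal_data(userid: str) -> None:
    """Remove the credentials row; rows in other tables keep the agnostic ID."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM accounts WHERE userid = ?", (userid,))


erase_personal_data("u-123")

remaining_accounts = conn.execute(
    "SELECT COUNT(*) FROM accounts WHERE userid = ?", ("u-123",)
).fetchone()[0]
remaining_trips = conn.execute(
    "SELECT COUNT(*) FROM trips WHERE userid = ?", ("u-123",)
).fetchone()[0]
```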

You can take this approach one step further.

Consider the accounts table. If you want to allow users to delete their own account but keep record of the uniqueness of the email address (e.g. to prevent people from resetting their history), you may nullify the email address but keep an additional column with a hash of the email address, so that while the email address is nullified the hash is not.

After that, when a user tries to sign up with a previously used email address, you have evidence that the user had deleted their account, and you can decide what to do.
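
A minimal sketch of this nullify-but-keep-the-hash idea might look like the following. The `email_hash` column, the `HASH_KEY` secret, and the `sign_up` / `delete_account` helpers are assumptions made for illustration, and using a keyed hash instead of a plain one is a choice of this sketch, not something the answer prescribes. Note that a retained hash is still pseudonymous rather than anonymous data.

```python
import hashlib
import hmac
import sqlite3

HASH_KEY = b"example-key-not-for-production"  # hypothetical secret, for illustration only


def email_hash(email: str) -> str:
    """Keyed hash of the normalized email address."""
    normalized = email.strip().lower()
    return hmac.new(HASH_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts(
        userid TEXT NOT NULL PRIMARY KEY,
        email TEXT NULL,
        email_hash TEXT NOT NULL UNIQUE,  -- survives account deletion
        password_hash TEXT NULL
    )""")


def sign_up(userid: str, email: str, password_hash: str) -> str:
    """Create an account, or report why the email address cannot be reused as-is."""
    digest = email_hash(email)
    row = conn.execute(
        "SELECT email FROM accounts WHERE email_hash = ?", (digest,)
    ).fetchone()
    if row is None:
        with conn:
            conn.execute(
                "INSERT INTO accounts VALUES (?, ?, ?, ?)",
                (userid, email, digest, password_hash),
            )
        return "created"
    # The hash is known: an empty email column means the account was deleted.
    return "previously_deleted" if row[0] is None else "already_registered"


def delete_account(userid: str) -> None:
    """Nullify the personal columns but keep the hash for uniqueness checks."""
    with conn:
        conn.execute(
            "UPDATE accounts SET email = NULL, password_hash = NULL WHERE userid = ?",
            (userid,),
        )
```

What to do in the `"previously_deleted"` case (reject, restore, or allow a fresh start) is a business decision that the sketch deliberately leaves open.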

usr-local-ΕΨΗΕΛΩΝ
  • `Now, you can easily store login credentials in a separate table. In the future, such table can be moved to a different database if required by the business.` - If you do this, you should adopt OAuth/OpenID Connect (e.g., "log in with google/facebook"), which is essentially exactly this. Note that just because the email address/login (even verified) is the same, it does **not** mean that the _person_ is the same, which might be another issue. – Clockwork-Muse Nov 26 '18 at 21:38
  • Well, yes, OP may prefer to retain unique personal identification in a hashed format (e.g. tax code, SSN) to check whether someone is attempting to reset – usr-local-ΕΨΗΕΛΩΝ Nov 27 '18 at 08:05