
After privacy concerns were raised, Facebook rolled out a number of changes to its APIs, one of them being "scoping" user identifiers per app. In effect, Alice's canonical Facebook user ID is never exposed through the Facebook APIs; instead, each app sees its own unique, never-changing value.

To formalize this, consider two functions:

mask(user_id, app_id)
unmask(masked_id, app_id)

These functions should satisfy the following properties:

  1. reversible: user_id == unmask(mask(user_id, app_id), app_id)

  2. secret: given y = mask(user_id, app_id), it should be 'hard' to recover the original user_id.

  3. deterministic: mask(user_id, app_id) == mask(user_id', app_id') whenever user_id == user_id' and app_id == app_id'

  4. collision resistant: for the same app_id, distinct user ids must never map to the same value, i.e. mask(user_id1, app_id) != mask(user_id2, app_id) whenever user_id1 != user_id2

After a lot of reading, I have come up with two possible implementations:

Stateful: store a triple (user_id, app_id, hash(user_id, app_id)).
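A minimal sketch of the stateful approach, using an in-memory SQLite table as a stand-in for the real store. The schema, the server-wide `SERVER_KEY`, and the choice of HMAC-SHA256 as the hash are all assumptions: an unkeyed hash of (user_id, app_id) could be brute-forced over the relatively small space of user ids, which would break property 2.

```python
import hashlib
import hmac
import secrets
import sqlite3

# Hypothetical server-wide key, never shared with apps.
SERVER_KEY = secrets.token_bytes(32)

db = sqlite3.connect(":memory:")
db.execute(
    """CREATE TABLE scoped_ids (
           user_id   INTEGER NOT NULL,
           app_id    INTEGER NOT NULL,
           masked_id INTEGER NOT NULL,
           PRIMARY KEY (user_id, app_id),
           UNIQUE (app_id, masked_id)
       )"""
)


def _keyed_hash(user_id: int, app_id: int) -> int:
    msg = user_id.to_bytes(8, "big") + app_id.to_bytes(8, "big")
    digest = hmac.new(SERVER_KEY, msg, hashlib.sha256).digest()
    # Keep 63 bits so the value fits in SQLite's signed 64-bit INTEGER.
    return int.from_bytes(digest[:8], "big") >> 1


def mask(user_id: int, app_id: int) -> int:
    row = db.execute(
        "SELECT masked_id FROM scoped_ids WHERE user_id = ? AND app_id = ?",
        (user_id, app_id),
    ).fetchone()
    if row is not None:
        return row[0]
    masked_id = _keyed_hash(user_id, app_id)
    # The UNIQUE constraint turns a (rare) truncated-hash collision into an
    # IntegrityError instead of silently violating property 4.
    db.execute(
        "INSERT INTO scoped_ids (user_id, app_id, masked_id) VALUES (?, ?, ?)",
        (user_id, app_id, masked_id),
    )
    return masked_id


def unmask(masked_id: int, app_id: int) -> int:
    # The hash is one-way, so reversing it is only possible via the stored row.
    row = db.execute(
        "SELECT user_id FROM scoped_ids WHERE app_id = ? AND masked_id = ?",
        (app_id, masked_id),
    ).fetchone()
    if row is None:
        raise KeyError("unknown masked_id for this app")
    return row[0]
```

The table gains one row for every (user, app) pair ever exposed, which is where the persistence cost of this approach comes from.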

Stateless: for each app_id, generate and store a secret, and compute encrypt(user_id, secret) on the fly. The encryption must be a deterministic cipher, such as AES-SIV.
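A minimal sketch of the stateless approach, with one substitution labeled plainly: instead of AES-SIV (not in Python's standard library), it uses an 8-round Feistel network whose round function is HMAC-SHA256 keyed with the per-app secret. That construction is a keyed permutation on 64-bit values, so it is deterministic, reversible and collision-free by construction, and the output stays 64 bits, whereas AES-SIV prepends a 128-bit synthetic IV to the ciphertext. The `_app_secrets` dict is a hypothetical stand-in for wherever the per-app secrets would really be stored; treat this as an illustration, not a vetted cipher.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-app secret store (in practice: a column next to the app record).
_app_secrets: dict[int, bytes] = {}

_MASK32 = 0xFFFFFFFF
_ROUNDS = 8


def _round(key: bytes, i: int, half: int) -> int:
    # Pseudorandom round function: HMAC-SHA256 over (round index, half), cut to 32 bits.
    digest = hmac.new(key, bytes([i]) + half.to_bytes(4, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def mask(user_id: int, app_id: int) -> int:
    if not 0 <= user_id < 2**64:
        raise ValueError("user_id must fit in 64 bits")
    # Generate the app's 256-bit secret on first use, then reuse it forever.
    if app_id not in _app_secrets:
        _app_secrets[app_id] = secrets.token_bytes(32)
    key = _app_secrets[app_id]
    left, right = user_id >> 32, user_id & _MASK32
    for i in range(_ROUNDS):
        left, right = right, left ^ _round(key, i, right)
    return (left << 32) | right


def unmask(masked_id: int, app_id: int) -> int:
    key = _app_secrets[app_id]  # KeyError if the app has no secret yet
    left, right = masked_id >> 32, masked_id & _MASK32
    # Undo the Feistel rounds in reverse order.
    for i in reversed(range(_ROUNDS)):
        left, right = right ^ _round(key, i, left), left
    return (left << 32) | right
```

Nothing per user is persisted; the only state is one secret per app, and each call costs a handful of HMAC evaluations.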

Both of these approaches have fairly significant downsides. The stateful approach incurs a huge persistence cost -

Is there some other approach I might be missing here? I think this is a deceptively easy problem, and I look forward to getting some insight.

goralph

1 Answer


What if Facebook keeps a symmetric key for a 64-bit block cipher per app, stored right next to the app ID in its system and never revealed to app developers, and encrypts the real user ID with this per-app key? This has no per-user storage cost (the encrypted user ID does not need to be stored), and the processing cost is just one block-cipher operation per user ID in the API serializer/deserializer. For example, if an API request returns a list of 100 users who have liked a page, then 100 block-cipher operations are needed, and they can be done in parallel.

I do not know if this is what they do, but would this satisfy your requirements?

We know the per-app user ID is 64 bits (I think I remember the documentation stating that 63 bits would not be enough), so the block cipher must have 64-bit blocks. That rules out AES, which has 128-bit blocks, and I don't think they would use 3DES because it is slow. It might be DES. There are plenty of 64-bit block ciphers to choose from: Blowfish, CAST-128, TEA, IDEA, RC2, RC5, etc.
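A minimal sketch of the per-app block-cipher idea, using XTEA, the successor to TEA from the list above. It is chosen only because it is short enough to write out in pure Python; nothing here implies Facebook uses it. The `_app_keys` dict and `serialize_user_ids` helper are hypothetical stand-ins for the key stored next to the app ID and for the API serializer. Because a block cipher is a permutation on 64-bit blocks, distinct user IDs always map to distinct per-app IDs.

```python
import secrets

_M32 = 0xFFFFFFFF
_DELTA = 0x9E3779B9
_CYCLES = 32  # standard XTEA: 32 cycles (64 Feistel rounds)

# Hypothetical per-app 128-bit keys, never shown to app developers.
_app_keys: dict[int, bytes] = {}


def _key_words(key: bytes) -> list[int]:
    if len(key) != 16:
        raise ValueError("XTEA needs a 128-bit key")
    return [int.from_bytes(key[i:i + 4], "big") for i in range(0, 16, 4)]


def xtea_encrypt(block: int, key: bytes) -> int:
    k = _key_words(key)
    v0, v1, s = block >> 32, block & _M32, 0
    for _ in range(_CYCLES):
        # Only the low 32 bits matter, so masking once per assignment suffices.
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s + k[s & 3]))) & _M32
        s = (s + _DELTA) & _M32
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s + k[(s >> 11) & 3]))) & _M32
    return (v0 << 32) | v1


def xtea_decrypt(block: int, key: bytes) -> int:
    k = _key_words(key)
    v0, v1 = block >> 32, block & _M32
    s = (_DELTA * _CYCLES) & _M32
    for _ in range(_CYCLES):
        v1 = (v1 - ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s + k[(s >> 11) & 3]))) & _M32
        s = (s - _DELTA) & _M32
        v0 = (v0 - ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s + k[s & 3]))) & _M32
    return (v0 << 32) | v1


def app_key(app_id: int) -> bytes:
    if app_id not in _app_keys:
        _app_keys[app_id] = secrets.token_bytes(16)
    return _app_keys[app_id]


def serialize_user_ids(user_ids: list[int], app_id: int) -> list[int]:
    key = app_key(app_id)
    # One block operation per ID; the calls are independent, so a production
    # serializer could spread them across cores.
    return [xtea_encrypt(uid, key) for uid in user_ids]
```

For the answer's example of 100 users who liked a page, `serialize_user_ids(list_of_100_ids, app_id)` performs exactly 100 encryptions, and `xtea_decrypt(per_app_id, app_key(app_id))` recovers the real ID on the way back in.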

Z.T.