Accessing document using a 6 letter token

Question

We are building a web app where users can insert 6 letters/digits (A-Z, 0-9) into a form to access a document. Every document has a randomly assigned access code like this: 1ABH5F.

When the user inserts a valid code, the document is shown. There is no user login (no authentication) - it is open to the public.

The front end will access the document via a stateless API - the code will be sent to the API, which will return the document.

How should we implement information security? Nobody here is a security expert, but we were thinking like this:

Using a captcha on the front end
Limiting calling the API from a single IP more than 3 times/hour

What other things do we have to implement to prevent access to the documents?

I guess it is very important to specify the use case for this system:

It is a system for anybody holding this document to see the original (digital) document. It will be used in an environment where users can print the documents (for example: car dealer) and bring them to other companies (for example: car registration office).

The problem here is that users can (and DO!) falsify the printed documents, then bring them to other companies (car registration office). These other companies have to have a way to check if the original is the same as the printed version.

Since we do not know which the other companies are (any of 10000+ car registration offices), any person holding/viewing the 6 letter token can access to the original document.

Your token should have about 72 bits of entropy. If you are limited to uppercase letters and numbers that is 14 characters. This helps future-proof your system to allow additional cryptographic security features down the road. — 700 Software, Sep 26 '16 at 13:36
From a user-experience standpoint, you should probably avoid the numbers "0", "1", and "5", and the letters "O", "I", and "S" (and perhaps "Q"). This reduces your alphabet to 30 characters from the original 36. — Mark, Sep 26 '16 at 19:48
Something you don't make clear: do these documents hold any information that should be kept private? What are the consequences of an "unauthorized" person accessing the doc? This is important to determine how much protection you actually need. If falsification is the *only* concern, then why does it need to be "protected" at all? Side note: "it is open to the public", "to prevent access to the documents" These two statements express exactly opposite concerns. — jpmc26, Sep 26 '16 at 20:19
Please read our [help/on-topic]: "Security is a very contextual topic: threats that are deemed important in your environment may be inconsequential in somebody else's, and vice versa. Are you trying to protect something of global value against Advanced Persistent Threats? Or are you looking for a cost-effective approach for a low-profile small business? [..] you should tell us: [...] who uses the asset you're trying to protect, and who you think might want to abuse it (and why) what steps you've already taken to protect that asset what risks you think you still need to mitigate" — D.W., Sep 26 '16 at 22:00
Can you make the code longer? You could add a QR code to the printable doc with the full URL so it's still usable. — mgarciaisaia, Sep 26 '16 at 23:08

Anders · Accepted Answer · 2016-09-27T09:07:27.587

Brute forcing

So you have an alphabet of size 36 and 6 characters. That gives you about two billion different tokens. Let's say you have a thousand different documents. That gives you a chance of one in two million of guessing a token associated with a document. Trying from a thousand different IPs every hour for a year would give you almost ten million guesses - that should give you a couple of documents.
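As a rough check of these figures, here is a small Python sketch of the arithmetic. The thousand documents and thousand IPs come from the paragraph above; the rate of one guess per IP per hour is an assumption chosen because it reproduces the "almost ten million" figure (at the proposed limit of three per hour, the attacker would get three times as many guesses).

```python
# Back-of-the-envelope check of the brute-force estimate.
ALPHABET_SIZE = 36            # A-Z plus 0-9
TOKEN_LENGTH = 6
DOCUMENTS = 1_000             # figure used in the answer
ATTACKER_IPS = 1_000          # figure used in the answer
GUESSES_PER_IP_PER_HOUR = 1   # assumption: one guess per IP per hour

# Number of possible tokens: 36^6
token_space = ALPHABET_SIZE ** TOKEN_LENGTH

# On average, one guess in this many hits some valid document
guesses_per_hit = token_space / DOCUMENTS

# Total guesses an attacker can make in one year
guesses_per_year = ATTACKER_IPS * GUESSES_PER_IP_PER_HOUR * 24 * 365

# Expected number of documents found in that year
expected_hits = guesses_per_year / guesses_per_hit

print(f"token space:      {token_space:,}")
print(f"guesses per hit:  {guesses_per_hit:,.0f}")
print(f"guesses per year: {guesses_per_year:,}")
print(f"expected hits:    {expected_hits:.1f}")
```

Under these assumptions the expected number of hits comes out at roughly four documents per year, which matches "a couple of documents".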

Sure, the CAPTCHA makes this harder. But they are not perfect, and they can always be cracked by humans.

The problem here is that since you only enter a token and no document ID you can only rate limit on IP and not on document. That makes it very hard to protect against brute forcing unless you have a very large space to pick tokens from.

Sharing

A password is personal and you are encouraged not to share it. That means it can be easily changed if it is compromised, and you have some control over who gets their hands on it.

A document token like this is supposed to be shared by design. You have very little control over who gets it. It will end up on mail servers and backups and Post-its on people's desktops all over the world.

You have no idea who has access to the token, and if you need to change it you will need to redistribute it to all the persons who are supposed to have it. That is neither secure nor practical.

Conclusion: There must be a better way

This will not give you very good security. If the resource you are protecting is not very important it might be enough, but I would not use it for anything of value.

I do not know your exact use case, but whatever it is there must be a better way to solve this problem than rolling your own API. Using an existing solution would also save you the problem of having to write your own code.

Use an existing cloud storage service, a VPN connection into the company intranet, or something else. Just don't fire up your IDE and start coding away.

Update: Your use case

This is one of the cases where an access token is probably a good idea. But to get around the problems mentioned above I would do this:

Keep both the CAPTCHA and the rate limit by IP. (You might want to reconsider how the rate limiting is done in order to prevent accidental or deliberate DOS.)
To deal with the brute forcing, I would increase the size of the token. Google Drive uses 49 characters with both upper case letters, lower case letters and numbers. That should be enough for you as well.
To get around the sharing issue, print the URL with the token in a QR-code on the document itself. This brings the whole problem into the domain of physical papers that people are used to dealing with. The people who see the paper will have access to the digital original. That is easy to grasp.
Consider setting a limit on how many times the document can be accessed, or at least a maximum time for how long the token can be used. If the car should be registered within one week, there is no reason for the token to work after two.
Do not store the tokens in plain text in your database. Hash them. (Something fast like SHA256 should be enough here - no need to roll out bcrypt when you have large random tokens.)
Use a CSPRNG to generate the tokens, otherwise they could be guessed by an attacker having access to a few tokens.

Hey, seems like a really good answer, but I'm wondering. What is the downside to using bcrypt in this case? Isn't the speed difference minimal in the case of just checking one hash? — Jester, Sep 26 '16 at 12:00
There isn't that much of a downside, except the performance impact that bcrypt is designed to have. Sure, it is only one hash *per request*, but it is still something. — Anders, Sep 26 '16 at 12:18
With the QR approach, you must teach all the users to be on the lookout for the URL being on a different domain. A better approach might be a barcode which they can directly scan into the form field. Downside: barcodes can contain control characters, which might also be an attack vector (^L, http://fake-site.bla/?code=ABC123). But that string will also be longer, which makes it more noticeable. — Sjoerd Job Postmus, Sep 26 '16 at 12:40
"a year would give you almost ten million guesses" -- and that's even assuming that three accesses per IP per hour survives contact with the enemy. I suspect in practice it would be unusable. For example a car dealership might have 10 sales staff sharing an IP address, then if there's ever an hour in which they sell 4 cars they can't access the documentation. Or a user might have difficulty printing and end up hitting "refresh" a couple of times. — Steve Jessop, Sep 26 '16 at 14:41
@Anders: even then, at the end where the documents are checked I wouldn't want a job where I'm only allowed 3 typos an hour, shared among the whole office ;-) Full-time data entry folks are better typists than me, ofc, but it introduces quite a high level of confidence you need to have that you've transcribed the serial number correctly before you hit the button to load the document. And a potential DoS attack where someone gives an incorrect serial number to a clerk and thereby shuts down the office for an hour. — Steve Jessop, Sep 26 '16 at 14:45
It is three failed requests.. but we recognize this problem with shared IPs in one office.. So we are thinking introducing whitelist for known IPs (so any car dealer can apply) and maybe lifting failed request limit. In any case, blocked IP can send a message to customer service... — Peter, Sep 27 '16 at 07:00
@Peter Instead of spending time making it easy for customer service to "fix" the lockout problem(!), why don't you spend the time making the solution good enough so it's not needed? "They can always perform [some kind of hassle of a workaround]" is lazy-talk. CGNAT also means the dealerships might conceivably share an ip address with the café (or whatever) next door offering free wifi, so I don't think that's a good idea either. That essentially boils down to authenticating based on IP, which isn't great. https://security.stackexchange.com/questions/4533/ — Simon Lindgren, Sep 27 '16 at 08:29
@Simon Thank you for your input.. I see my thinking is not ok.. Maybe to formulate better: How to provide best security with only six letter token to access the documents? — Peter, Sep 27 '16 at 14:32

Dmitry Grigoryev · Answer 2 · 2016-09-26T10:32:19.857

Since you say "any person holding/viewing the 6 letter token can access to the original document", I assume there is nothing really secret in these (i.e. I couldn't commit fraud by simply finding a token in my neighbour's trash). Otherwise pick a regular authentication scheme with e-mail registration and passwords.

Many token-based systems are used the way you describe, though in your case the token length is strikingly small. I suggest you use tokens at least twice as long: this would make brute-force attacks impractical without making the system much harder to use.

PS. Oh, and please exclude letters O and I from your alphabet if you haven't already.
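To illustrate both suggestions, here is a short Python sketch that generates 12-character codes (twice the original length) from an alphabet without O and I. The function names are hypothetical, and the input normalization step is an added assumption rather than something proposed above: because O and I never occur in a code, a typed "O" or "I" can be mapped to "0" or "1" instead of being rejected.

```python
import secrets
import string

# Uppercase letters and digits, minus O and I (easily confused with 0 and 1).
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "OI"
)


def generate_code(length: int = 12) -> str:
    """Pick each character with the OS CSPRNG via the secrets module."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def normalize_code(typed: str) -> str:
    """Clean up hand-typed input before looking the code up.

    Assumption: O and I never appear in a valid code, so they are
    treated as mistyped 0 and 1.
    """
    cleaned = typed.strip().upper().replace(" ", "").replace("-", "")
    return cleaned.translate(str.maketrans("OI", "01"))
```

With 34 symbols, a 12-character code carries about 61 bits of entropy, compared with about 31 bits for the original 6 characters over 36 symbols.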

Navin · Answer 3 · 2016-09-28T13:32:33.553


Limiting calling the API from a single IP more than 3 times/hour

First things first, this is a huge denial-of-service risk. Getting locked out for an hour just because someone mixed up "l" and "1" is unacceptable.

Keep in mind that pretty much all office computers are behind NAT44. There are several users behind each IP. With CGNAT (NAT444) and NAT464, you'll also see many home users in different buildings using the same IP.

edited Sep 28 '16 at 13:32

answered Sep 27 '16 at 01:38


It is three failed requests.. but we recognize this problem with shared IPs in one office.. So we are thinking introducing whitelist for known IPs (so any car dealer can apply) and maybe lifting failed request limit. In any case, blocked IP can send a message to customer service... – Peter Sep 27 '16 at 07:01
