2

I am building a single-page application (React-Redux on the FE, Rails API on the BE) which makes a bunch of REST API calls to get certain information for logged-in users. Some subset of that information (a categorization database) is more confidential than the rest. Our aim is to make it as hard as possible for that data to be gleaned from the API response or scraped.

Based on my research, I understand that there is no method which makes it 100% possible to secure the data, but I would like to make it as hard and frustrating as possible without negatively affecting the UX. We are also more concerned that the information is essentially plain JSON as it travels from the server to the client, which is far easier to exploit than scraping the UI (since the UI in this case isn't the easiest to scrape).

So far, based on my research, it seems like I should do the following -

  1. Use TLS/SSL for API calls, which prevents man-in-the-middle attacks and encrypts data in transit. But this doesn't solve the problem of a malicious end-user.

  2. To address the malicious end-user, we are going to do some rate-limiting on the API, and we can also temporarily suspend users/scrapers who show up as 'bots' in the logs (since they are all logged in). But that can be defeated by malicious users simply behaving like more patient users.

  3. The most promising option, it seems, is to obfuscate the API response in two ways:

    • Obfuscate response keys, or use ones which don't give away the data's meaning as easily.
    • Send an encrypted response from the server, which JavaScript decrypts using a key that is itself obfuscated in the code. This way, the data wouldn't be visible in the Developer Tools or to anyone else making the API call.

So, my question: is the last point above (the second bullet in (3), encrypting responses) an effective way to achieve my goal? And if so, what should I keep in mind while implementing it?

While I understand that none of these methods are foolproof, I would like to better understand the best practices for such use-cases. Any thoughts or ideas would be much appreciated!

Thank you!

geoboy
  • 133
  • 1
  • 4
  • 1
    There isn't really a way to "solve" this perceived problem, so asking the "best" solution seems irrelevant. – Alexander O'Mara Sep 06 '16 at 19:18
  • @AlexanderO'Mara I have reworded the question to reflect your point. And I understand and acknowledge that there is no 100% foolproof solution. Hence, the focus on best practices or effective ways to achieve the goal. Any ideas would be much appreciated! – geoboy Sep 06 '16 at 19:22
  • What problem occurs if the information is scraped? A user will share with others, or go into competition with you, or something else? How do those costs manifest? The point is- threats and risks here should be more explicitly laid out. Solutions in the vein above seldom meet expectations. – Jonah Benton Sep 07 '16 at 03:51
  • @JonahB I take your point, and that has been well considered. It was too much to elaborate in the question itself (it was wordy enough as it is). Assuming that it is worth the effort, any thoughts on how best to go about it? – geoboy Sep 21 '16 at 02:00

4 Answers

4

Your API should check to see who is making the call and perform any sort of restrictions on the server side.

The way to deal with this is to implement server-side authorization and mitigate against insecure object reference and function level access control. These should be standard anyway.

If your API returns information you don't want the caller to be able to see, you implemented the API wrong. There is nothing you can do to protect the data at that point, as the end user can simply monitor the network traffic, e.g. in the Chrome DevTools network panel. Even if you encrypt it, the code to decrypt it lives in the end user's browser and can be reverse engineered.

I suppose you could withhold the key from everyone except authorized users, but then what is the point of using up bandwidth by returning data that can't be read? Instead of encrypting it, you may as well redact it: replace it with completely random gibberish, or with empty strings. Do it server side.
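That server-side redaction could be sketched as follows. The field names and roles here are invented for illustration; in a Rails API this logic would typically live in the serializer, but the idea is language-neutral.

```javascript
// Hypothetical server-side redaction: strip confidential fields unless the
// caller's role is allowed to see them. Fields and roles are assumptions.
const CONFIDENTIAL_FIELDS = ['category_code', 'internal_score'];

function redactForRole(record, role) {
  if (role === 'admin') return { ...record }; // authorized callers get the full record
  const redacted = { ...record };
  for (const field of CONFIDENTIAL_FIELDS) {
    if (field in redacted) redacted[field] = null; // or delete the field entirely
  }
  return redacted;
}

const row = { id: 7, name: 'widget', category_code: 'C-42', internal_score: 0.91 };
console.log(redactForRole(row, 'viewer')); // confidential fields nulled out
console.log(redactForRole(row, 'admin'));  // full record
```

The point is that the decision happens before the bytes leave the server, so there is nothing for a client-side attacker to decrypt or reverse engineer.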

John Wu
  • 9,101
  • 1
  • 28
  • 39
  • Thanks for the response. I do have all of what you have mentioned. And as I acknowledged in the question, I am aware that encrypting it isn't fully securing it. But my aim is to make it as hard as possible, and encryption seems like a viable method to do it. Redacting seems error-prone in the application, though I will obfuscate the keys a bit. Any thoughts on how best to implement encryption in this scenario? – geoboy Sep 21 '16 at 01:59
4

A web browser is not a DRM platform, full stop.

The state of the art in this general area is found in malware distribution techniques and JavaScript obfuscation. Google is the best source for the latest news; this space changes quickly.

Compared to the state of the art, the techniques mentioned in the question would not divert a skilled analyst interested in systematically retrieving the data for more than a couple of days at most. Stepping through the client-side code with a debugger reveals the intended secrets very quickly.

Jonah Benton
  • 3,359
  • 12
  • 20
1

If I understand it right, you have a website which serves some data, and you simply want to prevent users from harvesting it.

The last time I approached this problem, I did it the following way:

  1. Collect a log of operations in a database that is good for storing logs, something horizontally scalable. Collect as much information as you can: not a typical HTTP log but a detailed log, ideally structured.
  2. Analyze the logs and try to isolate users who misuse the service
  3. Block users who misuse the service

This is a quite difficult but very effective method, and there are also other techniques which can be combined with it, see below.

Basically, you take these logs and "compress" them into reports with summaries. Such a report contains, for example, requests per hour on a given Wednesday, the variety of data requested, the hours of activity (e.g. 9 a.m. to 5 p.m.), and so on. Sawmill produces similar reports, so it's worth having a look at it. The more data you have, the fewer false positives and the better detection you will get. The main thing is that it runs periodically: for example, you might process the logs daily and store the results in the database as per-user, per-day reports, then roll those up into weekly, monthly, and yearly reports. This way you can say precisely who is misusing the service.
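The "compress the logs into reports" step might be sketched like this. The log record shape and the flagging threshold are assumptions for illustration; a real pipeline would run over the log database, not an in-memory array.

```javascript
// Bucket request logs per user per hour and flag users whose volume
// in any single hour exceeds a threshold (an assumed heuristic).
function buildHourlyReport(logs, threshold = 100) {
  const buckets = new Map(); // "user|hour" -> request count
  for (const { user, timestamp } of logs) {
    const hour = new Date(timestamp).toISOString().slice(0, 13); // e.g. "2016-09-07T09"
    const key = `${user}|${hour}`;
    buckets.set(key, (buckets.get(key) || 0) + 1);
  }
  const flagged = new Set();
  for (const [key, count] of buckets) {
    if (count > threshold) flagged.add(key.split('|')[0]);
  }
  return { buckets, flagged };
}

// A scraper making 150 requests in one hour stands out against a normal user.
const logs = [];
for (let i = 0; i < 150; i++) logs.push({ user: 'scraper', timestamp: '2016-09-07T09:00:00Z' });
logs.push({ user: 'human', timestamp: '2016-09-07T09:05:00Z' });
console.log([...buildHourlyReport(logs).flagged]); // → [ 'scraper' ]
```

Daily, weekly, and monthly reports are just the same aggregation with a coarser time bucket.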

Simpler than that kind of after-the-fact analysis, I think, is just to cap the number of requests per minute / hour / day. This can be done using another database, Redis for example. If the limit is exceeded, simply return a relevant error. Think of it as "real-time metrics": once you go above a certain threshold of total requests per period, you are simply blocked.
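A fixed-window counter is the simplest version of that cap. The sketch below keeps counters in an in-memory Map so it runs anywhere; in production you would keep the counter in Redis (INCR plus EXPIRE) so all app servers share it. The window size and limit are assumptions.

```javascript
// Fixed-window rate cap per user: allow at most LIMIT requests per window.
const WINDOW_MS = 60 * 1000; // one-minute window
const LIMIT = 60;            // requests per window (an assumed threshold)
const counters = new Map();  // user -> { windowStart, count }

function allowRequest(user, now = Date.now()) {
  const entry = counters.get(user);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    counters.set(user, { windowStart: now, count: 1 }); // new window
    return true;
  }
  entry.count += 1;
  return entry.count <= LIMIT; // over the limit -> respond 429 Too Many Requests
}

// 100 requests in the same instant: only the first 60 get through.
let allowed = 0;
for (let i = 0; i < 100; i++) if (allowRequest('u1', 0)) allowed++;
console.log(allowed); // → 60
```

With Redis, `INCR user:minute-bucket` followed by `EXPIRE` gives the same semantics atomically across servers.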

Another idea is to use invites, as Gmail did in the past. That makes it easier to track malicious users.

Another thing would be to focus on account security: make sure that users are real (e.g. phone-number confirmation as on Facebook, or a credit-card payment) and that accounts are not shared, e.g. used by two people at the same time.

Regarding obfuscating data, it is a good approach, as it effectively limits the number of malicious actors. One way of doing it is to change the algorithms frequently, or even have them coded in advance and rotated automatically every month, so that scraping can't be automated. Note, however, that there are tools like PhantomJS which can act as a normal browser and get around that anyway, so a CAPTCHA may be needed as well once a user hits a certain limit.

You can also change the format of your responses from time to time, e.g. the structure of the JSON.

And regarding users, you may require them to log in via a Google or Facebook account, which is easier than verifying mobile numbers yourself.

And don't forget to include an EULA, and state clearly in your T&C what is and isn't permitted.

Aria
  • 2,706
  • 11
  • 19
  • Thank you for the detailed answer! I am doing almost all of what you have suggested (except changing JSON or format of database - seemed error prone). Mainly, I was wondering (3 in my question) if instead I should go ahead in encrypting responses, if there were things I should keep in mind? Or in other words, how best to do that. – geoboy Sep 21 '16 at 01:57
1

2021 answer (from an information security professional): it is straightforward, with few compromises, to implement encryption for API responses.

✔ No DRM
✔ Pure JavaScript
✔ Standard crypto (available in any programming language)

  1. When a user registers generate a private/public key pair and store it with the user record (wherever the password hash is stored)
  2. After a user successfully logs in, provide the browser with the server-generated public key and store it in window.localStorage (same-origin protections apply)
  3. In the browser generate a CryptoKeyPair and store the browser generated private key in window.localStorage
  4. Inform the server of the browser-generated public key; do this in an encrypted envelope using the server-generated public key (the server can decrypt it with its private key)
  5. Now you can encrypt API responses using the client public key so only that user can decrypt them
  • Never share client private keys with the server
  • Securely store the server private keys, and never share them with the client
  • Regenerate key pairs on every successful login
  • When a user logs out, remove the window.localStorage items
  • When another user logs in, overwrite the window.localStorage keys with the new user's keys (this is how Google accounts and others work)
Soufiane Tahiri
  • 2,667
  • 12
  • 27
Stof
  • 151
  • 9