
I want to create a file server that serves encrypted files without any permission system. It would be really awesome if it were impossible for a hacker to tell whether a file exists or not, so I am thinking: if every endpoint that does not contain real data returns a file that appears to be valid encrypted data, it would be impossible to tell which data is real and which is fake.

What is the problem with this approach?

4 Answers


Look into the world of honeypots. You can generate fake machines, printers, network traffic, etc. The goal is to direct the intruder to the fake network, which is usually on a DMZ, a different subnet, or behind a separate firewall, while the IDS kicks into action.

You could also download a script-based random text generator, run the output through a hash like MD5 (a hash function rather than encryption, but it produces opaque-looking output), pipe that to a txt file, and then compress it. I like the “bullsh*t” text generator the best.
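
A rough sketch of that pipeline in Python (standing in for a shell-based generator; the word count, lengths, and output filename are all illustrative):

import gzip
import hashlib
import random
import string

def make_decoy(path, n_words=200):
    # Random filler text standing in for the text generator's output
    words = ("".join(random.choices(string.ascii_lowercase, k=random.randint(3, 10)))
             for _ in range(n_words))
    text = " ".join(words)
    # MD5 is a hash, not encryption, but its digest looks suitably opaque
    digest = hashlib.md5(text.encode()).hexdigest()
    with gzip.open(path, "wt") as f:
        f.write(digest)

make_decoy("decoy_0001.txt.gz")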

Theologin

TL;DR

Don't do it. Just use authentication and 404s.

How

Fake encrypted data is just random data, from a CSPRNG or TRNG; you don't need to actually run any encryption. To keep attackers from requesting the same URL twice and dismissing any file that changes between requests as fake, seed the generator with something based on the URL, for example the SHA-256 hash of the URL. You don't need a slow, cryptographically secure password hash; you're not worried about people finding collisions. In pseudocode:

def handle_request(filename, user):
  if file_exists(filename):
    return Response(200, get_file(filename))
  else:
    # Seed the PRNG with the URL so the same fake bytes come back every time
    rand = Random(sha256(filename))
    # 10,000 deterministic pseudorandom bytes standing in for ciphertext
    file = bytes(rand.randint(0, 255) for _ in range(10000))
    return Response(200, file)
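
As a more concrete sketch of the same idea (standalone Python; the 10,000-byte size, PEPPER value, and function name are illustrative, and the pepper follows Anders' suggestion in the comments):

import hashlib
import random

PEPPER = b"server-side secret"  # assumption: a secret kept outside the database

def fake_file_bytes(url, size=10000):
    # Deterministic seed: the same URL always yields the same fake bytes,
    # so an attacker can't dismiss files that change between requests
    seed = hashlib.sha256(PEPPER + url.encode()).digest()
    rng = random.Random(seed)
    return rng.randbytes(size)  # randbytes requires Python 3.9+

# Repeated requests for the same missing URL return identical data
assert fake_file_bytes("/files/nope") == fake_file_bytes("/files/nope")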

Why not

But there is a problem. You're burning CPU generating random data for every single response. I could take down your site with this script running on enough machines:

num=0
while true; do
  # Emit one curl command per unique missing URL; parallel runs each line
  echo curl "https://your-site.com/invalid_url${num}"
  ((num++))
done | parallel

Oh, look. Now your server is tied up spewing random data out to me, as fast as my CPU and network can handle it. And that's quite possibly the most basic incarnation of a DoS script. More sophisticated attempts will use multiple machines, dynamically increase the number of concurrent requests until it hits a bottleneck, etc.

Of course, it's also possible to trigger something similar accidentally: someone bookmarks a page and their browser pings it occasionally to update the favicon, or someone who is sure they have the right document keeps hitting refresh, trying to figure out why it doesn't work.

In addition, if there's a database leak (something you seem to be worried about, judging by your comments), then your 'solution' instantly dies. Any attacker with access to the database can see all the existing filepaths and get at the real files just by visiting their URLs.

Instead

The usual solution is to have some sort of authentication scheme and then send a 404 no matter why the file can't be accessed (it doesn't exist, no permission, etc.), rather than generating fake files. You could still be DoSed by a dedicated enough attacker, of course, but spending less to answer each request means they have to send many more requests, which makes them easier to detect and stop. Plus, it's easier to explain "If we can't find a file tied to your account, we assume it doesn't exist and tell you that" to an irate customer than "We generated fake information and showed that to you instead of the information you requested."

In pseudocode:

def handle_request(filename, user):
  if user.logged_in() and file_exists(filename) and user.has_access(filename):
    return Response(200, get_file(filename))
  else:
    return Response(404)
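
A minimal runnable sketch of that pattern (Flask is used purely for illustration; STORAGE_DIR and user_has_access are assumptions standing in for your storage and permissions layers):

import os
from flask import Flask, abort, send_from_directory, session

app = Flask(__name__)
app.secret_key = "change-me"  # assumption: the real key comes from config
STORAGE_DIR = "/srv/files"    # assumption: where the encrypted blobs live

def user_has_access(user, filename):
    # Placeholder for a real permissions store; deny by default
    return False

@app.route("/files/<path:filename>")
def serve_file(filename):
    user = session.get("user")  # whatever your auth scheme put in the session
    path = os.path.join(STORAGE_DIR, filename)
    # One 404 for every failure mode: not logged in, no such file, no permission
    if user is None or not os.path.isfile(path) or not user_has_access(user, filename):
        abort(404)
    # send_from_directory also guards against path traversal
    return send_from_directory(STORAGE_DIR, filename)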

You also don't need to use user-based authentication if you're worried about, say, attackers learning that user nic-hartley and user inferior-nicks are actually working together, because they can both access files 1542, 1092, and 5840. You could instead require a password, and check file_exists(filename) and password_correct(pass, filename). You could even do something more complex; there are a lot of authentication options. However, note that metadata like that could only leak if there's a database leak, and this solution is still better, because you've only leaked the associations, not access to all of the encrypted files.
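
For the per-file password variant, password_correct might look something like this (lookup_credentials is a hypothetical helper standing in for wherever you keep each file's salt and password hash):

import hashlib
import hmac

def password_correct(password, filename):
    # lookup_credentials is hypothetical: fetch the (salt, hash) stored for this file
    salt, stored_hash = lookup_credentials(filename)
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    # Constant-time comparison avoids leaking the stored hash through timing
    return hmac.compare_digest(candidate, stored_hash)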

Nic
  • Quite astute. Starting with your last point, you are right that authentication is the best solution for most cases. I am concerned about storing the metadata that links the account and its associated data. Is there a way to use authentication and not store that data? @Theologin, regarding your permissions comment. – William Rusnack Jul 06 '19 at 15:48
  • @WilliamRusnack I used user-based authentication because that's the most common and therefore typically the easiest to implement, but it doesn't have to be. You could also require a password (or hardware token) to authenticate any filepath, and the check becomes `file_exists(filename) and right_password(pass, filename)` instead. I'll edit my answer to clarify that. – Nic Jul 06 '19 at 15:54
  • Your unique token for each file is great. – William Rusnack Jul 06 '19 at 16:01
  • Smart approach with seeding the RNG with something based on the URL! I would add a pepper to the URL before hashing. That way, the attacker cannot figure out what you are doing. – Anders Jul 06 '19 at 16:42

If the data you want protected is important, maybe you really should look into some type of permissions, at least with htaccess, for example limiting indexing, blacklisting IPs, or restricting file extensions. But if you are just experimenting or planning on running people in circles, here are some things to consider...

A hacker could look at certain things to determine which files are real, such as: metadata of the files, differences in types of encryption, upload times, backlinks, static vs. dynamic files, how long a file has been live, server response time, etc.

You could even create a maze of directories that includes aliases and difficult, random, or even dynamic names for the fake files, and use something like a URL shortener for the files you plan on sharing.

Theologin
  • Look at my comment on Nic Hartley's answer in regard to the permissions. – William Rusnack Jul 06 '19 at 15:53
  • I think the metadata can be stripped from the files. Your server response time is a bit trickier, but it seems like that could be modeled with some effort. – William Rusnack Jul 06 '19 at 16:08

Many years ago I briefly played with tarpit software called LaBrea. It returned an endless set of generated pages and links, with the idea that it would keep spiders and automated scrapers from harvesting your real site once they became stuck in it.

I don't consider it a very realistic threat anymore, so I don't worry about it. But if it's what you are concerned about, go for it.

John Deters