Is a file hash checking system 100% secure and non-bypassable/fakable?

Question

I'm building an open source, distributed CMS (partly replicated offline, since the network is assumed to be unstable), and one of the system's core jobs will be to group all the files and generate a unique hash from them.

This special hash will be checked against a genuine hash to see whether they match, to ensure that nobody has maliciously altered the source.

If the hashes match, we can safely go on with the process, particularly the replication. If they don't match, we get the IP/proxy/Tor node and block subsequent join attempts.

OK, let's add some context...

Let's assume we have a folder called 'cmsfiles' containing the files app.py, dbconn.py, users.py, reputation.py, node_list.py, blacklist.py, etc.
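The "group all the files and generate a unique hash" step could look roughly like the sketch below. It assumes Python's standard `hashlib` with SHA-256 (the question does not name an algorithm), and `hash_tree` is a hypothetical helper name. Files are visited in a fixed order and each path and content is length-prefixed, so the same folder always yields the same digest and moving bytes between files changes it.

```python
import hashlib
from pathlib import Path


def hash_tree(root: str) -> str:
    """Combine every file under `root` into a single SHA-256 hex digest."""
    base = Path(root)
    # Sort by relative POSIX path so the digest does not depend on the
    # order in which the filesystem happens to list files.
    files = sorted(
        (p for p in base.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(base).as_posix(),
    )
    digest = hashlib.sha256()
    for path in files:
        rel = path.relative_to(base).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length-prefix each field so "ab"+"c" and "a"+"bc" hash differently.
        for part in (rel, data):
            digest.update(len(part).to_bytes(8, "big"))
            digest.update(part)
    return digest.hexdigest()
```

Usage would be something like `hash_tree("cmsfiles")`. If files such as node_list.py or blacklist.py are rewritten at runtime, they would change the digest on every node and would probably need to be excluded from it.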

There is John, the founder of a portal called "Breaking old News". He has a genuine hash ("originalhash") and is in the "node-list", thus he's a point of reference to be checked against.

Now along comes Dan, who is interested in the portal and wants to join it to participate (be it in discussions or anything else). So he downloads and installs the software suggested by the read-only portal (let's say Python + a sharded SQLite DB).

When he visits the portal again, a message is displayed saying something like "We're checking integrity before aligning both machines, hang tight and wait". What happens in the background is that John's machine is contacted (using the node-list addresses) and proceeds to group all the files in Dan's 'cmsfiles' folder into a single hash and check it against the genuine one. If both hashes match, Dan's machine is added to the node_list, his database is downloaded/updated, and from then on it follows John's "train".

Two days after Dan joins...

Another guy called "Jack" comes along, but with malicious intent: he has altered the "reputation.py" file to get around the 120-point limit for voting.

He sees the same thing as Dan, but something slightly different happens in the background...

When John's or Dan's machine checks the hash, it finds that it doesn't match the requester's, so that machine kicks him out of the network by putting him in the "blacklist.py" list.
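The join check described in the last few paragraphs (match: add to node_list; mismatch: blacklist) can be sketched as below. `Node` and `handle_join` are hypothetical names, and plain in-memory sets stand in for node_list.py and blacklist.py. Note that the check trusts whatever hash the joiner reports, which is exactly the weakness the answers point out.

```python
import hmac
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory stand-in for node_list.py / blacklist.py."""
    genuine_hash: str
    node_list: set[str] = field(default_factory=set)
    blacklist: set[str] = field(default_factory=set)

    def handle_join(self, address: str, reported_hash: str) -> bool:
        """Admit `address` if its reported hash equals the genuine one."""
        if address in self.blacklist:
            return False
        # compare_digest on bytes: constant-time, and safe for non-ASCII input.
        if hmac.compare_digest(reported_hash.encode(), self.genuine_hash.encode()):
            self.node_list.add(address)  # Dan's case: start replicating
            return True
        self.blacklist.add(address)  # Jack's case: block further attempts
        return False
```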

Note: if Jack manually changed the addresses in the list in an attempt to validate his machine, he would be on his own and would just have created his own portal, without any harm to the other portals that already exist.

What I'm not sure of is whether this can be considered a safe system, one where nobody can fake or bypass the hash in some way. Or am I doing it wrong, and should it be done differently?

P.S. This was partly inspired by the inner workings of Bitcoin and Osiris SPS.

score 4 · Accepted Answer · answered Mar 13 '14 at 19:50

4

If I understand you correctly, you're asking whether the process you've described places the security burden on the hash algorithm, hence the question about how good hash algorithms are. Well-scrutinized algorithms (like SHA-2 and SHA-3; SHA-1 is no longer considered collision-resistant) should be very good, in that creating a collision, or finding another set of files that matches a given hash, should be extremely difficult.

However, from reading your process, I think the weak point lies elsewhere.

1. What is stopping the malicious user from lying about their hash result?
2. To prevent #1, you may require the user's files to be uploaded and hashed on the server (which creates its own set of problems). If so, how do you prevent the malicious user from sending the files you expect to see while running modified files?
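Point 1 can be made concrete with a toy sketch (all names hypothetical, SHA-256 assumed). Because the project is open source, anyone can compute the genuine hash, so a modified client only has to report that value instead of hashing what it actually runs.

```python
import hashlib


def hash_files(files: dict[str, bytes]) -> str:
    """Hash a {relative_path: contents} mapping in a fixed order."""
    digest = hashlib.sha256()
    for name in sorted(files):
        for part in (name.encode("utf-8"), files[name]):
            digest.update(len(part).to_bytes(8, "big"))
            digest.update(part)
    return digest.hexdigest()


genuine = {"reputation.py": b"MIN_VOTE_REP = 120\n"}
modified = {"reputation.py": b"MIN_VOTE_REP = 0\n"}  # what Jack actually runs


def malicious_client_report() -> str:
    # Jack keeps an untouched copy around purely to hash it.
    return hash_files(genuine)


def server_check(reported: str) -> bool:
    # The server can only compare what it is told.
    return reported == hash_files(genuine)
```

`server_check(malicious_client_report())` returns `True` even though the running code is `modified`; no choice of hash algorithm changes that.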

Tools such as mitmproxy make this kind of trickery a lot more accessible than it used to be. You will probably find that Stack Overflow is also very good at spotting general process/workflow issues like this one.

scuzzy-delta

Note that SHA-3 (the selected candidate for SHA-3 is Keccak) hasn't been vetted yet and has not been accepted as the new standard by NIST – Lucas Kauffman Mar 13 '14 at 21:20
Yeah, you're right, that's something I easily overlooked. I thought about this: to resolve issue #2 we have to change strategy. We could send the requesting machine's files to the genuine one and hash them there to compare, with this done on every request between the two parties. Obviously we'll unfortunately have some network overhead, which could be mitigated by making the codebase tighter and compressing the upload with a fast algorithm, something between strong LZMA compression and a plain, uncompressed tar archive. – gw0 Mar 14 '14 at 11:05
On further consideration, it seems your requirement is essentially to implement an [anti-cheat system](http://en.wikipedia.org/wiki/Valve_Anti-Cheat)...so I wish you the best of luck! :) – scuzzy-delta Mar 14 '14 at 17:39
Thank you, that link is a huge help! I will research other anti-cheat systems as well. I think I can ban a specific user by adding his hardware UUID instead of blocking his entire network. I'm going to give your help back as open source content! :) – gw0 Mar 15 '14 at 09:13
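The upload-and-hash-on-the-server idea from the comments might be sketched with Python's standard `tarfile` module, whose `"w:xz"` mode wraps the tar archive in LZMA compression (tar on its own only bundles files). `pack_files` and `hash_uploaded_archive` are hypothetical names. As point 2 of the answer notes, this only proves what was uploaded, not what the client is running.

```python
import hashlib
import io
import tarfile


def pack_files(files: dict[str, bytes]) -> bytes:
    """Client side: bundle files into an LZMA-compressed tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        for name in sorted(files):
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


def hash_uploaded_archive(blob: bytes) -> str:
    """Server side: unpack in memory and hash contents in a fixed order."""
    digest = hashlib.sha256()
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:xz") as tar:
        members = sorted(
            (m for m in tar.getmembers() if m.isfile()), key=lambda m: m.name
        )
        for member in members:
            data = tar.extractfile(member).read()
            # Hash names and contents, not archive bytes, so tar metadata
            # (timestamps, compression settings) cannot change the result.
            for part in (member.name.encode("utf-8"), data):
                digest.update(len(part).to_bytes(8, "big"))
                digest.update(part)
    return digest.hexdigest()
```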

score 1 · Answer 2 · answered Mar 13 '14 at 20:23

You're overlooking a huge component here: the connection between points. You can slap on, say, SHA-512, and it would make no difference if the wire between points is visible.

YourServer --> World
World --> YourServer

Which means

Dan --> connects to --> YourServer

How is this connection made on the network side of the equation? SSL, VPN, etc.?

You're also not taking host-based intrusions into account. For example, imagine Dan is a university student and we're in the same dorm. Depending on Dan's network architecture (over which you have no control), what's to stop me from sniffing the network, taking his token and coming back to you as him (impersonation)? I can assure you that, because of NAT, you will see only one IP address.
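For the "how is this connection made" question, one option is TLS with certificate verification, sketched here with Python's standard `ssl` module. With verification on, a dorm-mate sniffing the LAN sees only ciphertext and cannot lift the token off the wire. The `cafile` parameter is a hypothetical path to a CA bundle for the network's own node certificates; the sketch does nothing against a compromised host.

```python
import socket
import ssl
from typing import Optional


def make_client_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """TLS context that refuses unverified peers.

    With cafile=None the system's trusted roots are used; otherwise only
    the given (hypothetical) network CA bundle is trusted.
    """
    context = ssl.create_default_context(cafile=cafile)  # CERT_REQUIRED + hostname check
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def open_verified_connection(
    host: str, port: int, cafile: Optional[str] = None
) -> ssl.SSLSocket:
    context = make_client_context(cafile)
    raw = socket.create_connection((host, port), timeout=10)
    try:
        # The handshake fails unless the certificate chains to a trusted CA
        # and matches `host`.
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```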

Now, what about a host-based intrusion where, say, I manage to get ONTO his machine and pass data right through it? (This is similar to what malware writers do to steal data, sniff traffic or log keystrokes.) There are a lot of things to consider.

Good point! I should have mentioned that in my question. Of course we'll have an authentication mechanism where each user has a private key file (think of a Bitcoin wallet) containing his password and other information, which is used to sign every network request/post. As far as I'm concerned, the entire thing will run over SSL. Regarding host-based intrusion, the same thing could happen with 100% centralized servers anyway. But even if the user has his private key stolen, he should be able to recover that account if he knows the password. – gw0 Mar 14 '14 at 11:05
It's really up to users to be security-aware, not share their private key (maybe keep it in a TrueCrypt volume) and have some sort of antivirus installed, just like in any other security-conscious environment. – gw0 Mar 14 '14 at 11:06
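Signing every request, as the comment proposes, also helps against the replayed-token impersonation described in this answer if each request carries a timestamp and a one-time nonce. The sketch below is an assumption-laden stand-in: Python's standard library has no public-key signatures, so it swaps in HMAC-SHA256 with a per-user shared secret. A design like the one in the comment would use a public-key scheme such as Ed25519 instead, so that verifiers cannot forge requests themselves. `sign_request` and `verify_request` are hypothetical names.

```python
import hashlib
import hmac
import json
import secrets
import time


def _canonical(envelope: dict) -> bytes:
    return json.dumps(envelope, sort_keys=True, separators=(",", ":")).encode()


def sign_request(secret: bytes, payload: dict) -> dict:
    """Attach a timestamp, a one-time nonce and an HMAC tag to `payload`."""
    envelope = {"payload": payload, "ts": int(time.time()), "nonce": secrets.token_hex(16)}
    envelope["tag"] = hmac.new(secret, _canonical(envelope), hashlib.sha256).hexdigest()
    return envelope


def verify_request(secret: bytes, envelope: dict, seen_nonces: set, max_age: int = 300) -> bool:
    """Reject forged, stale or replayed requests."""
    unsigned = {k: v for k, v in envelope.items() if k != "tag"}
    expected = hmac.new(secret, _canonical(unsigned), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), str(envelope.get("tag", "")).encode()):
        return False  # tampered with, or signed with another key
    if abs(time.time() - unsigned["ts"]) > max_age:
        return False  # too old (or clock skew too large)
    if unsigned["nonce"] in seen_nonces:
        return False  # a sniffed request being replayed
    seen_nonces.add(unsigned["nonce"])  # a real node would expire old nonces
    return True
```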

score 0 · Answer 3 · answered Mar 14 '14 at 15:06

The Achilles' heel in this case is that the hash calculation requires the cooperation of the attacker's machine. The attacker could have two versions of the application, a genuine one and a modified one. There is no way to prevent them from presenting the genuine copy for integrity checking and then using the modified version to actually connect to and interact with the network.

When you develop a distributed system, you cannot trust third-party systems to do anything right. You need to design your netcode on the assumption that every participant has hacked their node beyond recognition and that there is nothing you can do about it. The security of the system has to be built into the protocol. Any limitations, like requiring 120 reputation to vote, must be enforced by other nodes, not by the node itself.
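Enforcing the 120-reputation rule on the receiving side might look like the sketch below, using Python's built-in `sqlite3` to match the question's stack. The `users`/`posts` schema and `accept_vote` are hypothetical. The point is that the receiving node consults its own replica and ignores whatever the sender's possibly modified reputation.py claims.

```python
import sqlite3

MIN_VOTE_REPUTATION = 120  # the limit from the question


def accept_vote(db: sqlite3.Connection, voter: str, post_id: int) -> bool:
    """Apply a vote only if *this* node's replica says the voter may vote."""
    row = db.execute(
        "SELECT reputation FROM users WHERE name = ?", (voter,)
    ).fetchone()
    if row is None or row[0] < MIN_VOTE_REPUTATION:
        return False  # unknown voter or too little reputation: drop it
    with db:  # commit the update as one transaction
        db.execute("UPDATE posts SET votes = votes + 1 WHERE id = ?", (post_id,))
    return True
```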

edited Mar 14 '14 at 15:11

answered Mar 14 '14 at 15:06

Philipp

That's what occurred to me on second thought. Based on your answer, would limiting the operations on the client side to just what's needed (we're talking about inserts, updates and shadowing) do it? Obviously, other users' data such as reputation will be replicated locally after being digitally signed, to be sure we don't replicate fake votes and/or reputation. Note: by shadowing I mean that the record still exists but is hidden from general view, while remaining accessible as a revision. – gw0 Mar 14 '14 at 15:30
So to say: 'If in my or my mate's database your record has over 120 rep and your request's signature is valid, you're free to update the post's vote count; otherwise I register your malicious attempt, and if your attempts exceed a set number, you're blacklisted.' – gw0 Mar 14 '14 at 15:38
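The rule in the last comment (over 120 rep and a valid signature, otherwise count a strike and blacklist past a threshold) could be sketched as follows. `VoteGate` and the default `max_strikes` are hypothetical, since the comment only says "a said number"; `reputation` is assumed to come from the checking node's own replica and `signature_ok` from a separate signature check.

```python
from collections import Counter


class VoteGate:
    """Hypothetical peer-side gate for incoming votes."""

    def __init__(self, min_rep: int = 120, max_strikes: int = 3):
        self.min_rep = min_rep
        self.max_strikes = max_strikes
        self.strikes: Counter = Counter()
        self.blacklist: set = set()

    def check(self, voter: str, reputation: int, signature_ok: bool) -> bool:
        if voter in self.blacklist:
            return False
        if signature_ok and reputation > self.min_rep:
            return True  # allowed to update the post's vote count
        # Register the malicious attempt; blacklist once over the threshold.
        self.strikes[voter] += 1
        if self.strikes[voter] > self.max_strikes:
            self.blacklist.add(voter)
        return False
```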