Methods to Prove Data Authenticity from Potentially Compromised Sources?

Question

I've been thinking about this problem for some time and wanted to ask whether there are any known methods or research papers on proving the "authenticity" or correctness of data originating from a potentially compromised source (remote server, process, etc.). Specifically, suppose you have service A and service B. Service B sources data from A but is worried that A has been compromised, such that even if the data is signed by A, B can't trust that it was generated by code written by A's developers. Is it possible for B to prove to itself that data from A is authentic, i.e. that it was indeed generated by the expected code and not injected or generated by an attacker who has compromised A?

One solution I've been considering is a sort of distributed ledger or blockchain in which multiple nodes compute the same data. This raises the bar: an attacker would have to compromise N% of the services producing the needed data. It naturally provides replication, and I could use an appropriate consensus protocol, but of course it introduces overhead and efficiency concerns, and I would need to think hard about side effects being performed more than once.
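As a rough illustration of the replication idea, here is a minimal sketch in which B accepts a value only if enough independent replicas of A report the same bytes. The function name `accept_by_quorum` and the replica IDs are hypothetical, and this is a plain vote count, not a real consensus protocol; it only helps if the replicas fail independently.

```python
from collections import Counter


def accept_by_quorum(results, quorum):
    """Return the value reported by at least `quorum` replicas, else None.

    `results` maps a (hypothetical) replica ID to the bytes it reported.
    """
    if not results:
        return None
    # Find the most frequently reported value and how many replicas sent it.
    value, votes = Counter(results.values()).most_common(1)[0]
    return value if votes >= quorum else None


# Two of three replicas agree; the third is assumed to be compromised.
reports = {"a1": b"reading=42", "a2": b"reading=42", "a3": b"reading=666"}
accepted = accept_by_quorum(reports, quorum=2)
rejected = accept_by_quorum(reports, quorum=3)
```

With a quorum of 2 the honest value is accepted; with a quorum of 3 nothing is accepted, which is the availability cost of demanding unanimity.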

If only one node is capable of generating the data, such as a sensor node, and it is compromised, I'd imagine all hope is lost, but I also wouldn't be surprised if some clever crypto scheme attempts to solve this problem as well.

I hope it's clear what the question is. Thank you.

Edit: After some research I stumbled upon two cryptoschemes that seem to attempt to address the problem:

Secure Multiparty Computation (SMC): I found a thesis, "Implementation of a Secure Multiparty Computation Protocol", in which the author says:

In typical use cases of SMC, parties involved are mutually distrustful, but one can also imagine the case of multiple machines owned by a single party, performing SMC to collectively decrypt and process confidential data. No single machine would have the key, and no single machine would see the plaintext. Now it would not be enough for the APT to compromise a single machine holding the decryption key, but every single one of the machines would have to be compromised.

This seems almost what I was looking for.
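To make the "no single machine would have the key" part of that quote concrete, here is a minimal sketch of XOR-based secret sharing. It is not SMC itself: a real SMC protocol computes on the shares without ever reassembling the key in one place, whereas this only shows that each share alone reveals nothing and all of them are needed. The names `split_key` and `combine` are made up for the example.

```python
import secrets


def _xor(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_key(key, n):
    """Split `key` into `n` shares; all n are required to rebuild it."""
    # n - 1 shares are uniformly random.
    shares = [secrets.token_bytes(len(key)) for _ in range(n - 1)]
    # The last share is the key XORed with every random share.
    last = key
    for share in shares:
        last = _xor(last, share)
    return shares + [last]


def combine(shares):
    """XOR all shares together to recover the original key."""
    out = bytes(len(shares[0]))
    for share in shares:
        out = _xor(out, share)
    return out


key = secrets.token_bytes(16)
shares = split_key(key, 3)      # one share per machine
recovered = combine(shares)     # only possible with every share
```

An attacker holding any two of the three shares sees only random bytes, which matches the quote's point that every machine would have to be compromised.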

Homomorphic Encryption: This seems to be another cryptoscheme that might be able to achieve a similar goal, except that, if I understand correctly, an attacker could still perform arbitrary operations on encrypted data while not knowing exactly what the data is.
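The concern about an attacker operating on ciphertexts can be seen in a toy example. The sketch below uses textbook (unpadded) RSA with tiny primes, which is multiplicatively homomorphic. It stands in for a real homomorphic scheme purely for illustration and is completely insecure; the parameters are the usual classroom values, not anything from the question. It requires Python 3.8+ for the modular inverse via `pow`.

```python
# Toy textbook RSA: insecure, for illustration only.
P, Q, E = 61, 53, 17
N = P * Q                           # public modulus
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (Python 3.8+)


def encrypt(m):
    """Encrypt with the public key (no padding, so it is malleable)."""
    return pow(m, E, N)


def decrypt(c):
    """Decrypt with the private key."""
    return pow(c, D, N)


# Service A encrypts a value; the attacker never learns it.
ciphertext = encrypt(7)

# The attacker multiplies the ciphertext by an encryption of 6,
# using only the public key.
tampered = (ciphertext * encrypt(6)) % N

# The receiver decrypts a value A never produced: 7 * 6.
result = decrypt(tampered)
```

Here `result` is 42 even though the attacker never saw the original 7. Confidentiality alone does not give B any assurance about who computed the data or how.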

I don't know enough about cryptography to know whether these two schemes might one day be a practical option for the problem of not trusting service A as described earlier. Any insight?

Thanks again.

I don't think there are generic solutions to your problem, since the problem itself is really broad (do arbitrary data generation on an untrusted system and somehow trust the result). There might be solutions for narrowly defined use cases though, i.e. narrowing down how the data is generated, what kind of trust can still be assumed, what operations are performed, etc. – Steffen Ullrich, Sep 10 '20 at 05:39
And if the attacker compromises one server without any previous knowledge, he will have a good idea of how to compromise all the others, as he now knows a lot about the compromised server. Unless every server uses a different implementation, your gains are smaller than the overhead. – ThoriumBR, Sep 10 '20 at 20:48
Thanks for the input. @ThoriumBR I have considered this in the case of using multiple services in a distributed-ledger type of setup: if one of the services is compromised, there is a good chance the attacker can compromise the others using the same method. – Todd Fulton, Sep 11 '20 at 00:28
@SteffenUllrich I think you're right. I've been reading about security enclaves and they seem interesting, but I'm not sure a software solution exists yet, and it doesn't really address the issue directly, though it would help limit the trust boundaries. Thinking some more, perhaps I'm being too paranoid: even in the extreme case of enclaves, I would still have to trust the service within the enclave. Perhaps intrusion detection would be a better place to focus with regard to this concern. – Todd Fulton, Sep 11 '20 at 00:38

score -1 · Answer 1 · answered Sep 10 '20 at 19:12


Solved problem - use signatures but place no value on self-signed data.


symcbean


This won't work. Anyone compromising server `A` will be able to sign data correctly. – ThoriumBR Sep 10 '20 at 20:39
OP wishes to establish integrity of files from serverA. Files which have been signed by serverA and hosted by serverA are self-signed and therefore considered invalid by my predicate. – symcbean Sep 10 '20 at 20:54
OP wants a server `B` to source data from server `A` and make sure the data is not compromised even if someone compromised server `A`. Who would sign that data? Server `C`? And how would the "signing server" receive the data and make sure it originated on server `A` and was not compromised either at server `A` or in transit? And how would server `B` trust that nobody compromised server `C` to sign compromised data and pass it off as authentic? You would need server `D` to vouch for server `C`... And who would vouch for server `D`? Server `E` perhaps? – ThoriumBR Sep 10 '20 at 20:59
Perhaps the operator of server `A` can sign the file offline, using a private signing key that is stored offline, then upload the file (along with the signature) to server `A`. Then, anyone downloading the file (either from server `A`, server `B`, etc.) could verify the integrity of the file by verifying the signature using server `A`'s public key. – mti2935 Sep 10 '20 at 22:37
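The offline-signing suggestion in the last comment can be sketched as follows. To stay dependency-free this uses a Lamport one-time signature built on SHA-256, which is a stand-in chosen for the example; the comment does not name a scheme, and a real deployment would more likely use something like Ed25519 from a vetted library. A Lamport key pair must sign only one message. The function names are made up for the sketch.

```python
import hashlib
import secrets


def _sha(data):
    return hashlib.sha256(data).digest()


def _bits(message):
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = _sha(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def keygen():
    """Run offline: the private key never touches server A."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32))
          for _ in range(256)]
    pk = [(_sha(zero), _sha(one)) for zero, one in sk]
    return sk, pk


def sign(sk, message):
    """Run offline: reveal one secret per bit of the message digest."""
    return [sk[i][bit] for i, bit in enumerate(_bits(message))]


def verify(pk, message, signature):
    """Run by B: needs only the public key, obtained out of band."""
    if len(signature) != 256:
        return False
    return all(_sha(part) == pk[i][bit]
               for i, (part, bit) in enumerate(zip(signature, _bits(message))))


# Operator signs offline, then uploads the file and signature to A.
sk, pk = keygen()
data = b"release-1.0 contents"
sig = sign(sk, data)

# B downloads both from A and checks them against the known public key.
ok = verify(pk, data, sig)
forged = verify(pk, b"attacker-modified contents", sig)
```

This only covers data that can be produced and signed ahead of time. It does not help when A must generate fresh data online, which is the harder case raised in the question.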
