
I browsed some articles found on the web but none of them seems to have the answer to my question (or maybe I didn't see it).

I've heard that it is possible to donate one's unused CPU power to research laboratories, to help advance science while you sleep.

But what if someone decides to return a wrong answer, or to send back spyware, or who knows what else?

How do they ensure that the result they get is really the result of the computation they asked for?

Antoine Pinsard
  • This question seems really broad. Distributed computing is achieved in a myriad of ways and there really is no 'one true answer' to the question as phrased. Furthermore, data accuracy ('How do they ensure that the result they get is the result of the computation they asked for?') is more a function of the implementing system's design than of security. Please refine. – grauwulf Feb 18 '13 at 17:55
  • I understand that my question might be a bit too broad. I will try to refine it. However I have absolutely no idea of how this is possible and I am pretty stunned that distributed computing with "untrusted machines" works. So I'm looking for an answer that could ease my amazement. Maybe a concrete example? – Antoine Pinsard Feb 18 '13 at 18:12
  • I understand where you could be confused with this, it's a big problem domain. Perhaps it would help to ask something to the effect of 'is there an industry standard security model for distributed computing systems?' If that were your question I might have pointed you toward NIST SP 800-144, or something like that. – grauwulf Feb 18 '13 at 18:18

2 Answers


Getting fake or skewed results from user-contributed CPU power is a real issue which has no known practical solution, but there are several mitigation strategies:

  • Duplication. For each given chunk of work, give it to two distinct contributors, and compare the results. This will trap most fraudsters and also computing errors due to bad RAM; unfortunately, it also halves the available computing power. This was the strategy used by some friends of mine who (briefly) held the world record for the N queens problem (i.e. counting all solutions; they computed it for N = 23 back in the previous century).

  • Sampling. This is a variant of the previous solution, in which you duplicate only some chunks, e.g. one in ten (randomly chosen). This will trap fraudsters who game results at least occasionally, while incurring much less overhead. You have to keep track of who did what chunk, so that suspicious values can be removed en masse.

  • Incentives. If the work is of the find-needle-in-haystack persuasion (like a distributed key cracking effort), then you can announce a grand prize for whoever finds it. This will induce a lot of potential cheaters to actually do the computations, since not doing them means they will not get the prize.

  • Fully Homomorphic Encryption. That one is a generic solution, but the best known algorithms for FHE have the drawback of incurring overhead counted in billions -- i.e. even if the whole Internet contributed to the computation with FHE, it would not match what you could achieve without FHE on a single PC.

  • Obscurity. The all-time favourite of anti-cheating strategies: simply don't publish the source, keep the protocols secret, and hope that nobody bothers to run a debugger and do a bit of reverse engineering. In a similar scenario of distributed computing, namely online games, it does not work well.

  • Luck and trust. Simply pray that there will not be too many fraudsters, and that most chunks will be processed as they should. Depending on the type of computation, this may be a valid strategy. For instance, it may work for a key cracking effort, since a few skipped chunks will not matter (unless the key is in one of them, which would be bad luck but improbable if most contributors are honest). It would not work for a complete counting where you add all the results (each wrong value will alter the total).
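The duplication and sampling strategies above can be illustrated with a toy sketch (all names and the stand-in sum computation are invented for illustration; a real project would track contributors across many work units rather than checking per call):

```python
import random

def true_result(chunk):
    # Reference computation, run on a machine the project trusts.
    return sum(chunk)

def honest_worker(chunk):
    # A contributor who actually does the (toy) work.
    return sum(chunk)

def cheating_worker(chunk):
    # A contributor who claims "done" without doing the work.
    return 0

def check_contributor(chunks, worker, sample_rate=0.5, rng=None):
    """Sampling sketch: accept the contributor's results, but recompute
    a random fraction of chunks on a trusted machine. On any mismatch,
    discard everything from this contributor en masse."""
    rng = rng or random.Random(0)
    results = []
    for chunk in chunks:
        reported = worker(chunk)
        if rng.random() < sample_rate and reported != true_result(chunk):
            return None  # contributor is suspect: drop all their chunks
        results.append(reported)
    return results
```

With a low sample rate (say 0.1), verification overhead is roughly 10% instead of the 100% of full duplication, at the cost of catching cheaters only probabilistically.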

Historically, distributed.net encountered such issues in their DES/RC5 cracking efforts. Some people would fake results, i.e. alter their clients so that they responded "job done, no key found" without doing the computations; the gamed clients would respond faster than clients which really did the work. They were doing that just to get a better rank in the list of contributors, ordered by CPU power (in retrospect, it was quite obvious that computer owners would act just like car owners do). The response was an obscure client without source code, which was sufficient in that case (too few people did the reverse engineering to cheat again).


There is also the dual problem: how do you make sure, as a contributor, that the client code you are sent is really benign? This is not an easy question, and it is made harder by closed-source client code...

Linux includes a nifty system call named seccomp() which could be useful for that. Even then, at least theoretically, this might not be sufficient for complete isolation of a potentially rogue process, because that process still shares hardware state (L1 cache, branch prediction...) with other processes, and this has been demonstrated (in lab conditions) to be usable to recover an encryption key.

Tom Leek
  • Incentives is a rather big one; that's how Bitcoin works, by awarding new Bitcoins to the owners of the nodes whose transactions result in a hash value less than a given integer value. In that case, there's independent confirmation as well; other nodes can verify that the resulting transaction log does indeed hash to a particular value (with *much* less effort than the node which found it had to go through), and because the transaction system also works on peer confirmation, the other nodes can usually also verify that most or all of the transactions really happened. – KeithS Feb 18 '13 at 19:16

Spyware is easy to prevent: they ensure that the data coming back follows a certain format, and they don't execute any part of it. It's no different from uploading JPG files to GMail or Tumblr -- whatever you upload won't be executed in any way, so it's OK.
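That kind of format check can be sketched like this (the two-column result format and all names are invented for illustration; the point is that the payload is parsed strictly as data and never executed):

```python
def parse_result(payload):
    """Hypothetical server-side parser: a result file is expected to be
    lines of 'chunk_id value' pairs. Anything that doesn't fit the
    format is rejected outright, and since the payload is only ever
    parsed, never run, any malicious content embedded in it is inert."""
    results = {}
    for line in payload.splitlines():
        fields = line.split()
        if len(fields) != 2:
            raise ValueError("malformed line: %r" % (line,))
        chunk_id, value = fields
        results[int(chunk_id)] = float(value)  # raises on non-numeric junk
    return results
```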

Regarding "faking the data":

For one, they generally send out the same data to more than one user. Secondly, hints of an "interesting event" will obviously be checked by a human. Of course, if you "hide" your findings, there's not much that can be done except verifying your submission with someone else.

Their website is currently down, but I found this in a copy of the SETI@HOME FAQ:

What if someone fakes a result to make it seem like they found a signal?

The SETI@home staff will be reviewing the actual data that produced the result, and if they don't find the same results, they will discard the fake. Besides, while it's not impossible, it might be harder than you think to fake a result file.

Since some workunits are sent out more than once, SETI@home can detect errors by comparing the results. During the time of the project, the sky will be scanned several times. It's very unlikely that a cheater would get a workunit from the same location in the sky more than once.

A good read on this would be this pdf (doi), which explains a nice way systems can protect themselves from users spoofing data.

In this paper, we first propose a scheme called Quiz to combat collusion. The basic idea of Quiz is to insert indistinguishable quiz tasks with verifiable results known to the client within a package containing several normal tasks. The client can then accept or reject the normal task results based on the correctness of quiz results.
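A toy model of that Quiz idea might look like the following (tasks here are just lists to be summed, and all names are invented; the real scheme also has to make quiz tasks statistically indistinguishable from normal work, which this sketch does not attempt):

```python
import random

def build_package(normal_tasks, quiz_tasks, rng=None):
    """Mix quiz tasks (whose answers the server already knows) in with
    normal tasks, shuffled so the client cannot tell them apart.
    quiz_tasks is a list of (task, known_answer) pairs."""
    rng = rng or random.Random(1)
    package = [("normal", t, None) for t in normal_tasks]
    package += [("quiz", t, a) for t, a in quiz_tasks]
    rng.shuffle(package)
    tasks = [t for _, t, _ in package]               # what the client sees
    key = [(kind, ans) for kind, _, ans in package]  # kept server-side
    return tasks, key

def accept_results(results, key):
    """Accept the normal-task results only if every quiz answer checks out."""
    for reported, (kind, expected) in zip(results, key):
        if kind == "quiz" and reported != expected:
            return None  # a quiz failed: reject the whole package
    return [r for r, (kind, _) in zip(results, key) if kind == "normal"]
```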

Manishearth