
I want to make a prediction about a future event, but only reveal that prediction after the event occurs lest knowledge of my prediction affect the outcome.

For example, suppose Alice predicts that Bob will spill his glass of milk on April 24th at 9:17 AM. She writes to a text file:

Bob spills his glass of milk on April 24th at 9:17 AM.

Alice does not want Bob to know the specifics of the prediction before the event; otherwise Bob might avoid milk that day in order to prevent the predicted outcome. However, if Alice only reveals the prediction after the event has occurred, Bob can claim that she fabricated it after the fact.

Naturally, Alice should provide some verification of the authenticity of her prediction to Bob before the event -- a verification that does not reveal the specifics of the prediction until after the event, but which confirms that the prediction was made before the event.

I've come up with two possible verification methods:

  1. Encryption
    • Alice applies a symmetric-key encryption algorithm (such as AES) on the text file
    • Alice provides Bob with the encrypted file
    • After the event occurs, Alice provides Bob with the secret password
    • Bob decrypts the encrypted file and reads the prediction
  2. Hashing
    • Alice applies a hashing algorithm (such as SHA-256) on the text file
    • Alice provides Bob with the computed hash
    • After the event occurs, Alice provides Bob with the original text file
    • Bob applies the same hashing algorithm on the text file and verifies the hash
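In code, the two methods would look roughly like this (a Python sketch; I use the third-party cryptography package's Fernet as a stand-in for raw AES, and hashlib from the standard library):

    import hashlib
    from cryptography.fernet import Fernet  # third-party: pip install cryptography

    prediction = b"Bob spills his glass of milk on April 24th at 9:17 AM."

    # Method 1: symmetric encryption (AES under the hood, via Fernet)
    key = Fernet.generate_key()                    # Alice keeps this secret for now
    ciphertext = Fernet(key).encrypt(prediction)   # Alice gives Bob the ciphertext
    # ... after the event, Alice reveals the key and Bob decrypts:
    assert Fernet(key).decrypt(ciphertext) == prediction

    # Method 2: hashing
    digest = hashlib.sha256(prediction).hexdigest()  # Alice gives Bob this hash
    # ... after the event, Alice reveals the text file and Bob re-hashes it:
    assert hashlib.sha256(prediction).hexdigest() == digest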

Which verification method is the most secure (i.e. which would theoretically be harder for Bob to overcome and thus discover the prediction before the event occurs)?

Which verification method is most practical (e.g. a hash takes up significantly less space than an encrypted file)?

Vilhelm Gray
  • You're looking for a "Commitment scheme" http://en.wikipedia.org/wiki/Commitment_scheme – paj28 Jan 30 '14 at 16:48

1 Answer


Encryption does not offer the functionality you are looking for; not directly, at least. As an extreme case, consider the One-Time Pad: the key is as long as the data to encrypt, and encryption is a bit-by-bit XOR: given key K and plaintext P, the ciphertext is C = P XOR K. Suppose that you used OTP for your encryption: you predicted P and published the encrypted version C. Later on, the prediction turned out to be wrong; Bob does not spill milk but wine (at 9:17 AM, what a shame!). You now want to appear as if you had predicted it correctly all along; thus, you want the ciphertext C to decrypt to P', not P (note that P' has the same length as P). This is easy: compute the key K' = C XOR P' and publish K' as the key you used. Anybody can check that C is indeed the encryption of P' under K'.
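Here is a minimal sketch of that forgery (Python; the toy XOR pad and the variable names are just for illustration):

    import secrets

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    P = b"Bob spills his glass of milk at 9:17 AM."   # the real prediction
    K = secrets.token_bytes(len(P))                   # one-time pad key
    C = xor(P, K)                                     # published ciphertext

    # The prediction was wrong; pick a P' of the same length and forge a key:
    P2 = b"Bob spills his glass of wine at 9:17 AM."
    K2 = xor(C, P2)            # K' = C XOR P'
    assert xor(C, K2) == P2    # C "decrypts" to P' under the forged key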

So, with OTP, you can fake the initial message a posteriori. It is this very characteristic that makes OTP a theoretically unbreakable encryption system: for a given ciphertext, every plaintext of the same length is equally probable.

Therefore, if you use another encryption system (say, AES in some mode of operation), it can provide the kind of commitment you are looking for only insofar as it is imperfect as an encryption system. That is not a good foundation for security: the cipher must be good, but not too good.

Hashing is a much better candidate. Hash the prediction P into H = SHA-256(P), then publish the hash value. Later on, reveal P: everybody can hash the revealed P and see that it matches the previously published hash value H. This is secure as long as the hash function is collision-resistant, i.e. as long as you cannot create two predictions P and P' which hash to the same value (because otherwise you could choose to reveal whichever one matches what actually happened).
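In code, the whole commit/reveal flow is just a few lines (a Python sketch; the function names are mine, not a standard API):

    import hashlib

    def commit(prediction: bytes) -> str:
        """Alice: publish this digest before the event."""
        return hashlib.sha256(prediction).hexdigest()

    def verify(revealed: bytes, published_digest: str) -> bool:
        """Bob: after the event, hash the revealed text and compare."""
        return hashlib.sha256(revealed).hexdigest() == published_digest

    H = commit(b"Bob spills his glass of milk on April 24th at 9:17 AM.")
    print(verify(b"Bob spills his glass of milk on April 24th at 9:17 AM.", H))  # True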

As a side note, the publication can take several forms. Consider trusted timestamping: this is a public method for commitments. The Time Stamping Authority (TSA) receives a hash value, encapsulates it in a structure which also contains the current date and time (as known by the TSA), and signs the structure. In effect, the TSA, by its signature, says (in a verifiable way): "that hash value H was published at date T or before, because I saw it at date T". In a non-computer world, such a job is done with a Soleau envelope.
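Real timestamping authorities follow a standardized protocol (RFC 3161), but the core idea -- a signature over the pair (hash, time) -- fits in a short sketch; here an Ed25519 key from the third-party cryptography package stands in for the TSA's signing key:

    import hashlib, json, time
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    tsa_key = Ed25519PrivateKey.generate()   # the TSA's long-term signing key

    def timestamp(hash_value: str) -> tuple:
        """TSA: bind the submitted hash to the current time and sign the pair."""
        token = json.dumps({"hash": hash_value, "time": int(time.time())}).encode()
        return token, tsa_key.sign(token)

    def check(token: bytes, signature: bytes) -> dict:
        """Anyone: verify the TSA's signature, then trust the embedded time."""
        tsa_key.public_key().verify(signature, token)   # raises if the signature is bad
        return json.loads(token)

    H = hashlib.sha256(b"Bob spills his glass of milk on April 24th at 9:17 AM.").hexdigest()
    token, sig = timestamp(H)
    print(check(token, sig))   # the hash, bound to the time the TSA saw it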

The hash value has a downside: it potentially allows someone to brute-force the prediction. Given the hash value H, one could try possible predictions P and hash them all until a match is found, thereby recovering the prediction before its reveal. For instance, one could know that you predict the spilling of some beverage at some date, and try all combinations of beverages and dates. Since hashing can proceed at a speed of billions of messages per second (with a good GPU), some structured predictions would be at risk of early reveal. To prevent this attack, add random padding at the end:

Bob spills his glass of milk on April 24th at 9:17 AM, and
here is random padding:sef9yb94bo3qr7yqwnc67sb32

Be sure to use new random characters each time you make a prediction (don't reuse them!). Note that one could construct a contrived hash function which is "secure" in the sense that it does not allow collisions to be computed, but still fails at hiding the prediction value; such a function would, however, be a contrived example. With SHA-256, the random padding will do the job, provided that you add enough of it (a random character adds 5 or 6 bits of entropy; at 80 bits, enemies are utterly defeated, so 16 characters are sufficient). A drawback of the method is that you have to keep the complete input string, i.e. the padded prediction with its random characters; this can be troublesome if you intend to keep it in your head.
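Generating the padding is one line with Python's secrets module (a sketch; token_urlsafe(12) yields 16 characters drawn from a 64-symbol alphabet, i.e. about 96 bits of entropy, comfortably past the 80-bit mark):

    import hashlib, secrets

    padding = secrets.token_urlsafe(12)   # 16 URL-safe characters, ~96 bits of entropy
    padded = ("Bob spills his glass of milk on April 24th at 9:17 AM, and\n"
              "here is random padding:" + padding)
    digest = hashlib.sha256(padded.encode()).hexdigest()

    # Publish the digest now; keep the full padded string to reveal after the event.
    print(digest)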

Tom Leek
  • Suppose an attacker has no information about the prediction -- that is, they simply receive the hash and knowledge that it was computed from a text file. Would the random padding no longer be necessary, since the length of the text is not known (as far as the attacker knows, the hash could have been derived from the concatenated sentences of an entire library)? – Vilhelm Gray Jan 30 '14 at 19:24
  • The random padding is useful only if the attacker has a non-negligible probability of finding the _exact_ prediction text when he tries "potential predictions". It is not a question of _length_ but of _contents_. In practice, we have to assume that if the attacker is interested in cracking the prediction, then he knows some context information about it, which can help him narrow his search. The random padding guarantees an absolute bound to this "narrowing". – Tom Leek Jan 30 '14 at 19:56
  • Regarding your last point about keeping it in your head; the original string could be encrypted (e.g. a gpg email sent to yourself) (?) – Darren Cook Feb 04 '14 at 01:07
  • @TomLeek Isn’t it also a function of length? Longer messages have more entropy. If you were to predict the weather in the 100 most populated cities, your prediction couldn’t be brute-forced even if the attacker knew that’s what you were doing. Alternately, if you were to hash a patent application for proof of anteriority, the number of possible wordings is such that padding is unnecessary even for an attacker who knows exactly how the device works. I’m also not sure where you got the 80 bits number. Could you look at my question (http://security.stackexchange.com/q/85388/71892) about it? – Daniel H Apr 07 '15 at 03:55
  • 1
  • Length does not provide entropy. It merely provides _room_ for entropy. The "80 bits" value is the traditional limit of computational feasibility, though the relentless increase in computational power of computers should warrant, at some time, a new traditional limit (cryptographers being in love with powers of 2, they now often talk about "128 bits" as that absolute limit). – Tom Leek Apr 21 '15 at 13:58