Consider a system that receives, but does not store, a piece of sensitive information X from a user. If I wanted to add a way to remember when a user has submitted the same X more than once, what would be the most secure way to do so? X is used for no other purpose; it is not a password.
The obvious but insecure approach would be to store all submissions of X and check for duplicates. One way to make this more secure would be to compute a verifier (bcrypt or similar) of each submission and store only that. Properly generated, my understanding is that this would defeat precomputed attacks such as rainbow tables.
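To make the verifier idea concrete, here is a minimal sketch of what I have in mind. It rests on several assumptions of mine: PBKDF2-HMAC-SHA256 from Python's standard library stands in for bcrypt, a single system-wide salt (the hypothetical `SYSTEM_SALT`) is used so that the same X always yields the same verifier, and the names `make_verifier` and `DuplicateDetector` are purely illustrative.

```python
import hashlib

# Hypothetical system-wide salt. The assumption is that it is generated once
# at random and kept apart from the verifier store. A fixed salt makes the
# verifier deterministic, which a direct duplicate lookup needs.
SYSTEM_SALT = bytes.fromhex("00112233445566778899aabbccddeeff")
ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def make_verifier(x: str) -> bytes:
    """Derive a deterministic slow-hash verifier for one submission of X."""
    return hashlib.pbkdf2_hmac("sha256", x.encode("utf-8"), SYSTEM_SALT, ITERATIONS)


class DuplicateDetector:
    """Remembers verifiers only; the plaintext X is never stored."""

    def __init__(self):
        self._seen = set()

    def submit(self, x: str) -> bool:
        """Return True if x was submitted before, and record its verifier."""
        verifier = make_verifier(x)
        duplicate = verifier in self._seen
        self._seen.add(verifier)
        return duplicate


detector = DuplicateDetector()
first = detector.submit("some sensitive value")   # not seen before
second = detector.submit("some sensitive value")  # same X again
```

With a per-submission random salt, as bcrypt normally uses, each new X would instead have to be checked against every stored verifier, which is why this sketch uses one shared salt.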
My concern is with dictionary attacks. If X is relatively guessable, a determined attacker can probably make slow but steady progress against the verifier. One solution to this is to deliberately truncate the verifier to increase the number of possible plaintexts that it matches. Because X is not a password, it should be OK to increase the number of false positives (a duplicate detected when there is none). The attacker, on the other hand, may find it easy to find a matching plaintext, but there would be so many matches that it's impossible to tell which one is the original.
In other words, if the verifier entropy is significantly less than the plaintext entropy, the set of all plaintexts that match a given verifier should be much larger than if the entropies were of comparable size. Then it's impossible to tell which plaintext is the original.
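A toy sketch of the truncation idea, again under my own assumptions: PBKDF2 stands in for bcrypt, the iteration count is kept far too low for real use so that the demo runs quickly, and the candidate space (all 4-digit strings) and the names `truncated_verifier` and `matching_candidates` are hypothetical.

```python
import hashlib

SYSTEM_SALT = b"hypothetical-system-salt"  # assumed fixed, system-wide salt
ITERATIONS = 100  # deliberately tiny so the demo finishes quickly


def truncated_verifier(x: str, bits: int) -> int:
    """Keep only the top `bits` bits of the slow-hash output."""
    digest = hashlib.pbkdf2_hmac("sha256", x.encode("utf-8"), SYSTEM_SALT, ITERATIONS)
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits)


def matching_candidates(target: int, candidates, bits: int) -> list:
    """Everything in the attacker's dictionary that maps to the same verifier."""
    return [c for c in candidates if truncated_verifier(c, bits) == target]


# Toy guessable space: 10,000 candidates, roughly 13.3 bits of entropy.
candidates = [f"{i:04d}" for i in range(10_000)]
bits = 8  # verifier entropy well below plaintext entropy
secret = "4821"

target = truncated_verifier(secret, bits)
matches = matching_candidates(target, candidates, bits)

# On average about len(candidates) / 2**bits candidates collide with the target.
print(len(matches), "candidates match, including the original:", secret in matches)
```

The same truncation sets the false-positive rate on the detection side: with n verifiers already stored and k bits kept, a fresh X collides with something by chance with probability of roughly n / 2^k (while n is much smaller than 2^k), so k cannot be pushed arbitrarily low.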
Is this a valid approach? Or is it overkill if the verifier is computed using a slow algorithm like bcrypt?