There's no obvious mapping available. It's largely going to come down to experimentation, and guess-and-check. (Or, if you can get the source code, that will obviously work.)
To be clear, you almost certainly don't need the solution. "Your high-entropy password reset token isn't verified; all that's left is a short and deterministic token that doesn't expire" is a serious security flaw, one I'd consider well worth a bounty.
Still, it makes sense that you'd want to totally break this thing. So, let's look at some approaches. Note: this is going to require registering a LOT of usernames, and requesting password resets for them.
- Note that there's a very suspicious amount of repetition in the tokens, given similar inputs. Enough so that we can already rule out most actual cryptography; even a one-bit change in the plaintext will produce a huge change in the ciphertext of a block cipher (though not a stream cipher; that will have simply a corresponding one-bit change) and should completely change (specifically, should introduce a ~50% chance of flipping every single bit) in a secure hash function. That's good news; if it was real crypto, you almost certainly couldn't break it.
- Examine the inputs and outputs in different formats, but also compare them in the same format. For example, you seem to be comparing hexadecimal values to strings. Those strings might be ASCII/UTF-8 or similar, but they might be something far more restricted. For example, they are valid base64. Any time one value seems to be an encoding of something that the other is not, try decoding it until they're in the same format... and try going the other direction too, applying encoding. Remember that converting between ASCII and binary, hex, or decimal representations is an encoding/decoding too! As you proceed through the rest of this list, don't forget to apply the same transformations for comparing at each step.
- Given the similarities, look for simple changes in the username producing predictable changes in the token. If you flip a single bit, how much does the token change (note that in your example, 5 to 6 is a change of two bits, 101 to 110). If you flip a different bit, does it change somewhere else? If you flip every bit except one, does part of the token stay the same? If you invert every single bit, does it produce a predictable output? How about if you change every byte except one? If you repeat a substring in the input, does it produce a common substring in the output? If the input is all the same byte/character, does the output also become a repeating byte, substring, or at least byte sequence with a predictable progression? If you swap the first and last halves of the string, or reverse the byte order, does this have predictable impacts on the output?
- Try also varying length. If the username is much shorter or longer, does the token length also change? If so, that suggests it's in some way encoding the username directly; if not, that suggests it might be a fixed-length hash or checksum of some kind. It might also be blocks that are padded somehow; see if there's a threshold at which the output length changes. Try also things like prepending or appending zeros (or, if you can, null bytes, that is
0x00
) to see if those change the output; that might tell you things about the construction used. You can also try varying the prepended/appended values to see if at any point if matches the output of a prior base (shorter, sans appended/prepended values) input; if so, you've probably found a padding value and can maybe infer things about the padding scheme from it.
- Finally, some things to guess about. Is it one of the common checksums, e.g. CRC32? Is it some library's built-in hash function (such as a hashtable might use), such as the .NET
String.HashCode()
or Java String.hashCode()
function? (Bonus points here for identifying the server language and ideally framework, and looking for short hash functions in the standard library, framework, or common third-party libraries.) Is it a truncated hash function, either something weak or something like SHA256? (Remember to compare multiple representations, as the output might be encoded after - or even before - truncation.) Look out also for simple, dumb transformations - bit shifts, a fixed XOR mask (which might repeat after some length), byte swaps, etc. - although if it's a big enough combination of those, you might not figure it out.
Because it's an interesting challenge, let's look at some things in the example values you provided.
Base64-decoding NzY4MzY5
produces ASCII 768369
and NzcxMzc0
produces 771374
. That's extremely unlikely to be a coincidence - a random byte has only a 5/128 or roughly 4% chance of being an ASCII decimal, to have it happen twelve or even six times in a row by chance has exceedingly low likelihood - so we have probably already learned something about the format: it's base64-encoded representations of ASCII representations of numbers!
Let's see if we can go further. 768369
in decimal is 0xBB971
in hex; 771374
is 0xBC52E
. Difference of those values is decimal 3005
and hex 0xBBD
; XOR of the two values is decimal 31839
and hex 7C5F
. That doesn't tell me anything obvious, though it's interesting that the difference was so close to a round number in decimal (could totally be a coincidence though). In any case, it might be an interesting thing to compare other pairs this way.