I'm looking at the code of a particular web application that handles file uploads. For some reason, instead of using the cryptographic hash (SHA-256 in this case) directly, it derives an ID from the hash and uses that ID everywhere to identify files uniquely.
The steps involved are as follows:
- Calculate the SHA-256 sum of the required file.
- Take a maximum of 3 characters of the hex digest per iteration and, treating them as a hex number, convert that number to its base62 notation (i.e. the digits 0-9a-zA-Z stand for the values 0 - 61).
- Append these strings in that order to obtain the "ID".
For example:
hash (file) = 26ba0a896923d2de4cad532a3f05da725d9cc08d371eaf96905f5bbc1901b56f
26b -------> 9Z
a0a -------> Fs
896 -------> zs
923 -------> BJ
d2d -------> Sp
e4c -------> X2
ad5 -------> IJ
32a -------> d4
3f0 -------> gg
5da -------> oa
725 -------> tv
d9c -------> Uc
c08 -------> NG
d37 -------> Sz
1ea -------> 7U
f96 -------> 12m
905 -------> Bf
f5b -------> 11p
bc1 -------> Mx
901 -------> Bb
b56 -------> KO
f -------> f
ID = 9ZFszsBJSpX2IJd4ggoatvUcNGSz7U12mBf11pMxBbKOf
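
For reference, the derivation described above can be sketched in Python as follows. This is my own reconstruction, not the application's code: the names `to_base62`, `derive_id`, and `file_id` are hypothetical, and encoding a chunk whose value is 0 as "0" is an assumption, since the example does not cover that case.

```python
import hashlib
import string

# Digit alphabet inferred from the example: 0-9, then a-z, then A-Z (values 0..61).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def to_base62(n: int) -> str:
    """Encode a non-negative integer in base62, without any padding."""
    if n == 0:
        return ALPHABET[0]  # assumption: the example never shows a zero chunk
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def derive_id(hex_digest: str) -> str:
    """Split the digest into chunks of up to 3 hex characters and
    concatenate the base62 form of each chunk, in order."""
    chunks = [hex_digest[i:i + 3] for i in range(0, len(hex_digest), 3)]
    return "".join(to_base62(int(chunk, 16)) for chunk in chunks)


def file_id(path: str) -> str:
    """Hash a file with SHA-256 and derive the ID from its hex digest."""
    with open(path, "rb") as f:
        return derive_id(hashlib.sha256(f.read()).hexdigest())


example = "26ba0a896923d2de4cad532a3f05da725d9cc08d371eaf96905f5bbc1901b56f"
print(derive_id(example))
```

Run on the example hash, this prints `9ZFszsBJSpX2IJd4ggoatvUcNGSz7U12mBf11pMxBbKOf`, which matches the ID above.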
To me, this does not seem to be a safe way to shorten the hash at all. In particular, it looks to me as though the probability of collisions increases this way.*
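
To make the concern concrete, here is a small Python sketch. It assumes only what the example shows: a 3-character chunk can encode to either 2 or 3 base62 characters, and the pieces are joined with no separator or padding. The function names are hypothetical, and the two inputs are hand-built 64-character hex strings, not digests of any known files.

```python
import string

# Digit alphabet inferred from the example: 0-9, a-z, A-Z (values 0..61).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def to_base62(n):
    """Encode a non-negative integer in base62, without padding."""
    digits = ""
    while True:
        n, rem = divmod(n, 62)
        digits = ALPHABET[rem] + digits
        if n == 0:
            return digits


def derive_id(hex_digest):
    """Concatenate the base62 form of each chunk of up to 3 hex characters."""
    return "".join(
        to_base62(int(hex_digest[i:i + 3], 16))
        for i in range(0, len(hex_digest), 3)
    )


tail = "a" * 58          # identical filler so both strings are 64 hex characters
a = "f4303e" + tail      # chunks f43, 03e -> "111" + "10"
b = "03ff42" + tail      # chunks 03f, f42 -> "11" + "110"

print(a != b)                          # True: the inputs differ
print(derive_id(a) == derive_id(b))    # True: both IDs start with "11110"
```

So the mapping from 256-bit strings to IDs is not one-to-one under these assumptions. Whether anyone can find real files whose SHA-256 digests have such a shape is a separate matter.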
Do the above operations pose a problem, or do they not interfere with the cryptographic strength of SHA-256?
* The resistance properties of the SHA-2 functions may prevent an attacker from exploiting this. However, I'm concerned only with the soundness of the derivation itself.