question about encoding

Question

I was given the following data: eNSiiFh+GQpFT2B/KrHFamSj4eCag2xAftOXIJ096Sk=
By looking at this data, I have to tell whether it is the result of an encoding or an encryption process.
How can we tell the difference just by looking at the data?

@ConorMancone That is not *exactly* the same question. That one asks which encoding or encryption it was. This question asks, conceptually, how to tell the difference. Regardless of whether that page answers navjot kaur's question, this question is not exactly a duplicate. (Though I'm also fine if you want to close it now with my answer as the only answer that people can upvote when looking for the conceptual difference :D) — Luc, Feb 04 '20 at 20:03

Luc · Answer 1 · 2020-02-04T20:07:46.080

Encrypted data should have very high entropy. The output of any reasonably good encryption is indistinguishable from random data: there are no easily visible patterns.

Encoding can look like this: 74 68 69 73 20 69 73 20 61 20 73 61 6d 70 6c 65 20 74 65 78 74. See how all values are within the range of 61-78 (plus an exceptional 20 every now and then)? That was "this is a sample text" encoded. Now if we encrypt it: ad d0 e8 c9 61 5e 6f 57 c4 b3 31 ed 97 83 df 92 f5 59 ec 66 b9. No more pattern (both are shown in the same representation: byte values written as hexadecimal).
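To make the hex example above reproducible, here is a minimal Python sketch. It only covers the encoding half; the variable names are mine, and the encrypted bytes above depend on a key and cipher that are not given, so they are not reproduced here.

```python
# Hex-encode the sample sentence and look at the range of byte values.
text = "this is a sample text"
raw = text.encode("ascii")

# bytes.hex(sep) needs Python 3.8 or newer.
encoded = raw.hex(" ")
print(encoded)

# Ignoring the space (0x20), every byte is a lowercase letter.
letters = [b for b in raw if b != 0x20]
print(hex(min(letters)), hex(max(letters)))

# Encoding needs no key: anyone can reverse it.
decoded = bytes.fromhex(encoded).decode("ascii")
print(decoded)
```

The narrow value range is the visible pattern. Well-encrypted output of the same length would be expected to spread over all 256 byte values.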

Since your example only uses a-z, A-Z, 0-9, and +/=, there are unused characters (like !@#). Either this is a pattern (meaning it is not encryption), or the data is encoded (because the original was binary, unprintable data, which may have been the result of encryption).
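The character-set observation can be checked mechanically. The sketch below is only an illustration: `looks_like_base64` is a hypothetical helper name, and passing the check does not prove the string was produced by base64, only that it is consistent with it.

```python
import base64
import string

sample = "eNSiiFh+GQpFT2B/KrHFamSj4eCag2xAftOXIJ096Sk="

# The standard base64 alphabet plus the padding character.
BASE64_CHARS = set(string.ascii_letters + string.digits + "+/=")


def looks_like_base64(s: str) -> bool:
    """Heuristic: non-empty, length a multiple of 4, only base64 characters."""
    return bool(s) and len(s) % 4 == 0 and set(s) <= BASE64_CHARS


print(looks_like_base64(sample))

# validate=True makes b64decode reject characters outside the alphabet.
raw = base64.b64decode(sample, validate=True)
print(len(raw), raw.hex(" "))
```

The decoded bytes are what you would then inspect for patterns. With so few bytes there is not much to go on, but this at least separates the encoding layer from whatever is underneath it.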

Encrypted data is not the only kind of data that shows very high entropy. Compressed data (including data from images and audio files) has the same property. Telling the difference between encrypted and compressed data can be very hard. Frequently, compressed data has a header that indicates the type of compression that was used. Data often has "magic" values as well, such as the string %PDF at the start of PDF files. But neither is guaranteed to be there.
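As a rough sketch of both ideas, the Python below estimates Shannon entropy in bits per byte and checks for a few well-known magic values. The function names and the short signature list are my own choices for illustration, and entropy estimates on very short inputs are unreliable.

```python
import gzip
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Estimated entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


# A few common file signatures; real tools know many more.
MAGIC = {
    b"%PDF": "PDF",
    b"\x1f\x8b": "gzip",
    b"\x89PNG\r\n\x1a\n": "PNG",
}


def guess_magic(data: bytes):
    """Return a format name if the data starts with a known signature."""
    for prefix, name in MAGIC.items():
        if data.startswith(prefix):
            return name
    return None


print(shannon_entropy(b"this is a sample text"))  # low: few distinct bytes
print(shannon_entropy(os.urandom(65536)))         # close to 8 bits per byte
print(guess_magic(gzip.compress(b"hello")))       # header gives it away
print(guess_magic(os.urandom(32)))                # usually None: no header
```

High entropy plus a recognizable header points to compression. High entropy with no header is consistent with encryption, but also with compressed data whose header was stripped.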

With a sample as short as the one you provided, it is hard to tell. If it were an exam question, I would (after describing why it is a bad question) answer that it is encoding, because I recognize base64 encoding. If it turns out to be encryption behind the base64 encoding, well, then it was still base64 encoding.

The first time it seemed you were saying that it is either encrypted data, or it is encoded data. Obviously a lot of encrypted data is then encoded so it can be both. You mention that in your answer but I missed it the first time around. — Conor Mancone, Feb 04 '20 at 20:27
