2

I have a bunch of network data and I would like to determine if it is either 1) encrypted or 2) compressed. I doubt it is both, but the potential exists. If I am assuming that the traditional compression headers are stripped (preventing me from simply running file on the data), then how would you determine whether the data I am looking at is either compressed or encrypted, if it had to be one?

The reason I would think that the compression headers are stripped is because it would be unnecessary information considering both the client and the server know the exact method to compress and decompress the data.

Examples of the application-layer data are posted on http://pastebin.com/0VdR8XQ4 and http://pastebin.com/cF81RTkj

zz3star90
  • 21
  • 1
  • It seems rather strange to me that you'd assume the compression headers are stripped. Why would any application developer bother to save the few bytes and at the same time destroy any compatibility? – Steve Sether Feb 10 '15 at 00:09
  • Maybe just try decompressing with a few common file types? – KnightOfNi Feb 10 '15 at 00:33
  • 2
    I'm voting to close this question as off-topic because it is a better fit for crypto.se – Mark Feb 10 '15 at 07:59
  • Generally data is first compressed, then encrypted. If there is no standard secure protocol used the encryption mechanism can only be determined by knowing what client sent the data. – RoraΖ Feb 10 '15 at 12:09

2 Answers

2

In theory, this can be difficult because both encryption and compression seek to produce high entropy data. Practical results depend on the specific compression algorithm used; for example, a writeup on /dev/ttys0 claims success in using certain statistical techniques (Chi-Square or Monte Carlo pi approximation) to differentiate certain compression/encryption techniques.
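The statistical techniques mentioned above can be sketched in Python. This is a rough illustration of the general idea (function names and thresholds are my own, not taken from the /dev/ttys0 writeup): the chi-square statistic measures how far the byte histogram strays from uniform, and the Monte Carlo test treats consecutive byte pairs as points in a square and checks whether the inscribed-circle hit rate converges toward pi, as it should for random-looking data.

```python
import math
from collections import Counter

def chi_square(data: bytes) -> float:
    """Chi-square statistic of the byte histogram against a uniform
    distribution over 0..255. For random-looking data the statistic
    should be near the degrees of freedom (255); much larger values
    suggest structure (e.g. plaintext or weak compression)."""
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected
               for b in range(256))

def monte_carlo_pi(data: bytes) -> float:
    """Treat consecutive byte pairs as (x, y) points in a 256x256
    square and estimate pi from the fraction that falls inside the
    inscribed circle. Uniformly random input converges toward pi;
    structured input usually does not."""
    inside = total = 0
    for i in range(0, len(data) - 1, 2):
        x, y = data[i], data[i + 1]
        # Circle of radius 127.5 centred at (127.5, 127.5)
        if (x - 127.5) ** 2 + (y - 127.5) ** 2 <= 127.5 ** 2:
            inside += 1
        total += 1
    return 4 * inside / total
```

On a large sample of well-encrypted data you would expect `chi_square` near 255 and `monte_carlo_pi` near 3.14; many compression formats deviate noticeably on one or both.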

Ari Trachtenberg
  • 822
  • 6
  • 14
1

If you can get a sufficiently large batch of the "subject" data, you could analyze it statistically. Good encryption, over a large enough data set, should show high entropy and an even distribution. In particular, each possible byte value (in the range 00h to FFh) should show up about 0.39% of the time (1/256 ≈ 0.3906%). A hex editor such as HxD will let you paste your hex values in, save them as a binary file, and give you a general statistical count. The Diehard test suite will let you evaluate distribution, including distribution of repeats, and other qualities of randomness. (Neither of the samples linked in your question is large enough for a good analysis.)
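As a minimal sketch of the entropy check described above (the function name and the 7.9-bit threshold are my own illustrative choices, not part of the answer), Shannon entropy in bits per byte maxes out at 8.0, and well-encrypted data over a large sample should land very close to that ceiling:

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte.
    8.0 is the theoretical maximum (perfectly uniform bytes); good
    ciphertext over a large sample should score very close to it."""
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A constant stream scores 0.0; random or encrypted data of reasonable size should score above roughly 7.9, while text and lightly compressed data usually score noticeably lower.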

If your data stream interleaves blocks of data (encrypted or otherwise) with HMACs or other verification/authentication blocks, the distribution will be skewed even if the payload itself is encrypted.

Compressed data in most cases will show slightly more varied byte distribution, often with the appearance of a sawtooth shape if my memory serves.

It's pretty hard to provide better help without some additional details.

boggart
  • 516
  • 3
  • 5