11

I have seen some of the algorithms which are used to detect if a file is a stego-file or not, but all those algorithms check for specific patterns and are not universal. Maybe there are universal detection systems, but I have not seen them yet.

Situation:
When a message is hidden in an image, I want to check for all the possible ways that it may be a stego-file - given that I don't have the original file.

Problem:
I have no idea where to start.

Any help will be appreciated. Thanks

ritesh
  • 1
    If you know the key and the encryption algorithm, you can just decode the message and see if it works out. – John Dvorak Oct 22 '13 at 09:12
  • @JanDvorak Yes, but in my case I am trying to detect whether any message has been merged or mixed into the original data. I don't have the key or any other information; all I have is the suspected stego-file, which needs to be checked for a hidden message. – ritesh Oct 22 '13 at 09:20
  • Hmm... the purpose of steganography is to be invisible. You're going to have a hard time distinguishing random-looking data from the pure noise you'd expect from a CCD chip – John Dvorak Oct 22 '13 at 09:25
  • First thing to check: what's the file type? Also, is the file size suspicious? JPEGs shouldn't have much noise after the compression. – John Dvorak Oct 22 '13 at 09:26
  • 2
    *See also:* http://security.stackexchange.com/q/2144/30521 – LateralFractal Oct 22 '13 at 10:22

6 Answers

10

There can be no universal algorithm to detect steganography.

You can implement a series of tests against every known specific steganographic system in existence. But an attacker can use that as a test to develop a new form of steganography that bypasses all existing tests.

What you'll need to do is start researching all the various known forms of steganography in existence today, and provide tests that identify each one. Least Significant Bit is just one of dozens of known techniques.

Alternatively, you can look to tools that others have developed. See outguess.org for a ten-year-old project that tried to do something like this. One novel thing the authors of stegdetect did was let you supply a set of plain images together with samples of those images known to contain steganography. Using linear discriminant analysis, the tool then builds a detection function based on the differences.

John Deters
7

Steganography relies on the latent noise-to-signal ratio of the analogue source material. The "least significant" bits (actual bits will depend on codec) are overwritten by an encrypted stream of secondary "stego" bits such that the primary public content of the image is not destroyed or distorted with notable artefacts.

If the steganographic tool didn't perform histogram analysis on the noise patterns of the least significant bits and then adjust the secret payload accordingly (with a corresponding reduction of maximum payload size), then a scanner can simply check for least significant bits that are too random. They look too random because most steganographic toolkits encrypt the payload, and the pseudo-random output of encryption does not match the statistics of natural CCD noise.
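One way to make the "too random" check concrete is a chi-square test over pairs of neighbouring sample values (2k, 2k+1). This specific test is not named in the answer; it is a commonly described technique swapped in for illustration. The sketch assumes 8-bit samples, and the "cover" data, function names, and seed are all hypothetical. Overwriting every LSB with pseudo-random bits tends to equalise the two counts in each pair, so the statistic drops sharply.

```python
import random

def pair_chi_square(samples):
    """Chi-square statistic over value pairs (2k, 2k+1) of 8-bit samples.

    Returns (statistic, degrees_of_freedom). A small statistic means the
    two values in each pair occur about equally often.
    """
    hist = [0] * 256
    for s in samples:
        hist[s] += 1
    stat = 0.0
    pairs_used = 0
    for k in range(128):
        even, odd = hist[2 * k], hist[2 * k + 1]
        expected = (even + odd) / 2
        if expected > 0:
            stat += (even - expected) ** 2 / expected
            pairs_used += 1
    return stat, max(pairs_used - 1, 0)

def embed_random_lsb(samples, rng):
    """Overwrite every LSB with a pseudo-random bit (stand-in for an encrypted payload)."""
    return [(s & 0xFE) | rng.getrandbits(1) for s in samples]

rng = random.Random(1)
# Hypothetical cover data: even/odd neighbours are deliberately used unevenly.
palette = [10, 10, 10, 11, 40, 40, 41, 41, 41, 41, 200, 201, 201, 201]
cover = [rng.choice(palette) for _ in range(10000)]
stego = embed_random_lsb(cover, rng)

cover_stat, _ = pair_chi_square(cover)
stego_stat, dof = pair_chi_square(stego)
print(f"cover statistic: {cover_stat:.1f}")
print(f"stego statistic: {stego_stat:.1f} (degrees of freedom: {dof})")
```

A real scanner would turn the statistic into a p-value and run it over growing regions of the image. A tool that embeds in only a few scattered positions, or that preserves the histogram as described above, would not be caught by this check.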

And often even if the exact pre-altered original image isn't available, the scanner does have a database of similar images and histogram profiles. So if the histogram for that class of image (e.g. photo of fireworks at night) is suspect, the image can be flagged as possibly altered.

Of course the scanner doesn't have to be passive if the scanner has the authority to alter images! If it can alter images, it can simply mangle the least significant bit range to destroy any steganographic data without needing to know if there was any.
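A minimal sketch of that active approach, assuming the image is available as a flat list of 8-bit sample values (decoding and re-encoding the actual file format is left out, and the function names are hypothetical):

```python
import random

def scrub_lsb(samples, rng=None):
    """Replace the least significant bit of every sample with a fresh random bit."""
    rng = rng or random.SystemRandom()
    return [(s & 0xFE) | rng.getrandbits(1) for s in samples]

def extract_lsb(samples):
    """Read back the least significant bit of every sample."""
    return [s & 1 for s in samples]

# Hypothetical payload hidden in the LSBs of a flat grey cover.
payload_bits = [1, 0, 1, 1, 0, 0, 1, 0] * 32
cover = [128] * len(payload_bits)
stego = [(c & 0xFE) | b for c, b in zip(cover, payload_bits)]

# A fixed seed is used here only to make the demonstration repeatable.
scrubbed = scrub_lsb(stego, random.Random(7))
print(extract_lsb(stego) == payload_bits)     # payload intact before scrubbing
print(extract_lsb(scrubbed) == payload_bits)  # payload destroyed afterwards
```

Each sample changes by at most 1, so the visible image is essentially unchanged. This only defeats payloads stored in the lowest bit; a scheme that hides data elsewhere would need a different kind of mangling.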

Even saving from one lossy format to another can mangle the least significant bit range. This is in fact the problem that "secret" digital watermarks have versus visible digital watermarks.

LateralFractal
  • 5
    Note that steganography, as a concept of hiding information, can be simply more low tech. Such as using a [tool](http://entropymine.com/jason/tweakpng/) to add custom image meta-data tags. Or the image is a birthday party photo with a pink toy balloon in the background meaning *"drug delivery at the docks"* and a blue toy balloon meaning *"drug delivery at the farmhouse"* – LateralFractal Oct 22 '13 at 10:18
7

The whole point of steganography is to avoid detection. A "universal test" for steganography would be a constructive proof that steganography is not possible. However, no proof (constructive or not) of the impossibility of steganography is currently known. This implies that no "universal test" for steganography is currently known -- and it is not known whether such a test is even possible.

We can still say a few things. For instance, steganography relies on:

  • Variability in data. There must be some room for encoding the hidden message. If the complete message format is fixed down to the last bit, then there is no way to include an extra message. That variability must not alter the apparent meaning of the data.

  • A secret convention. The recipient of the data must know that there is a hidden message, and how to find it.

Among the tools for steganography, an important one is encryption: symmetric encryption has the ability to transform arbitrary data into a sequence of bits which will have the same probability distribution as random noise. The message recipient, of course, knows the decryption key: it is part of the secret convention. Thus, a good steganography tool will begin with an encryption layer. A consequence is that if the medium is in a format which "naturally" allows for some random noise to appear (e.g. a photograph), then steganography will be possible and very hard to detect, even if the method is completely known (because nothing looks more like random noise than random noise).
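The effect of the encryption layer can be illustrated by comparing byte entropy before and after encryption. The sketch below is only an illustration: the "cipher" is a toy keystream built from SHA-256 in counter mode so that the example needs nothing beyond the standard library, and it is not a vetted encryption scheme. The key, message, and function names are hypothetical.

```python
import hashlib
import math
from collections import Counter

def byte_entropy(data):
    """Shannon entropy in bits per byte (8.0 is the maximum)."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def toy_keystream_xor(key, data):
    """XOR data with a SHA-256 counter keystream. Illustration only, not a vetted cipher."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)

message = b"meet me at the docks at midnight; bring the package. " * 80
key = b"hypothetical shared secret"
ciphertext = toy_keystream_xor(key, message)

plain_entropy = byte_entropy(message)
cipher_entropy = byte_entropy(ciphertext)
print(f"plaintext:  {plain_entropy:.2f} bits/byte")
print(f"ciphertext: {cipher_entropy:.2f} bits/byte")
```

The plaintext sits well below 8 bits per byte because English text is redundant, while the encrypted bytes come close to the maximum. Applying the same function with the same key recovers the message, which is the "secret convention" part.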

A common tool against steganography is compression. The basic premise of lossy compression (as is used for media files, e.g. MP3 or JPG) is that irrelevant details can be removed from the file, where "irrelevant" means "does not alter the perceived meaning of the data". Random noise will be tracked and removed by compression. Therefore, compression tends to be at odds with steganography. If you write a filter which automatically recompresses (aggressively) all pictures sent by email, then you will not detect steganography, but you will have made it much harder. In that sense, steganography shares some characteristics with watermarking.

However, some kinds of steganography seem impossible to detect and defeat. For instance, I can use the following convention with a correspondent: next week, I will send him a single bit of information (a "yes/no") hidden in a photograph of a cat; the answer will be "yes" if the cat looks to the left of the photograph, and "no" if the cat looks to the right.

Tom Leek
  • 1
    Regarding compression causing failure of steganography, I once read an example from the age of the telegraph. A clerk sent an innocuous message that said something like "MOM RECEIVED A PACKAGE." The clerk became suspicious when the reply was "MESSAGE GARBLED. WAS PACKAGE RECEIVED BY MOTHER OR MOM?" When you pay for telegrams by the letter, the clerk thought it was a pointless expenditure for a meaningless distinction - therefore it may have contained a hidden meaning, obscured by the compression. – John Deters Oct 22 '13 at 21:22
5

That's really the million dollar question. The NSA and other organizations have spent lots of money trying to come up with reliable ways to recognize them. I'm not sure of the current state of the art, but I know a mathematician who was actually working on analyzing steganography back in the early 2000s to try to find a reliable way to identify which images contain messages.

I don't know what progress he made for obvious reasons, but the entire point of a steganography system is to conceal the message with as little noticeable change as possible. If the system is poorly designed, it may leave evidence behind of its alteration, but a well-designed solution shouldn't leave a statistical trace that can be identified without knowing what you are looking for.

AJ Henderson
0

There's a TED talk addressing the issue of manipulated photos being published as originals. I can't find it at the moment.

The speaker developed a way to identify manipulated photos by comparing the pixels of a JPG image. I suspect this could be used to detect steganography. I also suspect this isn't relevant to other image types.

Steven Volckaert
  • 1,193
  • 8
  • 15
Geoff
  • 1
0

One way to check relies on the fact that a hidden payload usually:

  • is not noise, but valid information (which makes it redundant)
  • needs to contain some type of protection against minor changes (which also makes it redundant)

For example, in the case of a picture, as we analyze smaller and smaller details of the picture (mostly, but not always, the least significant bits), we get noisier and noisier data. If there is some type of hidden information, we will find a sudden redundancy in this region.
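A rough way to look for such redundancy is to pull out one bit plane and see how well it compresses. The sketch below assumes 8-bit samples in a flat list and uses zlib as a crude redundancy detector; the data, seed, and function names are hypothetical. It only helps when the payload really is redundant, for example unencrypted text. An encrypted payload would look like noise to this check.

```python
import random
import zlib

def bit_plane(samples, k):
    """Pack bit k of each 8-bit sample into bytes (8 samples per byte, first sample highest)."""
    out = bytearray()
    for i in range(0, len(samples) - 7, 8):
        byte = 0
        for s in samples[i:i + 8]:
            byte = (byte << 1) | ((s >> k) & 1)
        out.append(byte)
    return bytes(out)

def compression_ratio(data):
    """Compressed size divided by original size; values near 1.0 mean noise-like data."""
    return len(zlib.compress(data, 9)) / len(data)

rng = random.Random(3)
# Hypothetical noisy image data: every bit plane is random.
noisy = [rng.randrange(256) for _ in range(32768)]

# Hide unencrypted, repetitive text in the least significant bits.
text = (b"the quick brown fox jumps over the lazy dog. " * 100)[:4096]
bits = [(byte >> (7 - j)) & 1 for byte in text for j in range(8)]
carrying = [(s & 0xFE) | b for s, b in zip(noisy, bits)]

noise_ratio = compression_ratio(bit_plane(noisy, 0))
hidden_ratio = compression_ratio(bit_plane(carrying, 0))
print(f"LSB plane, noise only:  {noise_ratio:.2f}")
print(f"LSB plane, hidden text: {hidden_ratio:.2f}")
```

In a real photograph the higher bit planes compress well because they carry the picture itself, so the interesting signal is a low plane that is unexpectedly more compressible than its neighbours.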

peterh