Steganography to hide text within text

Question

Are there any steganography algorithms which are capable of hiding an (optionally encrypted) text message within another innocuous text message?

The scenario I envisage is that I would like to carry on an email conversation which, to a man in the middle, looks completely innocent even upon close scrutiny, but which in fact contains the true message well hidden within the visible text.

I am aware of this technique being used to hide text within media files, however this sounds to me like it would be both cumbersome and suspicion-arousing to the man-in-the-middle if every message and its response has a media file attachment.

I don't currently have anything worth hiding and I am not doing anything illegal, but I value my privacy and I am very interested in cryptography.

An example would be:

Sender apparent message: Hi there Bob. How was your weekend? Mine was ... more text ... ciao

Sender real message: Did you find the password I requested?

Receiver apparent message: Pretty good. I caught up with ... more text ... ciao

Receiver real message: Yes, it was "password"

You may find "chaffing and winnowing" to be of interest, though it isn't precisely steganography. — Brian, Sep 21 '12 at 17:31
@Brian thanks for the lead. i read the wikipedia article but it doesn't look quite like what i'm after. there was a link to the "null cypher" though which may be closer to what i am after... — mulllhausen, Sep 24 '12 at 01:59
Related: ["Any efficient text-based steganographic schemes?"](http://crypto.stackexchange.com/questions/6058/any-efficient-text-based-steganographic-schemes). — David Cary, Dec 21 '14 at 15:31
Are you using text because of transmission channel restrictions or you think it would look more innocent to a third party? — this.josh, Apr 09 '15 at 00:45
@this.josh i think when i wrote this i was interested in using regular email (text) because it is ubiquitous. — mulllhausen, Apr 09 '15 at 05:27
@mulllhausen Attaching photos and graphics to e-mail is nearly ubiquitous. The larger the ratio of carrier data to secret data the easier it is to hide the secret data. Consider a small secret message of 20 characters. Hiding it in a message of 1200 characters gives you a ratio of 1200 to 20 or 60:1. Consider hiding the same 20 characters in a small jpeg image of 20kb. This time the ratio is 1000:1. Even HTML is a significant improvement over plain text because you can hide information in the formatting. — this.josh, Apr 09 '15 at 17:36
@this.josh of course yes but its really obvious to an eavesdropper when something like a signature jpg (which looks the same to the human eye) has a different hash every email. its a giveaway. a visually different photo each time would not arouse the same suspicion though and this would be a good way to do it. — mulllhausen, Apr 09 '15 at 23:09
See also neural linguistic steganography: * paper: https://arxiv.org/pdf/1909.01496.pdf * demo: https://steganography.live/ * tool: https://github.com/harvardnlp/NeuralSteganography — Martin Monperrus, Nov 18 '20 at 06:50

score 22 · Accepted Answer · answered Sep 20 '12 at 07:20

Yes, there exist algorithms that hide messages inside other messages that look quite innocent. Take spammimic, for instance: it lets you hide your message inside a typical-looking spam message.

A google search for "Steganography hiding text in text" gives you more research and examples around this.

Chris Dale

Yeah it's a nice service, but the message is really long for such a short text to hide. – george_h Sep 22 '12 at 09:21
@george_h It's trying to hide information in a very low information-dense medium, so the overhead is necessarily going to be pretty high. If it's trying to generate natural-looking passages then there's going to be even more overhead. – user Dec 07 '20 at 17:19
@george_h Have you heard of https://stegcloak.surge.sh/? It creates a very short message, like you would see in a text message. – jastako Jul 28 '21 at 00:26
@Chris Dale: You actually answered a question I just asked a few hours ago. https://security.stackexchange.com/q/252715/245057 – jastako Jul 28 '21 at 00:30

Mok-Kong Shen · Answer 2 · 2017-05-03T20:27:39.507

My personal (though maybe biased) opinion is that spammimic isn't very "natural". A humble attempt of mine is to use the number of words in a line of an email or similar text document (e.g. an HTML source file), where one normally doesn't care too much about the raggedness of the line ends, to transmit one stego bit per line. Python code to help with that formatting is available under the name EMAILSTEGANO. Its bit rate is, unfortunately, very low. On the other hand, very short stego messages can occasionally be sufficient for one's purposes (e.g. when an appropriately built codebook can be used to express the information to be transmitted in highly compressed form). Note that for hand-written texts the problem of more or less unsatisfactory raggedness of the line ends may even disappear completely, if corresponding care is taken in writing.
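To make the word-count idea concrete, here is a minimal sketch. It is not the EMAILSTEGANO code: the rule "parity of the number of words on a line = one bit", the function names, and the `base` line length are all assumptions made for illustration, and the receiver is assumed to know how many bits to read.

```python
def wordcount_encode(cover, bits, base=8):
    """Re-break the cover text so that line i has an even word count
    for bit 0 and an odd word count for bit 1 (assumed rule)."""
    words = cover.split()
    lines, pos = [], 0
    for bit in bits:
        # Pick base or base + 1 words, whichever has the right parity.
        n = base if base % 2 == bit else base + 1
        if pos + n > len(words):
            raise ValueError("cover text is too short for this message")
        lines.append(" ".join(words[pos:pos + n]))
        pos += n
    # Remaining words carry no payload; break them at the base length.
    for i in range(pos, len(words), base):
        lines.append(" ".join(words[i:i + base]))
    return "\n".join(lines)


def wordcount_decode(stego, n_bits):
    """Read one bit per line from the parity of its word count."""
    return [len(line.split()) % 2 for line in stego.splitlines()[:n_bits]]
```

With `base=8` every payload line has 8 or 9 words, so the line ends stay only slightly ragged, at the cost of one bit per line.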

[Addendum, edited] I now have a different scheme, WORDLISTTEXTSTEGANOGRAPHY (employing an extensive word list), which has a higher bit rate, albeit requiring the user to compose the covertexts under the guidance of the software. The most recent versions of both schemes are accessible from my home page, mok-kong-shen.de

i reckon it would be a fun project to extend your idea into a more efficient form. for example you could use the number of words, the number of capitals, the number of "accidental" double spaces, etc to encode far more than one bit per line. — mulllhausen, Sep 24 '12 at 02:11
leetspeak may also provide a steganography approach of this type — mulllhausen, Sep 24 '12 at 04:56
@mulllhausen: Simulating typos, I guess, may not be very simple to automate and, anyway, if not used very sparingly, could raise the suspicion of the warden. — Mok-Kong Shen, Sep 24 '12 at 08:19

score 3 · Answer 3 · answered Feb 22 '17 at 16:42

Matthew Kwan has developed a way to hide text in the spaces and tabs of a plain old text file (.txt). It is called snow (available at http://www.darkside.com.au/snow/) and is a Windows-centric portable utility with no external dependencies, meaning there is nothing to install. Optionally, one may encrypt the hidden text (ICE algorithm) to obfuscate it further. I have not tried running it on Linux with Wine; in theory, it should work.
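For a feel of how whitespace hiding can work, here is a small sketch. It is not snow's actual encoding: the mapping (trailing space = 0, trailing tab = 1), the function names, and the `bits_per_line` parameter are assumptions chosen to keep the example short, and no encryption is applied.

```python
def hide(cover, secret, bits_per_line=8):
    """Append invisible trailing spaces/tabs to the lines of `cover`
    so that they spell out the bits of `secret` (bytes)."""
    bits = "".join(format(b, "08b") for b in secret)
    chunks = [bits[i:i + bits_per_line]
              for i in range(0, len(bits), bits_per_line)]
    lines = [line.rstrip(" \t") for line in cover.splitlines()]
    if len(chunks) > len(lines):
        raise ValueError("cover text has too few lines")
    out = []
    for i, line in enumerate(lines):
        tail = ""
        if i < len(chunks):
            # Assumed mapping: space = 0, tab = 1.
            tail = chunks[i].replace("0", " ").replace("1", "\t")
        out.append(line + tail)
    return "\n".join(out)


def reveal(stego):
    """Collect trailing spaces/tabs from every line and rebuild the bytes."""
    bits = ""
    for line in stego.splitlines():
        body = line.rstrip(" \t")
        tail = line[len(body):]
        bits += tail.replace(" ", "0").replace("\t", "1")
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
```

The visible text is unchanged, but anything that trims trailing whitespace would destroy the payload, so the channel has to preserve the file byte for byte.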

score 3 · Answer 4 · edited Dec 07 '20 at 19:51

I did try something like this once just for the challenge in 2010. News of current disasters hidden in 1st Monasterians (download code) contains a reformatted version of a piece of text written by someone else years ago, together with code to extract a hidden message from it.

I chose this specific older text because it is one of the highest voted posts on PerlMonks, so it was recognizable by at least the regular visitors of the site, and was also over eight years old at the time.

The goal in this case wasn't to hide information, but to parody people who purport to find hidden messages in old sacred texts. The same technique would still work if you wanted to hide information, though obviously in that case you wouldn't publish the decoding program together with the text.

The secret message is hidden in the number of spaces between the adjacent words of the text. To make this naturally variable, I made the text pre-formatted as two justified narrow columns. (The original post also has the text formatted to two columns, but there it's not pre-formatted but wrapped on client side.) As the text is justified, that is, stretched to the full length of a line, I have to distribute a number of spaces approximately equally between the words of the line. Where the number of words minus one does not divide the number of spaces to distribute, I have freedom to put one more space between certain words, and can use that choice to encode information. I also have some freedom to choose where I break lines, and used this freedom to ensure that I get the chance to hide information in this way somewhat more often than what you'd get from the most convenient line breaks, but tried not to make it look so uneven as to look suspicious. This method allows hiding only a very small amount of information: here I hide 7 words in a 420 word long text.
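The justification trick can be sketched as follows. This is a simplified illustration, not the code from the post: it assumes the lines are already broken, that every line has at least two words and fits the width, and it uses a made-up rule of one bit per usable line (leftover spaces pushed to the left gaps for 0, to the right gaps for 1). The decoder is assumed to know the number of bits.

```python
import re


def can_carry_bit(words, width):
    """A line is usable when the spaces do not divide evenly over the gaps."""
    gaps = len(words) - 1
    spaces = width - sum(len(w) for w in words)
    return gaps >= 2 and spaces % gaps != 0


def justify(words, width, bit=0):
    """Justify one line; the leftover spaces go left (bit 0) or right (bit 1)."""
    gaps = len(words) - 1
    spaces = width - sum(len(w) for w in words)
    base, extra = divmod(spaces, gaps)
    widths = [base + 1] * extra + [base] * (gaps - extra)
    if bit == 1:
        widths.reverse()
    line = words[0]
    for gap, word in zip(widths, words[1:]):
        line += " " * gap + word
    return line


def justify_encode(lines, width, bits):
    """`lines` is a list of word lists; each usable line takes one bit."""
    bits = list(bits)
    out = []
    for words in lines:
        bit = bits.pop(0) if bits and can_carry_bit(words, width) else 0
        out.append(justify(words, width, bit))
    if bits:
        raise ValueError("not enough usable lines for the message")
    return "\n".join(out)


def justify_decode(text, n_bits):
    """Compare the first and last gap of each line to recover the bits."""
    bits = []
    for line in text.splitlines():
        gaps = re.findall(r" +", line)
        if len(gaps) >= 2 and len(gaps[0]) != len(gaps[-1]):
            bits.append(0 if len(gaps[0]) > len(gaps[-1]) else 1)
    return bits[:n_bits]
```

Choosing line breaks so that more lines have an uneven remainder, as described above, would raise the capacity; this sketch leaves the line breaking to the caller.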

The technique isn't perfect, you could do it more professionally. But I made this 10 years ago, and didn't want the decoder to be too long, so you can probably excuse that.

That said, I think hiding information in plain text is rarely useful on the internet. There's always so much metadata exchanged that it's almost always easier to hide data invisibly. You mention email conversations; I get HTML emails at work pretty regularly, they look innocuous, and it's easier to hide anything in them than in plain text.

score 3 · Answer 5 · answered Mar 10 '13 at 22:57

I have a brilliant example for you! I've recently seen ONE application of steganography being used to hide a text message within a text document.

There is a National Geographic video on YouTube about the Aryan Brotherhood and how they used to communicate across the nation while in prison. The gang was created inside a maximum security prison in California and managed from other supermax prisons. They are the most violent gang in prison and, while making up only 1/10 of 1% of the population, are responsible for more than 20% of the murders that take place within the prisons.

The steganographic technology that they employed was a bi-literal cipher developed 400 years ago by Sir Francis Bacon and was broken by a multi-jurisdictional federal organization including experts at the FBI, NSA and other orgs. Naturally, you cannot use this technology since it has been broken, but some of the logic behind it is still solid.

You really need to see the video if you do not understand what I am stating here. As stated, the texts are meshed together. In this technology, one "alphabet" is written in plain block letters, and the other "alphabet" is written in cursive. The plain block letters become As, and the cursive letters become Bs. Then the letters are arranged in groups of five, and they must then be deciphered using a key.
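The grouping into fives can be sketched in a few lines. This is only an illustration of the biliteral idea, not the scheme from the video: it assumes a 26-letter variant of Bacon's table (a = AAAAA, b = AAAAB, and so on), and it uses lower case versus upper case as a plain-text stand-in for the two handwriting styles. The function names are made up, and the reader is assumed to know how many letters to extract.

```python
import string


def bacon_hide(cover, secret):
    """Spell `secret` (letters a-z) in the letter case of `cover`:
    lower case = A (0), upper case = B (1), five cover letters per secret letter."""
    bits = "".join(format(ord(c) - ord("a"), "05b")
                   for c in secret.lower() if c in string.ascii_lowercase)
    out, i = [], 0
    for ch in cover:
        if ch in string.ascii_letters and i < len(bits):
            out.append(ch.upper() if bits[i] == "1" else ch.lower())
            i += 1
        else:
            out.append(ch.lower())
    if i < len(bits):
        raise ValueError("cover text has too few letters")
    return "".join(out)


def bacon_reveal(stego, n_letters):
    """Read the case pattern back in groups of five."""
    bits = "".join("1" if ch.isupper() else "0"
                   for ch in stego if ch in string.ascii_letters)
    return "".join(chr(ord("a") + int(bits[i:i + 5], 2))
                   for i in range(0, 5 * n_letters, 5))
```

Mixed case is of course far more conspicuous than two similar handwriting styles or fonts; the point is only to show how the A/B groups map back to letters.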

I know this sounds easy to break should it be posted on the Internet, but there are some very similar fonts that may make this a viable technology unless every document is pored over, and any document may contain dozens of fonts. One way to hide the font changes would be to place the different fonts in a PDF document or an image; special technology would then be necessary to extract the different fonts, something which is not common in most OCR software.

it sounds like a good technique. unicode equivalent letters might also work in a similar fashion as the block vs cursive in their cipher. — mulllhausen, Nov 26 '13 at 00:25
except if the fonts are pretty similar, any kind of "binary alphabet" (bolds letters usually, etc) usually scream that it's a Bacon cipher — Dillinur, Jan 28 '15 at 12:40

score 2 · Answer 6 · answered Sep 25 '12 at 16:33

Steganographic methods of the kind mentioned in my previous answer are commonly termed syntactic ones. I would like to mention another syntactic method, due to Rhinedahl, which can be explained as follows:

Let the stego message be a string where each character is coded as 5 bits. One attempts to write for each set of 5 bits a sentence for the cover text according to a rule e.g.:

1st bit = number of noun phrases in the sentence modulo 2.

2nd bit = number of adjectives modulo 2.

3rd bit = number of adverbs modulo 2.

4th bit = number of clauses modulo 2.

5th bit = was the main verb transitive (=1) or intransitive (=0)?

This obviously has a much higher stego bit rate than my humble scheme EMAILSTEGANO. The method is in fact not too difficult for manual work. However, it is not feasible to fully automate it. The best one could do would, IMHO, be software that employs AI techniques (NLP) to determine the required grammatical information from given sentences and that interacts well with the user to deal with the actual issues of stego encoding.
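The decoding side of the example rule is easy to write down once the grammatical counts are known. The sketch below takes hand-annotated counts as input (it does no parsing itself); the `SentenceFeatures` type, the bit order (1st bit most significant), and the mapping of 0..25 to a..z are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class SentenceFeatures:
    """Counts supplied by a human (or an NLP tool) for one cover sentence."""
    noun_phrases: int
    adjectives: int
    adverbs: int
    clauses: int
    main_verb_transitive: bool


def sentence_bits(f):
    """Apply the five example rules to one sentence."""
    return [
        f.noun_phrases % 2,
        f.adjectives % 2,
        f.adverbs % 2,
        f.clauses % 2,
        1 if f.main_verb_transitive else 0,
    ]


def bits_to_char(bits):
    """Assumed mapping: 5-bit value 0..25 -> 'a'..'z'; 26..31 are unused."""
    value = int("".join(str(b) for b in bits), 2)
    if value > 25:
        raise ValueError("5-bit value outside the 26-letter alphabet")
    return chr(ord("a") + value)


# A sentence annotated with 2 noun phrases, 1 adjective, 0 adverbs,
# 1 clause and a transitive main verb carries the bits 0 1 0 1 1.
example = SentenceFeatures(2, 1, 0, 1, True)
```

Encoding is the hard direction: the writer has to compose a sentence whose counts produce the wanted five bits, which is where interactive software would help.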

how come you have chosen 5 bits? for a more secure transmission i would probably want to send the output from a command like `gpg -ca file.txt` hidden steganographically within the text. since this output is ascii and usually contains characters like '+', '=' and '/' then i'm guessing 8 bits would be needed per character (since ascii is 0-255). it seems to me like efficiency (measured as something like the ratio of hidden characters to visible sentences) is going to be the main challenge. — mulllhausen, Sep 26 '12 at 05:06
The 5-bit coding corresponds to cases e.g. in classical crypto where the text is confined to be in an alphabet of 26 characters and without spaces. In the general case you simply have a bit string of total length n. Then you could just take 5 bit groups from it. Anyway the example rule I gave serves to illustrate the basic idea. I suppose one could modify or generalize it to better suit one's particular needs. — Mok-Kong Shen, Sep 26 '12 at 20:14

score 2 · Answer 7 · edited Apr 13 '17 at 12:48

I've developed a scheme here:

http://mjethani.github.io/typo

Here's a brief explanation of how it works:

https://crypto.stackexchange.com/a/24863/15220

In a nutshell, every 4 bits of the secret message is encoded as a typo in the stegotext. The value of the typo is the 4 least significant bits of the first byte of its SHA-256 hash. For example, the typo "infirmation" (information) carries the value 0xE (0b1110). The recipient simply identifies the typos and hashes them to extract the information.
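The extraction step described above can be sketched like this. It follows only the description in this answer, not the linked implementation: hashing the typo as a UTF-8 string exactly as written, pairing nibbles high-first into bytes, and the helper names are all assumptions.

```python
import hashlib


def typo_value(typo):
    """4 least significant bits of the first byte of the typo's SHA-256 hash."""
    return hashlib.sha256(typo.encode("utf-8")).digest()[0] & 0x0F


def extract(typos):
    """Turn the typos found in a stegotext into bytes, two typos per byte
    (assumed order: high nibble first). A trailing unpaired typo is ignored."""
    nibbles = [typo_value(t) for t in typos]
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def pick_typo(candidates, nibble):
    """Sender side: choose a candidate misspelling that hashes to `nibble`,
    or None if no candidate fits."""
    for candidate in candidates:
        if typo_value(candidate) == nibble:
            return candidate
    return None
```

Since each typo's value is effectively random, the sender needs a pool of plausible misspellings per word (roughly 16 on average to hit a given nibble) and has to pick another word when none fits.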
