-5

I am trying to figure out how to get my source information to compile into a smaller form using encrypted text. This could potentially be a game-changer for transferring large chunks of data while offering security at the same time.

Are there any encryption types that change source to fewer letters and numbers?

(For lack of a better word I am using the word "encryption".) (This question is not asking about encryption and compression; it's asking about what encryption type can change source to fewer string letters/numbers.)

DeerSpotter
  • 101
  • 5
  • 17
    That's called compression, and it's a completely separate thing from encryption. – AndrolGenhald Oct 25 '18 at 16:09
  • 8
    @DeerSpotter - You literally can't. Encrypted data is meant to be indistinguishable from random data, and random data is provably incompressible. If you could compress random data, there'd be nothing stopping you from compressing the compressed output to get an even smaller file still. Repeat it enough and you'll end up with a 1-byte file, which clearly isn't enough data to represent the original input. – Mr. Llama Oct 25 '18 at 16:51
  • 3
    yes, if you compress then encrypt, it's possible to get a _final_ output shorter than the compression's input. – dandavis Oct 25 '18 at 17:04
  • @AndrolGenhald Actually, it's not "completely separate" from encryption. Good encryption systems often compress the plaintext prior to encrypting. – Monty Harder Oct 25 '18 at 17:15
  • 3
    @MontyHarder At a high level an encryption program may do both, but they are still separate processes that happen independently. In fact, encryption that always compresses is a bad idea, as it creates a weakness if a chosen plaintext is encrypted alongside a secret (e.g. CRIME & BREACH). – AndrolGenhald Oct 25 '18 at 17:32
  • 1
    None. Simply: Encryption does not compress data. Encrypted data does not compress. The best you can do is first compress the data and then encrypt the compressed data. For why read the answers. – zaph Oct 25 '18 at 17:41
  • @JoshTownzen Yeah, kind of. Although I'm not sure if my answer would fit there. The pigeonhole principle operates on a higher level, and you can also look for more efficient ways of encoding data rather than performing compression... – Maarten Bodewes Oct 25 '18 at 17:45
  • @AndrolGenhald The fact that compression can introduce a weakness actually strengthens my view that they are not "completely separate". – Monty Harder Oct 25 '18 at 19:02
  • This is nowhere remotely close to being a duplicate. I think people are bored on this site. @Rory Alsop – DeerSpotter Oct 30 '18 at 16:00
  • 1
    Your most recent edit suggests that you're talking about compiling source into a form which can be recovered fully? Note that for any general purpose encryption scheme, there will be some inputs which result in larger outputs - this doesn't apply to encryption which only applies to specific inputs (e.g. "compression" which removes all spaces from text with spaces, and rejects any text without spaces, will always produce shorter output than input) – Matthew Oct 30 '18 at 16:31
  • @Matthew I understand how compression works. Do you remember how, as kids, we would encrypt messages before sending them to our friends as challenges? Or posting a random encrypted message on Facebook, hoping somebody out there replies? I am also shouting to the world hoping for a random response that will trigger my brain toward a solution. I need to figure out a step or process that, through encryption/coding/changing into another language etc., will change the source to fewer letters/numbers. – DeerSpotter Oct 31 '18 at 16:07

3 Answers

16

Encrypted data should be indistinguishable from random noise. Random data cannot be compressed. Therefore, compress data first and then encrypt it.
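To make the ordering concrete, here is a minimal Python sketch. It is a toy, not real cryptography: XOR with a fresh random pad stands in for a cipher, since its output is likewise indistinguishable from random bytes.

```python
import os
import zlib

def toy_encrypt(data: bytes) -> bytes:
    """Toy stand-in for a cipher: XOR with a one-time random keystream."""
    keystream = os.urandom(len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))

plaintext = b"the quick brown fox jumps over the lazy dog " * 100

# Compress first, then encrypt: the compressor sees the redundancy.
good = toy_encrypt(zlib.compress(plaintext))

# Encrypt first, then compress: the ciphertext looks random, so zlib
# typically makes it slightly *larger* (header and block overhead).
bad = zlib.compress(toy_encrypt(plaintext))

print(len(plaintext), len(good), len(bad))
```

On this repetitive input, `good` is far smaller than the original, while `bad` ends up a few bytes larger than the plaintext.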

DarkMatter
  • 2,671
  • 2
  • 5
  • 23
  • I want to encrypt it first; seems a straightforward question? – DeerSpotter Oct 25 '18 at 16:11
  • 9
    @DeerSpotter Encryption produces a high entropy output, making compression after encryption next to useless. The answer is correct. – Daisetsu Oct 25 '18 at 16:15
  • 4
    @DeerSpotter Yes, pretty straightforward. Why are you even asking this? Just `compress(encrypt(text))` and be done. It's not at all a problem to implement. Just don't come back asking why the output isn't compressed in any significant amount independently of what `compress` and `encrypt` you choose. – Bakuriu Oct 25 '18 at 17:08
  • 1
    Information-theoretically speaking, there may still be some redundancy in the output message space, as long as it doesn't leak information about the input message. However, that redundancy will - by definition - be less than the redundancy in the input message, and the output of practical symmetric ciphers *is* fully indistinguishable from random. Long story short, this is largely correct :) – Maarten Bodewes Oct 25 '18 at 17:37
  • Compressing minimally-compressible or incompressible data can make it larger due to overhead, depending on how good the compression method is. – zaph Oct 25 '18 at 17:40
  • 1
    @Daisetsu The output is not high-entropy, it is high-complexity. – forest Oct 27 '18 at 11:44
  • @forest I should have been more specific. I was referring to Shannon entropy. Checking the Shannon entropy and finding a value close to 8 has been a common test for checking if files are encrypted. What do you mean by high-complexity? I googled it a bit but couldn't find what you meant. – Daisetsu Oct 27 '18 at 14:10
  • 1
    @Daisetsu The Shannon entropy is not high either. The entropy cannot be more than that of the original message plus the key. "Testing" for Shannon entropy is a misnomer anyway, since there's no way to test for it (such tests think the binary expansion of Pi is high-entropy, even though it is not). – forest Oct 27 '18 at 14:18
12

No, what you are trying to accomplish is impossible.

Encryption tries to keep information confidential. To do so it takes one input message out of all possible input messages and encrypts it in such a way that the output doesn't leak any information about the input message. Obviously you do not want two messages to encrypt to the same output, because in that case you would not be able to retrieve one of the messages, or to tell which message had been encrypted.

Now if the output were smaller than the input, then by definition there would be fewer possible output values than input messages. In that case some messages must map to an output value that has already been assigned to another input message. This is called the pigeonhole principle, and it is explained in most cryptography primers.
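The pigeonhole argument can even be checked exhaustively on a tiny example. The sketch below tries every possible mapping from 3-bit messages to 2-bit outputs and confirms that none of them is collision-free:

```python
from itertools import product

# All 8 possible 3-bit messages, but only 4 possible 2-bit outputs.
inputs = list(product([0, 1], repeat=3))
outputs = list(product([0, 1], repeat=2))

# Try every possible "encryption" function from 3-bit inputs to 2-bit
# outputs and check whether any of them avoids collisions.
collision_free_exists = False
for image in product(outputs, repeat=len(inputs)):
    mapping = dict(zip(inputs, image))
    if len(set(mapping.values())) == len(inputs):
        collision_free_exists = True

print(collision_free_exists)  # False: every such mapping collides
```

Since a collision means two plaintexts share a ciphertext, no such scheme can be decrypted unambiguously.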

So the best you can do is to break even.


Even breaking even gets you into trouble if you try to reuse the key. The problem is that a repeated message would encrypt to the same output, leaking to an attacker the fact that the messages are identical. For this reason most ciphers require an IV as input. Sometimes this IV can be generated during encryption and decryption (e.g. sector numbers for hard drive encryption), but often the IV needs to be stored next to the ciphertext.

And then you would still have a message that is confidential, but not integrity protected or authenticated. In general you would need to add an authentication tag as well; a common way to do this is to calculate a MAC value using HMAC.

For these reasons, encryption will often expand the size of the message rather than reduce it.
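A rough size sketch in Python makes the overhead visible, assuming illustrative but typical parameters (a 16-byte IV and an HMAC-SHA256 tag; random bytes stand in for a stream cipher's length-preserving ciphertext):

```python
import hashlib
import hmac
import os

key = os.urandom(32)
iv = os.urandom(16)           # nonce stored next to the ciphertext
ciphertext = os.urandom(100)  # stands in for 100 encrypted message bytes
tag = hmac.new(key, iv + ciphertext, hashlib.sha256).digest()

wire_message = iv + ciphertext + tag  # what actually gets transmitted
print(len(wire_message))              # 148 bytes: 48 bytes of pure overhead
```

So a 100-byte plaintext grows to 148 bytes on the wire before any actual payload savings can occur.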


So the only thing you can do is to reduce the input message space. This can be done by performing compression. But you could also try and find a better way of encoding the information in the input message. For instance, XML has a lot of overhead, which could be removed by using a binary encoding such as ASN.1 with DER or even the weird but highly efficient PER encoding rules.
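Here is a small illustration of re-encoding rather than compressing, using Python's stdlib `struct` as a simple stand-in for a binary encoding such as DER or PER (the record itself is made up for illustration):

```python
import struct

# The same record as XML text and as a fixed binary layout:
xml = b"<point><x>1000</x><y>2000</y></point>"
packed = struct.pack(">ii", 1000, 2000)  # two big-endian 32-bit ints

print(len(xml), len(packed))  # the binary form is 8 bytes vs 37 for XML
```

The information content is identical; only the markup overhead is gone.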

Beware, though, that ciphertext size may also leak information to an attacker. For instance, the bit rate of a variable-rate MP3 stream could say a lot about the contents of the stream, which would be hidden if a constant-rate stream / raw wave format were streamed instead.


Finally, I would like to point you to Format Preserving Encryption, or FPE. FPE can be used to encrypt messages where the output message space is exactly as large as the input message space, even if the input message space is not a power of two (i.e. cannot be encoded simply as a sequence of bits). This method is often used to encrypt information such as credit card numbers, where the input message is certain never to repeat.

So FPE is a provably secure way of breaking even (assuming that the underlying primitives are secure), whereas modern ciphers usually operate on bits, bytes or complete blocks (multiple bytes) of plaintext at a time.

Maarten Bodewes
  • 4,562
  • 15
  • 29
  • Your explanation is what I was looking for, hence why now I only want to have a conversation with you. The purpose of my question is less focused on security and more on the basic question: can I get a string of 32 values into 12 values? (It doesn't matter how it is done, as long as I can then build a tool that says how to read that data.) I am looking for something really outside the box in thinking. Ludicrous, I say! – DeerSpotter Oct 30 '18 at 16:10
  • @DeerSpotter - if that is what you are trying to do then this is definitely off topic here. Please don't keep changing your question - currently the answer is very obvious. If you have a new, security related question, please ask it separately. – Rory Alsop Nov 01 '18 at 08:55
-2

Codes can do this: they use lookup tables ("codebooks") that map something (words, phrases or arbitrary byte sequences) to other symbols, and the latter can occupy less space than the former, thus achieving compression (possibly only for some source messages).
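A minimal sketch of the idea, with a hypothetical three-entry codebook (the phrases and byte codes are made up for illustration):

```python
# A tiny codebook: frequent phrases map to single-byte codes.
codebook = {
    b"attack at dawn": b"\x01",
    b"retreat": b"\x02",
    b"hold position": b"\x03",
}
decode_book = {v: k for k, v in codebook.items()}

message = b"attack at dawn"
coded = codebook[message]        # 14 bytes shrink to 1 byte
decoded = decode_book[coded]     # only a codebook holder can do this

print(len(message), len(coded))  # 14 1
```

The per-message "compression" is dramatic, but only because the codebook itself was shared out of band beforehand.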

The communicating parties need to already be sharing the codebook in order to be able to communicate, though.

The size of a single "encrypted" message plus that of the codebook will always be bigger than that of the plaintext, but with enough messages you could compensate for that.

In any case, the codebook needs to be communicated in a secure way (e.g. by hand), so its size matters only in that it might take up a lot of space in the parties' storage systems.

The historic ones are probably all easily breakable, but with current technology one might be able to come up with a secure scheme, probably using a long enough minimum source sequence length.

If you want a scheme that supports any possible source message, you will probably need a very large (possibly unwieldy) codebook.
You'll need to use a different one for each group of parties that shouldn't be able to decode the other groups' messages, and you'll most likely need to change them regularly.

At this point, you should probably head to crypto.stackexchange.com with your question; the people there will be much more competent at telling you whether such a scheme could be secure and what caveats apply to it.

I'd better emphasize here: do not try to come up with your own encryption scheme, or at least do not use or rely on it for anything until it has been analyzed extensively, for a long time, by (many) real cryptographers.
If you're not knowledgeable in the field, you'd be surprised how easy it is to shoot yourself in the foot and make a trivially breakable encryption system, however incredibly complex it may look to you.

gbr
  • 260
  • 1
  • 7
  • 2
    This does not answer the question: *"Which encryption algorithm allows for the less output data than source data?"* It is not clear whether you are using the codebook prior to encryption, in which case it is not the encryption that decreases the message size, which is what the OP is asking for. It is also unclear how a codebook could decrease the size after encryption. – zaph Oct 27 '18 at 09:35
  • @deerspotter - Stack Exchange is not like discussion forums. Please go and read the [about] page – Rory Alsop Nov 01 '18 at 08:59