First let me emphasize that I know nothing about encryption, so my question may be completely wrong from an encryption point of view.

For a technical reason I must use a simple substitution. I know that the Caesar cipher is not secure and can be broken easily, so I was looking for a way to make it more secure for short messages (up to 500 words).


My suggestion

1) Instead of using the 26 characters in English, I’ll use the 1,111,998 characters in Unicode.

2) Some characters will be selected in advance and will never be used in the encrypted message itself. The selected characters will be inserted as distractions in the encrypted message (the receiver will know to ignore them).

3) The order of the characters will also be changed randomly, and the receiver will have to do the job of reordering them.

4) Instead of just shifting characters, I’ll create a map of random relations between characters.

Example:

Message: “I Love You”

Distraction characters: “o”

Mapping (to keep the example simple, I mapped to other English characters): “I” = K, “ “ = U, “L” = R, “V” = M, “E” = T, “Y” = Z, “U” = A

Encrypted message: MOTUOKOURAZ (after removing the "O"s from the message, mapping, rearranging the characters and inserting the distraction "O"s)

Decryption:

The receiver will discard the “O”s: MTUKURAZ

The receiver will translate using the map: ve i luy

The receiver will scramble the letters randomly until the message makes sense (I know this sounds like brute force, but in my case this is fine): I lve yu

The receiver will add the missing “O”s: I love you
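To make the example above concrete, here is a minimal sketch of the two mechanical decryption steps (discarding the distraction characters and translating through the map). The names `MAPPING`, `DISTRACTIONS` and `strip_and_translate` are made up for illustration; the mapping itself is the one from the example. The reordering step is left out because the receiver has to guess it.

```python
# Mapping and distraction character taken from the example above.
# All names here are hypothetical, chosen only for this illustration.
MAPPING = {"I": "K", " ": "U", "L": "R", "V": "M", "E": "T", "Y": "Z", "U": "A"}
DISTRACTIONS = {"O"}


def strip_and_translate(ciphertext):
    """Drop distraction characters, then apply the inverse of MAPPING."""
    inverse = {cipher: plain for plain, cipher in MAPPING.items()}
    kept = [ch for ch in ciphertext if ch not in DISTRACTIONS]
    return "".join(inverse[ch] for ch in kept)


# The result is still scrambled; the receiver must reorder it by hand.
print(strip_and_translate("MOTUOKOURAZ"))
```

The output contains the same characters as "I LVE YU", only in a different order, which is exactly the state described before the scrambling step.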


Why reinvent the wheel? (Why shouldn't we roll our own?)

I have a case where I can only swap characters and am unable to do anything fancy (like AES), and I still want it to be secure. So my question is not whether there is something better, but whether this is secure.


Why do I think this solves the Caesar cipher's weaknesses?

You can’t do a language statistics attack since the letters are scrambled.

You can’t do a letter statistics attack since we have distraction letters.

And the fact that we have a map and not a shift, and over a million letters instead of 26, makes every “guess” of one letter almost useless for the others (especially in short messages).

Thanks!

Omri
  • 3
    Possible duplicate of [Why shouldn't we roll our own?](http://security.stackexchange.com/questions/18197/why-shouldnt-we-roll-our-own). And please have a look at why Caesar is insecure. The same techniques used to break Caesar can be applied in your case (i.e. statistics). It might be a bit harder to break than Caesar, but it is very far from being secure. – Steffen Ullrich Nov 12 '16 at 09:18
  • I’ve updated my question with why I think this solves the Caesar cipher's weaknesses. I may be wrong, but I would like to know why. – Omri Nov 12 '16 at 09:35
  • 3
    "I must use a simple substitution for a technical reason" - What reason would that be? Are you looking for format-preserving encryption? – Arminius Nov 12 '16 at 10:35
  • With your condition 3), how is the order of characters changed randomly? Do you shuffle the "1,111,998 characters in Unicode" (a) for each session of communication, depending on a session key, or (b) dynamically within a session, depending on some events of the encryption processing? How is the shuffling done? – Mok-Kong Shen Nov 12 '16 at 11:05

2 Answers

1

Your scheme is both not feasible and not secure.

1) Instead of using the 26 characters in English, I’ll use the 1,111,998 characters in Unicode.

What does this change in terms of security? It won't change the fact that most characters in the messages will be A-Z.

2) Some characters will be selected in advance and will never be used in the encrypted message itself. The selected characters will be inserted as distractions in the encrypted message (the receiver will know to ignore them). ... The receiver will add the missing “O”s:

That just won't work. There are infinitely many possible messages you could generate by adding any number of characters at some places, and the computer is not intelligent. While a hash could help a bit, I guess with your restrictions you can't have one (and this is out of the scope of pure "encryption" anyway).

The receiver will scramble the letters randomly until the message makes sense (I know this sounds like brute force, but in my case this is fine):

Again, not possible without some checking method. And even if you have one: e.g. 20 characters have 2432902008176640000 possible orderings (20!). Not fine.
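As a quick check of that number, here is a small sketch that counts the orderings of n characters. It assumes all n characters are distinct, so it is an upper bound when letters repeat; the function name `orderings` is made up for this illustration.

```python
import math


def orderings(n):
    """Number of ways to arrange n distinct characters (n factorial)."""
    return math.factorial(n)


# The count grows far too quickly for trial and error, even for short messages.
for n in (8, 20, 50):
    print(n, orderings(n))
```

Even the 8 characters of the example already give 40320 orderings in the worst case, and a 500-word message is far beyond anything a receiver could try.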

You can’t do a letter statistics attack since we have distraction letters.

So what? This does not change much.

And the fact that we have a map and not a shift, and over a million letters instead of 26, makes every “guess” of one letter almost useless for the others (especially in short messages).

Again, it won't change the fact that most characters in the messages will be A-Z.

deviantfan
  • 1) No, since I have more than a million letters to use, every letter in English will be mapped to more than one letter in Unicode and therefore the frequency is broken. Am I wrong? – Omri Nov 12 '16 at 17:38
1

Caesar cipher in this context assumes there are only 26 possible characters in a message.

The problems with your algorithm are given below.

  • With statistical analysis, the distraction character(s) can be identified.
  • Frequency analysis of occurrences would be useful for this.

  • The reordering logic needs to be transferred between sender and receiver.

  • If the recipient has to brute-force the reordering, an attacker can do the same.

However, your decision to create a random mapping rather than shifts comes close to a perfect cipher known as the One Time Pad or Vernam cipher.

Let me use the same example you used. I will encrypt ILOVEYOU with the key WHATISIT:

ILOVEYOU

WHATISIT

Here the nth letter of the key gives the amount of shift to apply on the Caesar wheel to the nth letter of the message.

In the key, A means no shift, B means shift 1... and Z means shift 25.

For example, the first letter of the key is W, so in the ciphertext the first letter of the message is shifted by 22. Shifting I by 22 gives us E. Similarly, the whole message becomes:

ESOOMQWN

The recipient who knows the key can decrypt the message by performing the shift in the opposite direction with the same key.
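The shifting described above can be sketched in a few lines. This is only an illustration under the assumptions of the example (uppercase A-Z only, key at least as long as the message); the names `shift_text`, `encrypt` and `decrypt` are made up here.

```python
A = ord("A")


def shift_text(text, key, sign):
    """Shift each letter of text by the matching key letter (A=0 ... Z=25)."""
    if len(key) < len(text):
        raise ValueError("key must be at least as long as the message")
    return "".join(
        chr((ord(t) - A + sign * (ord(k) - A)) % 26 + A)
        for t, k in zip(text, key)
    )


def encrypt(message, key):
    return shift_text(message, key, +1)


def decrypt(ciphertext, key):
    # Same key, shift in the opposite direction.
    return shift_text(ciphertext, key, -1)


print(encrypt("ILOVEYOU", "WHATISIT"))  # ESOOMQWN
print(decrypt("ESOOMQWN", "WHATISIT"))  # ILOVEYOU
```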

The advantage here compared to your algorithm is that no hardcoded mapping is required; each letter of the message uses a separate cipher. Statistical analysis is impossible if

  • The key is as long as the message.
  • The key is a randomly generated stream of letters.
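A key meeting both conditions could be generated roughly like this. It is a sketch that assumes an A-Z alphabet as in the example and uses Python's standard `secrets` module as the random source; the function name `random_key` is made up for this illustration.

```python
import secrets
import string


def random_key(length):
    """Return `length` letters drawn independently and uniformly from A-Z."""
    return "".join(secrets.choice(string.ascii_uppercase) for _ in range(length))


message = "ILOVEYOU"
key = random_key(len(message))  # as long as the message
print(key)
```

How the key reaches the recipient is a separate problem that this sketch does not address.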
hax
  • ...BUT: An OTP key may not be used more than once (because then, cracking is trivial). – deviantfan Nov 12 '16 at 17:28
  • @deviantfan One Time Pad is a stream cipher. Both the data to be encrypted and the key are streams. So 'using the key twice' is out of question. The drawbacks are the key length, which is as big as the data, and the method used to share the key between two entities. In case of Vigenère cipher, which is a stripped down version of the Vernam cipher, your concern is valid as the key is repeated. – hax Nov 12 '16 at 17:32
  • `So 'using the key twice' is out of question` Yes, that's what I said. `In case of Vigenère cipher ... your concern is valid` What concern? – deviantfan Nov 12 '16 at 17:35
  • First of all – thank you for the informative answer! Can you please explain why my algorithm is open to statistical analysis or frequency analysis? I failed to understand that... the random mapping and the “distraction” letters will not prevent it? If not, why? – Omri Nov 12 '16 at 17:36
  • 1
    @deviantfan We both are saying the same thing :) – hax Nov 12 '16 at 17:37
  • 1
    @Omri Random mapping first: AB has 50% A and 50% B. BA has 50% A and 50% B. No difference. About distraction letters: AOB has 33% of each, still the same number of A and B. That's all that matters. Having 33% O is a bit confusing, but no showstopper. – deviantfan Nov 12 '16 at 17:39
  • 1
    @omri Suppose your message is entirely in English. Not all the letters of the English alphabet are used with the same frequency. For example, vowels tend to be more frequent than consonants. So if you have a 500 word passage to encrypt, it is safe to assume that the most repeated letter in the passage corresponds to the most used letter, E. Suppose you have mapped E to K in your algorithm. Now I know that every K in the passage stands for E. The same goes for the least used letter, Z. https://en.wikipedia.org/wiki/Letter_frequency – hax Nov 12 '16 at 17:41
  • @hax - since I have more than a million letters to use, every letter in English will be mapped to more than one letter in Unicode and therefore the frequency is broken (I have 47,000 letters to use per 1 English letter). Am I wrong? – Omri Nov 12 '16 at 17:42
  • I think that I understand. So basically, in my algorithm I have 47,000 OTPs, and if I send more than 47,000 messages I have to repeat myself. Right? – Omri Nov 12 '16 at 17:45
  • @omri In that case I am unsure how you will manage the decryption at the recipient's end, given the one-to-many mapping. The protection against frequency analysis would be proportional to how many elements of the Unicode set you map each element of the alphabet to, and to the strength of the randomisation in that mapping. – hax Nov 12 '16 at 17:46
  • @Omri Unicode mapping: Ok, with this (new/changed) property it's harder ... let me think a bit. OTP: No, OTP may not ever repeat. This is one of the many reasons it is not really used in practice. – deviantfan Nov 12 '16 at 17:47
  • @omri It is not equivalent to 47000 OTPs. It would be much less than that, i.e. the number of mappings for each letter. – hax Nov 12 '16 at 17:48
  • Yes, you are right, I understand my mistake. I developed (without knowing) some sort of Vernam cipher. But I use it more than once… and therefore it is weak. Also, although the distraction letters may disturb the frequencies, they won’t stop statistical analysis over time. – Omri Nov 12 '16 at 17:52
  • 1
    And to add, Vernam ciphers are seldom used in practice because of the very long key stream required. If this is for an actual application, you may use any other stream or block cipher. – hax Nov 12 '16 at 17:54
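
The frequency argument from the comments above can be checked with a small sketch. It models only the one-to-one random mapping plus reordering from the original question, not the one-to-many variant discussed later; the function names, the sample text and the chosen Unicode range are assumptions made for illustration.

```python
import random
from collections import Counter


def frequency_profile(text):
    """Symbol counts sorted high to low, ignoring which symbol has which count."""
    return sorted(Counter(text).values(), reverse=True)


def substitute_and_shuffle(text, rng):
    """Map each distinct symbol to one random Unicode character, then reorder."""
    symbols = sorted(set(text))
    # Arbitrary illustrative range of code points; any large range would do.
    targets = rng.sample(range(0x4E00, 0x9FFF), len(symbols))
    mapping = {s: chr(t) for s, t in zip(symbols, targets)}
    out = [mapping[ch] for ch in text]
    rng.shuffle(out)
    return "".join(out)


plaintext = "ATTACKATDAWN"
ciphertext = substitute_and_shuffle(plaintext, random.Random())

# The counts are identical, so the most frequent ciphertext symbol
# still corresponds to the most frequent plaintext letter.
print(frequency_profile(plaintext))
print(frequency_profile(ciphertext))
```

Neither the size of the target alphabet nor the reordering changes the counts, which is what a frequency attack works from.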