What string encoding options are there besides xor?

Question

When attempting to obfuscate strings in a modern program, xor is probably the most common option. By this I mean running each char of a string through a function which xors the char with some given number. This is popular because when the "cipher text" is then xor'd with the same number, it is restored to the plaintext.

I'd like to know other mathematical options that I have which can be used to encode a string which do not use the xor operation. Are there other binary math operators such as AND or OR which can be used in place of xor? How about shift left, shift right, rotate, etc...? Can XOR be broken down into other atomic operations? Please give an example.

There is an unlimited number of ways to encode a string for obfuscation, like swapping characters (all 'a' get 'i', all 'i' get 'a'), url-encoding, base64, base32, hex, quoted-printable, shifting the string with some bits or bytes, rotating the bits or bytes in a string ... — Steffen Ullrich, Mar 14 '19 at 06:49
Steffen is right, there are a million ways you could do this. Your question is rather broad as an almost infinite amount of different correct answers could be given. What exactly is your goal? Is it just a curiosity question, or do you have a certain purpose for the answer? — Luc, Mar 14 '19 at 08:55
Consider using several steps, e.g. swapping + XOR; you can make things as complicated as you want. However, if you're looking to really secure some information, you should look into using encryption instead. Also, XOR is an atomic operation: it cannot be broken down into other operations, although you can combine other atomic operations like AND and OR to achieve the same result. — m0skit0, Mar 14 '19 at 11:53
At a low level, you can use modular addition instead of XOR. — dandavis, Mar 14 '19 at 16:44
@dandavis I'd like to explore this more. Would you be able to write an Answer with an example or some other way of demonstrating this? That would be very helpful! — the_endian, Mar 14 '19 at 16:54
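
On the point raised in the comments about combining AND and OR to get the same result as XOR, here is a minimal sketch. The function name `xorFromAndOr` is made up for illustration; the identity it relies on is "bits set in a or b, but not in both".

```javascript
// XOR rebuilt from AND, OR and NOT: keep the bits set in (a OR b),
// then clear the bits that are set in both (a AND b).
function xorFromAndOr(a, b) {
  return (a | b) & ~(a & b);
}

// Exhaustively compare against the built-in ^ for every pair of byte values.
let mismatches = 0;
for (let a = 0; a < 256; a++) {
  for (let b = 0; b < 256; b++) {
    if (xorFromAndOr(a, b) !== (a ^ b)) mismatches++;
  }
}
console.log(mismatches); // 0
```

This only shows that XOR can be expressed with other operators. The result is still XOR, so it does not change what the encoded bytes look like.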

score 2 · Accepted Answer · answered Mar 14 '19 at 18:45

Given that you specifically want to encode individual chars as individual chars (i.e. 8 bits converted to 8 bits), the only requirement that you have is that your encoding function is a bijection -- that is that it never maps two input characters to the same encoded character. As long as you maintain this requirement, you can always calculate an inverse function which restores the original characters.
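
As a rough illustration of "any bijection has an inverse", the sketch below builds an arbitrary byte-to-byte lookup table and then derives the decoding table from it. The helper names (`makeTable`, `invertTable`) are hypothetical, and `Math.random()` is used only to get some permutation for the demo; it is not a cryptographic source.

```javascript
// Build some permutation of 0..255 with a Fisher-Yates shuffle.
function makeTable() {
  const table = Array.from({ length: 256 }, (_, i) => i);
  for (let i = 255; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [table[i], table[j]] = [table[j], table[i]];
  }
  return table;
}

// Because the table is a bijection, every output value has exactly one
// input value, so the inverse is simply the table read backwards.
function invertTable(table) {
  const inverse = new Array(256);
  for (let i = 0; i < 256; i++) inverse[table[i]] = i;
  return inverse;
}

const encodeTable = makeTable();
const decodeTable = invertTable(encodeTable);

const message = [72, 105, 33]; // byte values of "Hi!"
const encoded = message.map((b) => encodeTable[b]);
const restored = encoded.map((b) => decodeTable[b]);
console.log(restored); // [72, 105, 33]
```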

XOR is one such bijection. An addition (modulo 256) is another bijection. Swapping high and low order nybbles (4 bits) is another option. Swapping every other bit is an option. Any one of these will suffice.
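
A sketch of those four options on single bytes, with a brute-force check that each one really is a bijection. The key `0x5a` is an arbitrary value picked for illustration, and "swapping every other bit" is read here as swapping each pair of adjacent bits, which is only one possible interpretation.

```javascript
const KEY = 0x5a; // arbitrary example key

// XOR: the same function encodes and decodes.
const xorEnc = (b) => b ^ KEY;

// Addition modulo 256, undone by subtraction modulo 256.
const addEnc = (b) => (b + KEY) % 256;
const addDec = (b) => (b - KEY + 256) % 256;

// Swap the high and low nybbles; applying it twice restores the byte.
const swapNybbles = (b) => ((b << 4) | (b >> 4)) & 0xff;

// Swap each pair of adjacent bits (bit 0 with 1, 2 with 3, ...);
// also its own inverse.
const swapAdjacentBits = (b) => ((b & 0x55) << 1) | ((b & 0xaa) >> 1);

// A byte mapping is a bijection when all 256 inputs give distinct outputs.
function isBijection(f) {
  const seen = new Set();
  for (let b = 0; b < 256; b++) seen.add(f(b));
  return seen.size === 256;
}

console.log(
  isBijection(xorEnc),
  isBijection(addEnc),
  isBijection(swapNybbles),
  isBijection(swapAdjacentBits)
); // true true true true
```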

In fact, one can trivially prove that there are precisely

857817775342 842654119082 271681232625 157781520279 485619859655 650377269452 553147589377 440291360451 408450375885 342336584306 157196834693 696475322289 288497426025 679637332563 368786442675 207626794560 187968867971 521143307702 077526646451 464709187326 100832876325 702818980773 671781454170 250523018608 495319068138 257481070252 817559459476 987034665712 738139286205 234756808218 860701203611 083152093501 947437109101 726968262861 606263662435 022840944191 408424615936 000000000000 000000000000 000000000000 000000000000 000000000000 000

possible ways to encode a character this way, which is 256!. If you exclude the possibility of encoding all characters to themselves, subtract one from this number.
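
If you want to reproduce that number yourself, a short sketch using JavaScript's `BigInt` (needed because 256! is far beyond what a regular number can hold) could look like this. The function name `factorial` is just an illustrative helper.

```javascript
// Exact factorial using BigInt arithmetic.
function factorial(n) {
  let result = 1n;
  for (let i = 2n; i <= n; i++) result *= i;
  return result;
}

const digits = factorial(256n).toString();
console.log(digits.length); // 507, the length of the number quoted above
console.log(digits.slice(0, 12)); // its leading digits
```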

XOR and addition have a particular advantage: they are almost always hardware accelerated. CPUs can typically do them in one cycle, with one instruction, which makes them fast and easy. Many CPUs also have a rotate instruction (often implemented with a "barrel shifter"), which shifts the bits and wraps the ones that fall off around to the other side, so on those CPUs you could also use a rotate efficiently.
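
JavaScript has no rotate operator, so here is a sketch of an 8-bit rotate built from shifts. The names `rotl8` and `rotr8` are made up for this example. Note that a plain shift is not a bijection, because the bits shifted out are lost; the rotate is, because it wraps them around.

```javascript
// Rotate an 8-bit value left by n bits (n is reduced to the range 0..7).
function rotl8(b, n) {
  n &= 7;
  return ((b << n) | (b >>> (8 - n))) & 0xff;
}

// Rotate right, which undoes rotl8 with the same n.
function rotr8(b, n) {
  n &= 7;
  return ((b >>> n) | (b << (8 - n))) & 0xff;
}

const rotated = rotl8(0b10010110, 3);
console.log(rotated.toString(2)); // "10110100"
console.log(rotr8(rotated, 3) === 0b10010110); // true
```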

XOR is the most popular for many reasons. It's trivial to understand at a bit level, and has a convenient property that encoding and decoding are precisely the same. It's also technically keyed. While unsigned addition of 128 also encodes/decodes with the same instruction, only one such number works that way.
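
The claim about 128 can be checked by brute force. The sketch below, with illustrative variable names, looks for every addend `k` where adding `k` twice (modulo 256) gets you back to where you started, and counts the XOR keys with the same property. The search also turns up 0, which technically qualifies but leaves the data unchanged.

```javascript
// Addends k for which "add k, then add k again" is the identity on bytes.
const selfInverseAddends = [];
for (let k = 0; k < 256; k++) {
  let ok = true;
  for (let b = 0; b < 256; b++) {
    if ((((b + k) % 256) + k) % 256 !== b) {
      ok = false;
      break;
    }
  }
  if (ok) selfInverseAddends.push(k);
}
console.log(selfInverseAddends); // [0, 128]

// For XOR, every key is self-inverse.
let selfInverseXorKeys = 0;
for (let k = 0; k < 256; k++) {
  let ok = true;
  for (let b = 0; b < 256; b++) {
    if (((b ^ k) ^ k) !== b) ok = false;
  }
  if (ok) selfInverseXorKeys++;
}
console.log(selfInverseXorKeys); // 256
```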

In the end, XOR is also popular because nobody really cares all that much. If one is merely obfuscating content lightly like this, there's no real advantage to being creative. You go with what is easy. XOR shows up in all the examples, so XOR is what people tend to use. Thus people tend to make examples using XOR. With no real advantage to doing better, XOR kind of wins the day, thanks to that feedback loop.

Is there a particular text or area of math to study in order to learn more about bijective functions in particular? — the_endian, Mar 15 '19 at 03:13
I'm not sure, other than perhaps just going to the Wikipedia page on [bijection](https://en.wikipedia.org/wiki/Bijection), or perhaps [permutation](https://en.wikipedia.org/wiki/Permutation). A permutation is a bijection from a set to itself (such as "8 bit binary number" input and "8 bit binary number" output). That's where that god-awfully long number came from. The study of permutations is how you show that the number of possible encodings is 256! — Cort Ammon, Mar 15 '19 at 04:30
If you want to look at operations you can do on such permutations to change one into another (such as "what happens if I do `(Msg XOR X) + Y` instead of just `Msg XOR X`?"), you could look at [quasigroups](https://en.wikipedia.org/wiki/Quasigroup) and loops. Quasigroups do have some traction in the cryptography community, but they are otherwise not all that popular. They are what mathematicians call "non-associative algebras," which means you can't assume (XY)Z=X(YZ). I find that, as a general rule, mathematicians break out in hives when this happens, so they haven't gotten much attention. — Cort Ammon, Mar 15 '19 at 04:35
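
To make the `(Msg XOR X) + Y` case from the comment concrete, here is a small sketch with arbitrary example values for `X` and `Y`. The combined mapping is still a bijection, but decoding has to undo the steps in reverse order: subtract `Y` first, then XOR with `X`.

```javascript
const X = 0x3c; // arbitrary example XOR key
const Y = 0x71; // arbitrary example addend

// Encode: XOR first, then add modulo 256.
const encode = (b) => ((b ^ X) + Y) % 256;

// Decode: undo the addition first, then the XOR.
const decode = (c) => ((c - Y + 256) % 256) ^ X;

// Undoing the steps in the same order as encoding does not work in general.
const decodeWrongOrder = (c) => ((c ^ X) - Y + 256) % 256;

let correct = 0;
let wrong = 0;
for (let b = 0; b < 256; b++) {
  if (decode(encode(b)) === b) correct++;
  if (decodeWrongOrder(encode(b)) !== b) wrong++;
}
console.log(correct); // 256
console.log(wrong > 0); // true
```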

score 1 · Answer 2 · answered Mar 14 '19 at 09:44

If you are looking for just basic math operators, there isn't one that can replace XOR. The idea behind XOR is:

(Text) ⊕ (Key) = (String)
(String) ⊕ (Key) = (Text)

The same idea does not hold for other operators such as AND or OR. For example:

(Text) + (Key) =(String)
(String) + (Key) != (Text)
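
One way to see why AND and OR in particular cannot be reversed is to count how many distinct outputs they produce over all 256 byte values. This is only a sketch, with an arbitrary key of `0x0f` and a hypothetical helper `distinctOutputs`.

```javascript
const KEY = 0x0f; // arbitrary example key

// Count how many different outputs f produces over all byte inputs.
function distinctOutputs(f) {
  const seen = new Set();
  for (let b = 0; b < 256; b++) seen.add(f(b));
  return seen.size;
}

console.log(distinctOutputs((b) => b & KEY)); // 16: the high nybble is erased
console.log(distinctOutputs((b) => b | KEY)); // 16: the low nybble is forced to 1s
console.log(distinctOutputs((b) => b ^ KEY)); // 256: nothing is lost
```

With fewer than 256 distinct outputs, several inputs share an output, so no decoding function can tell them apart.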

As others have said, there are other, more complex encoders such as Base64 and URL encoding that can still be used. This page: http://php.net/manual/en/function.mb-list-encodings.php has a decent list of available encodings.

The PHP manual list is mostly of character encoding. They are not meant to be used for obfuscation. — Anders, Mar 14 '19 at 09:59

dandavis · Answer 3 · 2019-03-14T17:40:26.210

You can use addition instead of XOR. This may be faster on some platforms, or it might be slower. It is slightly more complicated to code, although not by much. Here is an example of such math in basic JavaScript:

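// Example values in the range 0..25 (one per letter of a 26-letter alphabet)
const plain   = [11, 2, 13, 4, 20];
const padding = [5, 16, 12, 13, 19];
const cipher  = [];
const decoded = [];

// build cipher: add the padding value, wrapping around at 26
for (let i = 0; i < 5; i++) cipher[i] = (plain[i] + padding[i]) % 26;

// build decoded: subtract the padding value (adding 26 first keeps the result non-negative)
for (let i = 0; i < 5; i++) decoded[i] = ((26 - padding[i]) + cipher[i]) % 26;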

If you dump out the arrays, you will see that decoded is the same as plain:

plain   11  2   13  4   20
padding 5   16  12  13  19
cipher  16  18  25  17  13
decoded 11  2   13  4   20

You would probably want to change the modulus from 26 to the number of possible letters, or use bytes (256), and you'll want to replace the hard-coded 5 with the length of the message. I tried to keep it as pseudo-code-looking as possible just to show the basic operation without language particulars.
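
A byte-based variant along those lines might look like the sketch below. Several things here are assumptions for the sake of the example: the helper names are made up, the text is assumed to be plain ASCII so each character code fits in one byte, and the padding is reused cyclically when the message is longer than the padding.

```javascript
// Add the padding to each byte, wrapping around at 256.
function addEncode(bytes, pad) {
  return bytes.map((b, i) => (b + pad[i % pad.length]) % 256);
}

// Subtract the padding again; adding 256 first keeps the result non-negative.
function addDecode(bytes, pad) {
  return bytes.map((b, i) => (b - pad[i % pad.length] + 256) % 256);
}

// ASCII-only conversion helpers (an assumption, see above).
const toBytes = (s) => Array.from(s, (ch) => ch.charCodeAt(0));
const fromBytes = (bytes) => String.fromCharCode(...bytes);

const pad = [5, 16, 12, 13, 19];
const encodedBytes = addEncode(toBytes("hello world"), pad);
const decodedText = fromBytes(addDecode(encodedBytes, pad));
console.log(decodedText); // "hello world"
```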

The padding is more like a one-time pad, and depending on the value ranges, a modulus may not be needed if the sum of the plain and padding values is always less than the maximum value the array elements can hold. — zaph, Mar 14 '19 at 22:13
