I'm implementing the AES block cipher, which reads/writes data in 16 byte blocks. The implementation I'm working with usually read data in the little endian format. But in my platform the endianness I'm using is "network order" big endian. Can I use BE for the encryption algorithm? Does the endianness usually change the way the algorithm works? Does it matter at all? I assumed it does not, since it could have simply been different data that you are reading.
3 Answers
Firstly, let's make sure we get endianness:
// needs a C99 compiler like gcc. Will work with msvc as well.
#include <stdint.h>
#include <stdio.h>
union endian_test
{
struct
{
uint8_t a;
uint8_t b;
uint8_t c;
uint8_t d;
};
uint32_t x;
};
int main(int argc, char** argv)
{
union endian_test e;
e.x = 0xAABBCCDD;
printf("%x %x %x %x", e.a, e.b, e.c, e.d);
}
If your system is little endian, you'll get DD CC BB AA as the output and big endian will give you the opposite. All we're doing here is allowing the processor the opportunity to represent an integer in whatever endianness it uses, then using a little C trick to output the bytes in the order they are represented in memory.
AES works on 128-bit blocks, which is kind of convenient, really, since if you multiply all the field sizes in that union by four, you'd get an AES input block.
So now your question boils down to:
If I were to reverse the order of the input of my block, would that create a security problem?
Well, that depends on the output. Let's assume something naive, like reversing the order of the input produces a reversed ciphertext, so if the input is AABBCCDD that encrypts to 12345678 and DDCCBBAA encrypts to 78563412.
This is a problem for a certain class of cryptographic vulnerability called CCA2, or a chosen adaptive ciphertext attack. As a result of the fact that the ciphertext is malleable you can then request various decryption operations be performed and whilst you do not know the plaintext, you would be able to deduce something about it. You might like to read a little on ciphertext indistinguishability.
As such, a cryptographic scheme should not be affected by endianness. If it is, or is malleable, it has some security concerns which may need to be addressed (and are, in some implementations, for example, by not providing a decryption oracle).
The AES is defined as operating on 16-byte blocks. Such a block is an ordered sequence of bytes: there is a first byte, a second byte, and so on, until the sixteenth byte. We often say that the first byte is leftmost and the last byte rightmost because we are westerners who use a latin alphabet and write left-to-right, and we just blindly and implicitly assume that people who write right-to-left are just wrong.
Endianness is about interpreting a sequence of bytes as an integer value; for instance, a sequence of four bytes, interpreted as an integer between 0 and 4294967295. To do such an interpretation, we decide than one of the bytes is for the units (we multiply its numerical value by 1), another is to be multiplied by 256, another by 65536 (which is 256*256), and yet another by 16777216 (i.e. 256*256*256). Little-endian convention is when the byte for units comes first, followed by the byte of value 256, then the byte of value 65536, and finally the byte of value 16777216. Big-endian convention is when the bytes are in the opposite order.
Within AES, there is no interpretation of sequences of bytes into larger integers, so endianness does not apply. However, even for algorithms which do involve interpretation of byte sequences into integers, the algorithm specification defines the endianness to use. It is then up to the implementation to follow that defined convention, whether it matches what the CPU does best or not. For instance, in the hash function MD5, the input message (a sequence of bytes), after padding, is split into 64-byte blocks, and each block is split into 16 4-byte sequences, and each sequence is interpreted as an integer with little-endian convention. MD5 is MD5; it is deterministic and its output for a given input file should not depend on whether the computer uses a x86, ARM, PowerPC or Sparc CPU. Therefore, implementations which run on big-endian CPU must include the necessary byte-swapping steps to do the computation correctly.
Three additional points:
The importance of endianness for performance is overrated. Even for MD5, which is very fast (on my computer, it processes more than 400 megabytes per second), the byteswap for big-endian CPU implies an overhead of no more than 15%. This is why the usual SHA-1 / SHA-256 hash functions can use big-endian convention (that's how they were defined), contrary to what a basic PC does natively, and it still not a big issue.
If your C code must be made aware of endianness, then it does things wrong. C code which is not endian neutral is C code which accesses the same memory objects as both bunches of bytes and integers. Such code breaks strict aliasing rules, which means that the code may cease to operate properly when compiled with a different compiler version or with different compiler options. This is sloppy programming.
Besides little-endian and big-endian, other conventions have existed, often called mixed-endian. These have mostly disappeared. However, big-endian and little-endian architectures are still thriving, therefore, for portability alone, code should be endian neutral.
- 320,799
- 57
- 780
- 949
AES works on byte sequences. The first thing you should do is convert the bytes to the canonical byte order, then you can do the decryption. Cryptography should be done in the application, with the application data. In other words, you should correct for byte order at a lower layer of abstraction than the crypto.
- 98,420
- 30
- 267
- 572