In Return-Oriented Programming how can the machine execute unaligned instructions?

Question

I am reading "The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86)".

The author claims that x86 code is like English written without punctuation or spaces, so that the words all run together. This means that execution can start on a different location than intended by the programmer and that there exists a sufficient set of code sequences of this type within libc to allow arbitrary computation.

From the paper the following sentence has confused me:

"On an architecture such as MIPS where all instructions are 32 bits long and 32-bit aligned there is no ambiguity about where to start or strop, and no unintended instructions of the sort that we describe."

On a 32-bit machine aren't x86 instructions also 32-bit aligned?

Does this mean that it is possible for non-binary alignment and execution? (e.g. I have bits "00001111|22223333" in memory on a 32-bit machine and I decide to execute the instruction "11112222"?

I have to ask. What gave you the idea that x86 instructions are aligned to 4 bytes? Do you think x86-64 instructions are aligned to 8 bytes? Have you ever run `objdump -d` (or `otool -d` on a Mac)? A "32bit architecture" means the general purpose registers are 32 bit wide. The width of the bus, the width of the ALU, the size of addressable memory can be different from the width of the architectural registers. 64bit ARMv8 chips like in the iPhone very often use 16bit Thumb2 instruction encoding. — Z.T., May 27 '19 at 21:11
That memory is addressed by processor word size. I have studied RISC-based computer in a COA class and thought that instructions are read sequentially and that instructions memory are read from memory 4 bytes at a time. — Anthony O, May 27 '19 at 21:37
What I do not understand is if decoding can start from any offset. My assumption was that decoding of instructions always starts at an address divisible by 4, and that if the end of the instruction sequence was not divisible by 4 (e.g. 13) then padding was inserted to allow decoding to start at the next available address divisible by 4. — Anthony O, May 27 '19 at 21:57
Your course materials should have clearly explained that what you learned is a very simple architecture used mainly for teaching. MIPS is dead, POWER has a tiny market share, RISC-V might compete with Cortex M0. The two most successful architectures are x86 and ARM (the 32bit and now the 64bit). They converged on a similar design, starting from opposite starting positions, while being encumbered by different backwards compatibility issues. Almost no real hardware works the way you describe. When reading about any real hardware, do no assume it is anything like MIPS (or "NAND to Tetris"). — Z.T., May 27 '19 at 22:10
The popular modern chips are like this: https://en.wikichip.org/wiki/arm_holdings/microarchitectures/cortex-a76#Overview https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)#Pipeline https://mdsattacks.com/diagram.html and they have weird instructions like https://arxiv.org/pdf/1803.06185.pdf While the loop decoder cache optimizations inside Intel chips are affected by alignment to 16 and 64 bytes, code density is important (which is why thumb2) so in effect variable length instructions are winning. — Z.T., May 27 '19 at 22:36
Thank you. To clarify for future readers and myself I want to confirm that I am wrong then, and that the smallest addressable unit of memory on an x86 machine is a byte and not the processor's word size (e.g. 32-bit or 64-bit) — Anthony O, May 27 '19 at 23:27
Now you're talking about data, not code. x86 does 1, 2, 4, 8, 16, 32 and 64 byte unaligned memory loads/stores. Some 16 and 32 byte operations do require alignment (and this is model/generation dependent). Unless you had an instructor that lied to your whole class, that's not a common misconception. `movzx eax, byte ptr [eax+ecx+1]` is very common in code. — Z.T., May 27 '19 at 23:39

Z.T. · Answer 1 · 2019-05-27T21:03:23.700

x86 is not RISC. The instructions are variable length. From 1 to 15 bytes. They are of course not aligned. Few things on x86 are aligned (atomics, some vector instructions, cache lines of course, etc.).

The text you're reading basically says that an instruction can't be a prefix to another instruction, but anything else goes.

So if we have 4 bytes in memory: AA BB CC DD

If the instruction pointer ("program counter") points to AA, it might execute the single 3 byte instruction AA BB CC, then start decoding from DD. But if the instruction pointer points to BB, maybe BB CC is a valid instruction too! Or maybe BB alone is a valid single byte instruction. Or maybe BB CC DD is a valid 3 byte instruction. Who knows? The possibilities are endless.

Since instructions can contain constants, if you can feed constants into a JIT (like firewall rules that get compiled into eBPF programs or lua code in a game or javascript code in a browser or whatever), you can make the JIT produce code with the bytes you want, and then you can jump into that code (basically, jump into the constant, make the CPU interpret it as instructions, profit). Some JITs take care to produce "safe" byte sequences instead of short/fast byte sequences for instructions bearing constants to ensure this doesn't happen. Same for ahead of time compilers.

If you're interested in mitigations against executing integer constants, look up "constant blinding". — user3553031, Jun 26 '19 at 23:26

score 1 · Answer 2 · answered May 27 '19 at 22:05

1

Instructions are not aligned on x86.

If instructions were aligned on x86, this would waste a lot of space since x86 instructions are all different lengths. Each one-byte instruction would waste another 3 bytes for alignment.

answered May 27 '19 at 22:05

user253751

3,885
3
19
15

Your answer should note that optimising compilers *do* align certain items on a boundary for efficiency. For example the start of functions, beginning of loops etc. This is obviously not required, but it can make code faster in certain circumstances. – David May 27 '19 at 22:15

In Return-Oriented Programming how can the machine execute unaligned instructions?

2 Answers2