Shortest example of code that is compiled and increasing the optimization level makes it take longer



Just curious if someone can provide a short example of code that causes a compiler's optimizer to behave incorrectly:

  • without optimization (or at lower optimization), code takes X seconds to run
  • with optimization (or at higher optimization), code takes Y seconds to run where Y > X

In other words, the optimizer's cost model causes it to make a poor decision with a specific case, even though (presumably) it makes good decisions in general.

This is meant for C (with gcc the comparisons should be between -O0, -O1, -O2, -O3 in increasing order of optimization), but any compiled language should work.

Jason S

Posted 2017-03-23T21:51:25.000

Reputation: 187


Welcome to PPCG! This site is for specific code challenges, where multiple answers can exist for the community to see. From what I can see, this could be a potential code challenge; however, all code-challenges need some sort of scoring system to objectively determine a winner. If you're wondering whether or not this is possible, Stack Overflow might be a more suitable place to look.

– HyperNeutrino – 2017-03-23T21:54:52.890

This challenge has a creative premise, but I feel like issues such as winning criteria could have been worked out better if it was posted in the sandbox. Regardless, thanks for posting and welcome!

– Zwei – 2017-03-23T22:06:14.043

Good idea, but, as I learned when I joined PPCG, try posting it in the sandbox first. – ckjbgames – 2017-03-23T22:14:19.300

This seems machine specific. For example, GCC's -ftree-vectorize (on at -O3) automatically vectorizes loops using SIMD instruxtions. On most computers, this is a win, but, especially on older Intel CPUs, their SIMD units were quite slow. – Robert Fraser – 2017-03-24T06:49:15.123

Also you might need to cross compile to get it to show the poor behavior. Compile it on a modern processor then put it on a machine with a first generation SSEx unit for whatever SSE has the relevant instruction. – Robert Fraser – 2017-03-24T07:05:48.210

2I've got an example where gcc produces slightly better code when using -Os (which is basically O2 + trying to save space) than when using -O3. Does that count? – Christoph – 2017-03-24T07:14:18.153

@Christoph yes please that would be interesting :-) – Jason S – 2017-03-24T20:46:37.200

1Could someone help me come up with a simple objective scoring system? (e.g. shortest code where the version with increased optimization is at least 20% slower) – Jason S – 2017-03-24T20:48:24.220



C (gcc 6.2.0-5ubuntu12 x86_64), 58 bytes

typedef struct{long a,b}s;s f(long*c){return(s){*c,c[1]};}

Compiled with -O1, we get the following assembly code (minus boilerplate):

    movq    8(%rdi), %rdx
    movq    (%rdi), %rax

With -O3, it looks like this:

    movdqu  (%rdi), %xmm0
    movaps  %xmm0, -24(%rsp)
    movq    -24(%rsp), %rax
    movq    -16(%rsp), %rdx

This is obviously worse (the first version copies from memory to registers directly, the second version copies from memory into the wrong register, back into scratch space in memory, and then from there into the correct registers, which is just doing more work for no gain).

This is a pretty longstanding example of a silly gcc miscompilation. In older versions of gcc, the -O3 code was even worse (containing a full function setup/teardown sequence, stack protection checks, and the like); it seems to have improved somewhat in the more recent versions.

Incidentally, clang produces code almost identical to gcc's -O1 at any positive optimization level (the only difference is that sometimes the two assignments are done in the other order).


Posted 2017-03-23T21:51:25.000