6
While looking at the ARM instruction set, you notice that the ADD
instruction has the so-called "Flexible second operand", which can be abused for quick multiplication. For example, the following instruction multiplies register r1
by 17 (shifting it left by 4 bits and adding to itself):
ADD r1, r1, r1, LSL #4
; LSL means "logical shift left"
The subtraction instruction has that feature too. So multiplication by 15 is also easy to implement:
RSB r1, r1, r1, LSL #4
; RSB means "reverse subtract", i.e. subtract r1 from (r1 << 4)
So this is an excellent optimization opportunity! Certainly, when you write in a C program x *= 15
, the compiler will translate that to some inefficient code, while you could replace it by x = (x << 4) - x
, and then the compiler will generate better code!
Quickly, you come up with the following cunning plan:
Write a program or a subroutine, in any language, that receives one parameter
m
, and outputs a C optimizing macro, like the following (form = 15
):#define XYZZY15(x) (((x) << 4) - (x))
or (which is the same)
#define XYZZY15(x) (-(x) + ((x) << 4))
???
- Profit!
Notes:
- The macro's name is essential: magic requires it.
- Parentheses around
x
and the whole expression are required: ancient C arcana cannot be taken lightly. - You can assume
m
is between 2 and 65535. - The input and output can be passed through function parameters,
stdin/stdout
or in any other customary way. - In C, addition and subtraction have tighter precedence than bit shift
<<
; use parentheses to clarify order of calculations. More parentheses than necessary is OK. - It is absolutely vital that the C expression be optimal, so e.g.
x+x+x
is not acceptable, because(x<<1)+x
is better. - Expressions
x + x
andx << 1
are equivalent, because both require one ARM instruction to calculate. Also,(x << 1) + (x << 2)
(multiplication by 6) requires two instructions, and cannot be improved. - You are amazed to discover that the compiler detects that
(x << 0)
is the same asx
, sox + (x << 0)
is an acceptable expression. However, there is no chance the compiler does any other optimizations on your expression. Even though multiplication by 17 can be implemented in one instruction, multiplication by 17*17 cannot be implemented in two instructions, because there is no C expression to reflect that:
#define XYZZY289(x) x + (x << 4) + (x + (x << 4) << 4) // 3 operations; optimal
In other words, the compiler doesn't do common sub-expression elimination. Maybe you will fix that bug in version 2...
This is code-golf: the shortest code wins (but it must be correct, i.e. produce correct and optimal expressions for all m
)
Please also give example outputs for m = 2, 10, 100, 14043 and 65535
(there is no requirement on the length of the expression - it can contain extra parentheses, whitespace and shift left by 0 bits, if that makes your code simpler).
For some
2 <= m <= 65535
, there will be a worst case value where the number of bitshifts is a maximum. Is it guaranteed that even for this worst case that it will be more efficient to do this many shifts instead of one multiplication? – Digital Trauma – 2015-03-23T21:00:10.8031Define “optimal.” I count two operations in both
x+x+x
and(x<<1)+x
. – FUZxxl – 2015-03-23T21:12:44.6031@FUZxxl: ARM can do as a single instruction x+(y<<k), x-(y<<k), and (y<<k)-x (for registers x,y and any constant k). So there is only one operation in your second example. – Keith Randall – 2015-03-23T21:53:01.983
1It's not currently clear to me whether output which uses sub-macros is required or not. Please add a worked example for something like
m = 24
where that would make a difference. – Peter Taylor – 2015-03-23T22:25:14.2201Does the compiler do common subexpression elimination? If I have
((x<<8)+1)+(((x<<8)+1)<<5)
, is that 3 ops, or 2? – Keith Randall – 2015-03-24T03:28:19.377@PeterTaylor, I think 24 is just
(x+(x<<1))<<3
– gnibbler – 2015-03-24T04:36:52.083@KeithRandall, that's not a multiplication. Did you mean
((x<<8)+x)+(((x<<8)+x)<<5)
? That can be done in 2 stepsADD r1, r1, r1, LSL #8
andADD r1, r1, r1, LSL #5
– gnibbler – 2015-03-24T04:46:11.500@DigitalTrauma This is just a code-golf challenge; the explanation on performance optimization is just for fun. Compilers do this type of optimization since long ago (but don't tell anyone - it's secret!). Suppose the text starts like You tear a leaf from the calendar - today is 8.9.89... – anatolyg – 2015-03-24T07:12:17.797
@PeterTaylor Sub-macros are not required and not allowed. I added an example for 289. – anatolyg – 2015-03-24T07:13:35.960
@gnibbler, yes, my example should have been 27. Essentially it's the same question as Keith's. – Peter Taylor – 2015-03-24T09:17:47.600
I don't think it's right to not optimise 17*17 etc. Why can't you write it in C as
x = (x << 4) + x; x = (x << 4) + x
– gnibbler – 2015-03-25T01:51:37.343Can you provide a sample for m = 11? As I understand the question, there is no possible XYZZY11 macro. I assume I'm missing something. – Matt Noonan – 2015-03-25T03:18:32.107
@gnibbler I guess your suggestion will require some sort of BFS. The rules as stated allow both BFS and bit-fiddling - more flexibility in algorithm choice is better. I agree that utilizing common sub-expressions is "more optimized", but let's leave that for version 2.0 (if at all). – anatolyg – 2015-03-25T08:13:46.543
@MattNoonan
#define XYZZY11(x) x + (x<<1) + (x<<3)
or#define XYZZY11(x) -x + (x<<2) + (x<<3)
or#define XYZZY11(x) -x + ((x + (x<<1)) << 2)
(but one should add the required additional parentheses) – anatolyg – 2015-03-25T08:15:57.587@anatolyg Ah, so we can write expressions which would utilize more than one register? Great! – Matt Noonan – 2015-03-25T12:29:53.270