x86-64 machine code, 12 bytes for int64_t input
6 bytes for double input
Requires the popcnt ISA extension (CPUID.01H:ECX.POPCNT [Bit 23] = 1).
(Or 13 bytes if modifying the arg in-place requires writing all 64 bits, instead of leaving garbage in the upper 32. I think it's reasonable to argue that the caller would probably only want to load the low 32b anyway, and x86 zero-extends from 32 to 64 implicitly with every 32-bit operation.
Still, it does stop the caller from doing add rbx, [rdi] or something.)
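The POPCNT requirement stated above can be checked at run time. This is a minimal sketch, assuming a GCC or clang toolchain that ships the `<cpuid.h>` helper header; `has_popcnt` is a hypothetical name, not part of the answer.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/clang helper header (toolchain assumption) */
#endif

/* Hypothetical helper: returns 1 if CPUID.01H:ECX bit 23 (POPCNT) is set,
   0 otherwise (including on non-x86 targets). */
int has_popcnt(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                      /* CPUID leaf 1 is not available */
    return (int)((ecx >> 23) & 1u);    /* ECX bit 23 = POPCNT */
#else
    return 0;                          /* not an x86 target */
#endif
}
```

A caller would test `has_popcnt()` once before jumping into the machine code, since popcnt faults with #UD on CPUs that lack it.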
x87 instructions are shorter than the more obvious SSE2 cvtsi2sd/movq (used in @ceilingcat's answer), and a [reg] addressing mode is the same size as a reg: just a mod/rm byte.
The trick was to come up with a way to have the value passed in memory, without needing too many bytes for addressing modes. (e.g. passing on the stack isn't that great.) Fortunately, the rules allow read/write args, or separate output args, so I can just get the caller to pass me a pointer to memory I'm allowed to write.
Callable from C with the signature: void popcnt_double_outarg(int64_t *in_out); Only the low 32b of the result is valid, which is maybe weird for C but natural for asm. (Fixing this requires a REX prefix on the final store (mov [rdi], rax), so one more byte.) On Windows, change rdi to rcx, since the Windows x64 calling convention passes the first integer/pointer arg in rcx instead of following the x86-64 System V ABI.
NASM listing. The TIO link has the source code without the disassembly.
addr      machine code  source
                        global popcnt_double_outarg
                        popcnt_double_outarg:
                            ;; normal x86-64 ABI, or x32: void pcd(int64_t *in_out)
00000000  DF2F              fild   qword [rdi]    ; int64_t -> st0
00000002  DD1F              fstp   qword [rdi]    ; store binary64, using retval as scratch space.
00000004  F3480FB807        popcnt rax, [rdi]
00000009  8907              mov    [rdi], eax     ; update only the low 32b of the in/out arg
0000000B  C3                ret
# ends at 0x0C = 12 bytes
Try it online! Includes a _start test program that passes it a value and exits with exit status = popcnt return value. (Open the "debug" tab to see it.)
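For cross-checking the machine code, here is a rough C model of what the listing above computes, one statement per instruction. It is only a sketch: `popc_double_model` is a hypothetical name, and the code assumes GCC/clang (`__builtin_popcountll`), IEEE binary64 `double`, the default round-to-nearest mode, and little-endian memory.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of popcnt_double_outarg (assumptions: GCC/clang,
   IEEE binary64 double, little-endian memory). */
void popc_double_model(int64_t *in_out)
{
    double d = (double)*in_out;            /* fild qword [rdi]: load the integer */
    memcpy(in_out, &d, sizeof d);          /* fstp qword [rdi]: overwrite it with the binary64 bit pattern */

    uint64_t bits;
    memcpy(&bits, in_out, sizeof bits);    /* popcnt rax, [rdi]: read the same 8 bytes as an integer */
    uint32_t count = (uint32_t)__builtin_popcountll(bits);

    memcpy(in_out, &count, sizeof count);  /* mov [rdi], eax: rewrite only the low 4 bytes */
}
```

For an input of 1, the stored double is 1.0 = 0x3FF0000000000000, which has 10 set bits. After the call the low 32 bits hold 10 and the upper 32 bits still hold 0x3FF00000, which is the "garbage in the upper 32" that the 13-byte variant would clear.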
Passing separate input/output pointers would also work (rdi and rsi in the x86-64 System V ABI), but then we can't reasonably destroy the 64-bit input or as easily justify needing a 64-bit output buffer while only writing the low 32b.
If we do want to argue that we can take a pointer to the input integer and destroy it, while returning output in rax, then simply omit the mov [rdi], eax from popcnt_double_outarg, bringing it down to 10 bytes.
Alternative without silly calling-convention tricks, 14 bytes
Use the stack as scratch space, with push to get the input there. Use push/pop to copy registers in 2 bytes instead of 3 for mov rdx, rsp. ([rsp] always needs a SIB byte, so it's worth spending 2 bytes to copy rsp before three instructions that use it.)
Call from C with this signature: int popcnt_double_push(int64_t);
                        global popcnt_double_push
                        popcnt_double_push:
00000040  57                push   rdi            ; put the input arg on the stack (still in binary integer format)
00000041  54                push   rsp            ; pushes the old value (rsp updates after the store).
00000042  5A                pop    rdx            ; mov rdx, rsp
00000043  DF2A              fild   qword [rdx]
00000045  DD1A              fstp   qword [rdx]
00000047  F3480FB802        popcnt rax, [rdx]
0000004C  5F                pop    rdi            ; rebalance the stack
0000004D  C3                ret
next byte is 0x4E, so size = 14 bytes.
Accepting input in double format
The question just says it's an integer in a certain range, not that it has to be in a base2 binary integer representation. Accepting double input means there's no point in using x87 anymore. (Unless you use a custom calling convention where doubles are passed in x87 registers. Then store to the red-zone below the stack, and popcnt from there.)
11 bytes:
00000110  66480F7EC0        movq   rax, xmm0
00000115  F3480FB8C0        popcnt rax, rax
0000011A  C3                ret
But we can use the same pass-by-reference trick as before to make a 6-byte version: int pcd(const double&d);
00000110  F3480FB807        popcnt rax, [rdi]
00000115  C3                ret
6 bytes.
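The two double-input variants can be modelled in C roughly as follows. This is a sketch for checking results, not the answer's code: the function names are hypothetical, and it assumes GCC/clang (`__builtin_popcountll`) and an IEEE binary64 `double`.

```c
#include <stdint.h>
#include <string.h>

/* Model of the 11-byte version: movq rax, xmm0 ; popcnt rax, rax */
int popcnt_double_byval(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* copy the bit pattern, no numeric conversion */
    return __builtin_popcountll(bits);
}

/* Model of the 6-byte version: popcnt rax, [rdi] */
int popcnt_double_byref(const double *d)
{
    uint64_t bits;
    memcpy(&bits, d, sizeof bits);    /* read the 8 bytes the pointer refers to */
    return __builtin_popcountll(bits);
}
```

Both return the same count for the same value; the by-reference form only saves bytes because the machine code can feed the memory operand straight to popcnt.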
Are you intending that functions can accept their inputs already in floating-point binary64 format if they want? Some people (including myself, initially) were interpreting the question as requiring that functions accept inputs as an integer type like C's long. In C, you can argue that the language will convert for you, just like when you call sqrt((int)foo). But there are some x86 machine-code asm answers (like https://codegolf.stackexchange.com/a/136360/30206 and mine) which were both assuming we had to accept 64-bit integer inputs. Accepting a binary64 value would save 5 bytes. – Peter Cordes – 2017-07-29T15:26:21.930
If so, then all that stuff about limited-range is just in case someone wanted to hack up the conversion to a binary64 bit-pattern themselves instead of type-punning? Or for languages without type-punning? Hmm, an interesting challenge might be to add the exponent and mantissa of a binary64 as base2 integers. If you need to handle them separately anyway, it might be worth doing something other than type-pun and loop over all the bits. – Peter Cordes – 2017-07-29T15:28:21.080
@PeterCordes Yes, you can input in the form of a floating-point number. The limited range is to make sure that the floating-point representation is accurate – Luis Mendo – 2017-07-29T15:30:49.860
Ok, thanks. I guess you wanted to leave the option of writing a function that takes a long, so you couldn't just say any binary64 double, because not all doubles are integers. But all integer-valued doubles can be converted to long and back, up to the limits of long. (As you point out, the reverse isn't true. You get the nearest representable double, assuming default rounding mode). Anyway, this was a totally valid way to set up the question; I just didn't read it carefully >.< – Peter Cordes – 2017-07-29T15:53:14.237
"Note that endianness doesn't affect the result, so you can safely use your machine's actual internal representation of double-precision values to compute the output." unless your machine doesn't use IEEE floating point format... – Jerry Jeremiah – 2017-08-30T22:41:27.470
@PeterCordes In C++ type punning is undefined behaviour (although I think gcc usually does the Right Thing) – Jerry Jeremiah – 2017-08-30T22:43:04.450
@JerryJeremiah: Type-punning with a union is defined in C99, and in GNU C++ as a GNU extension, so g++, clang++, and ICC all support it. Good point about endianness; it's possible for a machine to have different float endian than integer endian, although AFAIK no modern machines are like that. https://stackoverflow.com/q/2945174/224132. C allows for that. But code-golf answers only have to work on at least one implementation, not every implementation, so working on x86/gcc where double is IEEE binary64 is fine :P – Peter Cordes – 2017-08-30T22:50:22.320
Type-punning with a pointer-cast happens to work in un-optimized gcc, and sometimes works in optimized code (but is a terrible idea). It always works with gcc -O3 -fno-strict-aliasing. (@JerryJeremiah, I assume you were talking about type-punning the way these answers are doing it). – Peter Cordes – 2017-08-30T22:55:50.317
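To make the union form of type-punning from these comments concrete, here is a minimal C sketch. The name `popcnt_double_union` is hypothetical, and the code assumes GCC/clang (`__builtin_popcountll`) and an IEEE binary64 `double` that is the same size as `uint64_t`.

```c
#include <stdint.h>

/* Hypothetical example: write one union member, read the other.
   Per the comment above, this is defined in C99 and a GNU extension in C++. */
int popcnt_double_union(double d)
{
    union { double d; uint64_t u; } pun;
    pun.d = d;                          /* store the value as a double */
    return __builtin_popcountll(pun.u); /* reinterpret the same bytes as an integer */
}
```

Copying the bytes with memcpy into a uint64_t is the alternative that avoids relying on the GNU extension when the same source has to compile as C++.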