The goal is to write a program that returns number of 1 bits in a given number.

Examples

5      ->  2
1254   ->  6
56465  ->  8

Winner

The winning submission is the code which runs in the minimum time. You can assume that normal int value in your language can handle any number (no matter how large) and computes all the valid operators defined for integer in O(1).

If two pieces of code run in the same time the winner is the one implemented in fewer bytes.

Please provide some explanation how your code is computing the result.

Ali1S232

Posted 2012-01-09T23:03:00.643

Reputation: 417

Duplicate: Print the amount of ones in a binary number without using bitwise operators

– Gareth – 2012-01-09T23:14:30.343

1@gareth: that's not a duplicate, in that question binary operators were not allowed, while in mine you can use them. beside in that question he assumed numbers are small while in mine I assume int is large enough! – Ali1S232 – 2012-01-09T23:36:21.447

Runtime is not a meaningful selection criteria as this is highly dependent on the language used. Are you expecting answers in assembler? – gnibbler – 2012-01-09T23:38:17.997

I usually write my own codes in c++, it doesn't really matter which language you are using as while as I can implement the same algorithm in c++. – Ali1S232 – 2012-01-09T23:47:17.263

@Gajet then shouldn't you mention that in the question & add a c++ tag? – elssar – 2012-01-10T07:19:16.860

@elssar as I said it's not important that your code is written in c++. I can almost convert any code to c++, since CPU can run any code in any language! as I said I just need explanation how some algorithms work. – Ali1S232 – 2012-01-10T09:19:34.450

Answers

C/C++

int popcnt(int n)
{
    return __builtin_popcount(n);
}

For some CPUs __builtin_popcount compiles to a single instruction (e.g. POPCNT on x86).

Note: requires gcc or gcc-compatible compiler (e.g. ICC).

Paul R

Posted 2012-01-09T23:03:00.643

Reputation: 2 893

Using an 8-bit lookup table in the python shell

>>> tab=[0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8]
>>> N=123456789
>>> tab[N>>24]+tab[N>>16&255]+tab[N>>8&255]+tab[N&255]
16

gnibbler

Posted 2012-01-09T23:03:00.643

Reputation: 14 170

I guess in c++ I can use pointers instead of shifting which makes my code run a bit faster. – Ali1S232 – 2012-01-10T10:13:46.077

Haskell, 9 (+16)

There was a popCount routine added to the Data.Bits module for GHC v7.2.1/v7.4.1 this summer (see tickets concerning the primop and binding).

import Data.Bits
popCount

There is a good arbitrary precision popcount function in the GNU Multiple Precision Arithmetic Library (GMP), which GHC uses by default, but most other serious languages offer bindings to :

Python, 16 (+12)

import gmpy;
gmpy.popcount(...);

Perl, 11 (+22)

use GMP::Mpz qw(:all);
popcount(...);

Jeff Burdges

Posted 2012-01-09T23:03:00.643

Reputation: 445

fastest for 32 bit numbers:

int bitcount(int i){
    i=i&0x55555555 + (i>>1)&0x55555555;
    i=i&0x33333333 + (i>>2)&0x33333333;
    i=i&0x0f0f0f0f + (i>>4)&0x0f0f0f0f;
    i=i&0x00ff00ff + (i>>8)&0x00ff00ff;
    return i&0x0000ffff + (i>>16)&0x0000ffff;
}

for 64 bits:

int bitcount(long i){
    i=i&0x5555555555555555 + (i>>1)&0x5555555555555555;
    i=i&0x3333333333333333 + (i>>2)&0x3333333333333333;
    i=i&0x0f0f0f0f0f0f0f0f + (i>>4)&0x0f0f0f0f0f0f0f0f;
    i=i&0x00ff00ff00ff00ff + (i>>8)&0x00ff00ff00ff00ff;
    i=i&0x0000ffff0000ffff + (i>>16)&0x0000ffff0000ffff;
    return i&0x00000000ffffffff + (i>>32)&0x00000000ffffffff;
}

these are all for C-based languages that define the binary operators (C, C++, C#, D, java, ...) (barring the int long type, translate as needed)

for arbitrary bit sizes:

int bitcount(BigInt i){
    int c=0;
    BigInt _2_pow_64 = BigInt(2).pow(64);
    while(i>0){
       long rem = i.mod(_2_pow_64);//or AND with 0xffffffffffffffff
       c+=bitcount(rem);//as defined above
       i=i.div(_2_pow_64);//or rshift with 64
    }
    return c;
}

BigInt being the class for integers of arbitrary size, and the .div, .pow, .mod being the 'divide by', 'to the power of' and 'remainder of' functions reps.

ratchet freak

Posted 2012-01-09T23:03:00.643

Reputation: 1 334

though I'm not big int part (caused by all div and mod parts), it sure works fast enough for int and long . – Ali1S232 – 2012-01-10T00:05:52.007

@Gajet you can use shift and mask (its dividing and mod by a power of 2) I edited to clarify – ratchet freak – 2012-01-10T00:20:55.820

I think a lookup table will be faster. For an 8 bit lookup table, you only need 4 lookups and 3 additions to do 32 bit. 16 bit lookup table will be slightly faster but much bigger. – gnibbler – 2012-01-10T01:12:34.177

@gnibbler then you get caching issues – ratchet freak – 2012-01-10T02:49:36.513

Is a 256 byte table really too large for the cache? – gnibbler – 2012-01-10T03:02:20.923

@gnibbler cache lines can be as small as 4 words – ratchet freak – 2012-01-10T03:05:26.097

C, 33 bytes

I just saw the challenge today, but thought of a recursive solution.

B(n){return (n&1)+(n?B(n/2):0);};

Edited to note that I assumed it was a smallest size challenge. The speed is linear to the number of bits required to contain the number. O(N) = k * Log2(N).

user230118

Posted 2012-01-09T23:03:00.643

Reputation: 239

30 bytes – ceilingcat – 2020-01-23T00:09:11.673

Here is rachet freak's submission (the 32-bit version) converted to Scala (thanks RF!):

def num1Bits(x: Int) = {
  var i = x
  i = (i & 0x55555555) + ((i>> 1) & 0x55555555)
  i = (i & 0x33333333) + ((i>> 2) & 0x33333333)
  i = (i & 0x0f0f0f0f) + ((i>> 4) & 0x0f0f0f0f)
  i = (i & 0x00ff00ff) + ((i>> 8) & 0x00ff00ff)
      (i & 0x0000ffff) + ((i>>16) & 0x0000ffff)
}

AmigoNico

Posted 2012-01-09T23:03:00.643

Reputation: 111

Three years too late, but welcome to the site! – caird coinheringaahing – 2017-12-14T00:00:47.353

Erlang

p(N)->p(N,0).
p(0,S)->S;p(N,S)->p(N div 2, S+N rem 2).

The p function divides the number with 2, adds the remainder to S, and calls itself with N/2 (tail) recursively.

While it takes log2(N) steps (division and function call) to calculate the result, it works with arbitrary large numbers.

pgy

Posted 2012-01-09T23:03:00.643

Reputation: 830

scala:

@tailrec
def bits (i: Long, sofar: Int): Int = if (i==0) sofar else bits (i >> 1, (1 & i).toInt + sofar)

computes on a 2Ghz single core for about 1.500.000 64bit Long values the number of bits set. A straight forward recursive method, comparing the last bit, and shifting the number towards zero. The @tailrec is just a hint to the compiler to warn me, if it can't optimize the tail recursive call - it is not necessary, for the optimization to take place.

It's about 10times faster than the simplest method, using the library function:

def f(n:Long)=n.toBinaryString.count(_=='1')

Translating them to c++ might not lead to similar results.

user unknown

Posted 2012-01-09T23:03:00.643

Reputation: 4 210

the other answers are much more faster, for example ratchet only computes 18 operations for 64bit integer, while you code need at least 128 operations for same size int. – Ali1S232 – 2012-01-10T09:23:00.310

Yes, I see - 10 M longs/s with bitcount as Javacode on the same machine. Btw.: I count 24 operations for ratchet, not 18. – user unknown – 2012-01-10T10:06:46.323

my bad! by the way java use the same algorithm as gnibbler suggested – Ali1S232 – 2012-01-10T10:09:21.903

compute number of 1 bits in a binary representation of a number

Examples

Winner

Answers

C/C++

Haskell, 9 (+16)

Python, 16 (+12)

Perl, 11 (+22)

C, 33 bytes

Erlang

scala: