Computationally Calculate Core Count

7

Almost all operating systems allow you to run multiple programs at the same time. If your system does not have enough processors to run all the programs you want to run, it starts to give each program just a tiny slice of time on the processor it runs on. This is normally not noticeable but can lead to strange artefacts that you might not expect.

Your task is to write a program that tries to find out how many cores your computer has without asking the operating system. This is a popularity contest, the most upvoted solution wins. Here are the constraints:

  • While you cannot expect that your submission is the only thing that runs, you may expect that all other programs on the system are idling and wake up only sporadically.
  • You may not ask the operating system how many cores your computer has. Neither may you look up this number from any configuration file or similar. You may also not invoke a utility that provides you with that information.
  • For the sake of this challenge, the number of cores your computer has is the number of processes your computer can run at the same time. For instance, if you have an Intel processor with 2 cores and hyper threading, that would count as 4 cores.
  • Your solution needs not to be precise. It should not be terribly wrong all the time though.

FUZxxl

Posted 2015-08-23T14:39:09.000

Reputation: 9 656

8int num_cores() { return 1; } - only works on certain devices. – orlp – 2015-08-23T14:46:13.697

2def num_cores(): return random.choice([1,2,4,8]) - works with about 25% accuracy for most consumer devices. "Your solution needs not to be precise. It should not be terribly wrong all the time though." – FryAmTheEggman – 2015-08-23T15:16:39.820

What if a device uses hyperthreading (we have then a device that pretends to have more cores than it physically has)? Should we try to report the number of real physical cores, or the one reported by the system? – pawel.boczarski – 2015-08-23T18:24:15.263

1@pawel.boczarski That's defined in the 3rd bullet item of the question. – Reto Koradi – 2015-08-23T18:40:06.837

What does "find out" mean? Do I need to print the number somewhere or is it sufficient if the program has the information internally only? My expected solution might not be able to print anything any more after finding out the number of CPUs... – Thomas Weller – 2015-08-23T22:51:07.907

Does the program need to terminate? – Thomas Weller – 2015-08-23T22:51:57.157

@FryAmTheEggman: There are AMD hexacore systems in which case it'll always be wrong – Thomas Weller – 2015-08-23T22:53:21.440

Most virtualization software will allow you to control how many virtual cores are assigned to a given VM. This allows you to easily test against different configs, probably with reasonable results in many cases. For example I have an Ubuntu VM running in VMWare Fusion on OSX. – Digital Trauma – 2015-08-24T02:05:18.347

@ThomasWeller This is a popularity contest. What attracts to most votes wins. You should try to print that number out though. – FUZxxl – 2015-08-24T06:41:45.320

Answers

10

C++11

Simply compares single core performance with threaded performance until there is not a linear speedup anymore. It's important that you do not have any other CPU intensive process running at the same time.

Compile with -O2 -m64 -march=native -std=c++11. You might have to pass -lpthread.

#include <chrono>
#include <cstdint>
#include <future>
#include <iostream>
#include <vector>

double f() {
    double cnt = 0;
    std::chrono::time_point<std::chrono::high_resolution_clock> start, end;
    start = std::chrono::system_clock::now();

    while (true) {
        end = std::chrono::system_clock::now();
        std::chrono::duration<double> elapsed = end - start;
        if (elapsed.count() > 1.) break;
        cnt++;
    }

    return cnt;
}

int main(int argc, char** argv) {
    double one_core = f();

    int n = 1;
    while (true) {
        std::vector<std::future<double>> futures;
        for (int i = 0; i < n; ++i) {
            futures.push_back(std::async(std::launch::async, f));
        }

        double tot = 0;
        for (int i = 0; i < n; ++i) tot += futures[i].get();
        if (int(tot / one_core + 0.5) != n) {
            std::cout << n - 1 << " core(s)\n";
            return 0;
        }

        ++n;
    }
}

orlp

Posted 2015-08-23T14:39:09.000

Reputation: 37 067

Succesfully tested on Intel Haswell, Raspberry Pi (both quadcore) and some laptop Intel (dual core). – orlp – 2015-08-23T22:47:28.170

Yes, I was going to do something like this in [tag:c], but you save me the effort ;-). Have a +1. – Digital Trauma – 2015-08-24T02:02:22.147

7

C

Here's a new, imprecise approach:

/* ncpu.c */

#include <stdio.h>
#include <pthread.h>
#include <signal.h>

pthread_mutex_t count_mutex = PTHREAD_MUTEX_INITIALIZER;
int count = 1;  /* start at 1 for the main() thread */

static void *inc_thread (void *arg) {
    /* Woohoo - now we're no longer languishing on the ready queue -
     * We're actually running on a real core.  Atomically bump the counter. */
    pthread_mutex_lock(&count_mutex); {
        count++;
    } pthread_mutex_unlock(&count_mutex);

    /* keep this core spinning so another thread doesn't get it */
    while (1);
    return NULL;
}

#define MAX_CORE 64
int main (int argc, char **argv) {
    pthread_t tid_array[MAX_CORE];
    int i;

    /* create a bunch of threads */
    for (i = 0; i < MAX_CORE; i++) {
        pthread_create(&tid_array[i], NULL, inc_thread, NULL);
    }
    /* cancel all those threads, regardless of whether they ran or not */
    for (i = 0; i < MAX_CORE; i++) {
        pthread_cancel(tid_array[i]);
    }

    printf("Your core count is approximately %d\n", count);
}

Build with LDFLAGS=-pthread make ncpu.

This works some of the time on Ubuntu 14.04. I have tested with 1, 2 and 4 virtual cores in a VM.

This kicks off 64 threads, and then (almost) immediately cancels all those threads. pthread_create() will put the new thread in a ready queue, but won't necessarily jump into the thread start routine immediately. When the threads get to run depends on core availability at that time.

For example if there are 4 cores, then 3 of the threads can potentially start running immediately. Each thread atomically increments a counter then spins for a bit until it is cancelled. Beyond the third thread, the rest of the threads will still be sitting in the ready queue when the main thread cancels them, and so they won't get to increment the counter.

This is certainly not very precise, but its only terribly wrong some of the time. Here are some runs with 1, 2 and 4 virtual cores:

1 core:

$ for i in {1..10}; do ./ncpu; done
Your core count is approximately 65
Your core count is approximately 1
Your core count is approximately 1
Your core count is approximately 15
Your core count is approximately 1
Your core count is approximately 19
Your core count is approximately 1
Your core count is approximately 1
Your core count is approximately 42
Your core count is approximately 1
$

2 cores:

$ for i in {1..10}; do ./ncpu; done
Your core count is approximately 2
Your core count is approximately 2
Your core count is approximately 2
Your core count is approximately 2
Your core count is approximately 2
Your core count is approximately 2
Your core count is approximately 3
Your core count is approximately 3
Your core count is approximately 2
Your core count is approximately 2
$ 

4 cores:

$ for i in {1..10}; do ./ncpu; done
Your core count is approximately 4
Your core count is approximately 5
Your core count is approximately 4
Your core count is approximately 4
Your core count is approximately 4
Your core count is approximately 4
Your core count is approximately 4
Your core count is approximately 4
Your core count is approximately 4
Your core count is approximately 4
$ 

My previous answer:

Here's a start, though it only distinguishes between exactly one core and more than one core:

/* 1cpu.c */

#include <stdio.h>
#include <pthread.h>

int c = 0;

static void *inc_thread (void *arg) {
    int i;
    for (i = 0; i < 10000000; i++) c++;
    return NULL;
}

static void *dec_thread (void *arg) {
    int i;
    for (i = 0; i < 10000000; i++) c--;
    return NULL;
}

int main (int argc, char **argv) {
    pthread_t inc_tid, dec_tid;

    pthread_create(&inc_tid, NULL, inc_thread, NULL);
    pthread_create(&dec_tid, NULL, dec_thread, NULL);

    pthread_join(inc_tid, NULL);
    pthread_join(dec_tid, NULL);

    if (c) {
        printf("c=%d, More than one CPU\n", c);
    } else {
        printf("c=%d, Only one CPU\n", c);
    }
}

Build with LDFLAGS=-pthread make 1cpu.

This is a nice demonstration of how normal ++ and -- are not atomic operations. In the single-core situation, inc_thread() and dec_thread() can never run concurrently, so the global counter c will always be incremented and decremented exactly 10 million times and thus end up as 0. However with more than one core, these two threads will likely end up on different cores, and thus colliding with their operations on the global counter. As we know, these operations have to load, [in|de]crement, then save each time, so threads can end up operating on stale data as they go through these operations. Therefore over 10 million runs we will likely end up with a different number of effective increment vs decrement operations and the counter will end up non-zero.

This method seems to be fairly accurate, and works on OSX as well as Linux.

Digital Trauma

Posted 2015-08-23T14:39:09.000

Reputation: 64 644

0

Python 2

import decimal
from timeit import Timer

elapsed = decimal.Decimal(Timer("""1 and 1""").timeit())
speed = 1/elapsed
cores = speed/decimal.Decimal(3.6)

cores = round(cores)

print cores

I kind of lost what I was doing halfway through but this works fairly well for me.

Edit

I guess modern computer cores have a greater speed. The program should now be more accurate (hopefully).

Beta Decay

Posted 2015-08-23T14:39:09.000

Reputation: 21 478

3This outputs numbers in the range 20-22 for me... I have 4 cores. – Doorknob – 2015-08-23T15:23:01.197

Outputs 9-13 for me, 4 cores as well. – isaacg – 2015-08-23T15:31:00.840

@Doorknob Strange... It outputs 2-4 for me with 4 cores.. – Beta Decay – 2015-08-23T16:00:40.107

It's not said, that the output of the core count has to be precise on every system! Also, the elapsed time of a statement depends on the core speed, so it'll only equal with equal cores. – bobbel – 2015-08-23T16:23:00.307