With the recent Python bashing, here's an attempt to show Python's strengths. Your challenge is to write a program that calculates the factorial of as high a number n as possible within 10 seconds.

Your score will be (highest n for your program on your machine)/(highest n for my program on your machine)

Rules

You must calculate an exact integer solution. Since the factorial would be much higher than what can fit in a 64 bit unsigned integer, you can use strings if your language does not support large integers
Standard loopholes are forbidden. Particularly, you cannot use any external resources.
Only the calculation part(this includes time for any workarounds using strings) adds to the total time which should be under 10 seconds on average.
Single threaded programs only.
You must store the output in an easily printable form (as printing takes time) (see my program below), string, variable, character array, etc.

EDIT:

Your program must give the correct output for all n: 1 <= n <= (your highest n)

EDIT2:

I hate to say this explicitly but using your language's built-in factorial functions falls under the standard loopholes http://meta.codegolf.stackexchange.com/a/1078/8766 Sorry Mathematica and Sage

My program

from __future__ import print_function
import time


def factorial( n ):
    return reduce( ( lambda x , y : x * y ) , xrange( 1 , n + 1 ) , 1 )

start = time.clock()
answer = factorial( 90000 )
end = time.clock()

print ( answer )
print ( "Time:" , end - start , "sec" )

Highest score wins. For the record, my code can manage n = 90000 in about 9.89 seconds on a Pentium 4 3.0 GHz

EDIT: Can everyone please add the score rather than just the highest n. Just the highest n has no meaning by itself as it depends on your hardware. It's impossible to have an objective winning criterion otherwise. ali0sha's anwer does this correctly.

We have a winner. I didn't accept the java answer https://codegolf.stackexchange.com/a/26974/8766 as it kind of skirts close to http://meta.codegolf.stackexchange.com/a/1080/8766

user80551

Posted 2014-05-06T11:34:39.803

Reputation: 2 520

1You can use operator.mul instead of the lambda function – gnibbler – 2014-05-06T11:57:59.473

You should probably add that print([1, 2, 6, 10, 50, 120, ...][n]) is invalid. – Doorknob – 2014-05-06T12:09:26.463

1Bit suprised this works, but assuming I read the rules correctly this MATLAB solution would be nice: factorial(Inf), returns Inf in a fraction of a second. – Dennis Jaheruddin – 2014-05-06T12:40:35.777

Is my assumption that you are looking for an answer in [python] correct? – Justin – 2014-05-06T12:44:26.267

1@Doorknob That fits in the standard loopholes. – Justin – 2014-05-06T12:44:43.553

1@DennisJaheruddin, it's a bit of a stretch to refer to "Inf" as an "exact integer solution". – tobyink – 2014-05-06T13:12:27.833

How is bigint math a "strength" of Python? It doesn't even use GMP. – Niklas B. – 2014-05-06T13:35:47.197

1@Quincunx No, any language is allowed. – user80551 – 2014-05-06T17:22:51.027

@NiklasB. "It doesn't even use GMP."--I just used the simplest code so that others can make it more efficient – user80551 – 2014-05-06T17:37:50.833

@user80551 I was referring to your statement "here's an attempt to show Python's strengths", because this is clearly not a task where Python can shine (unless you use bindings to non-standard libraries). Still a valid challenge, of course :) – Niklas B. – 2014-05-06T17:41:52.553

@NiklasB. Hence the word attempt, Do you want me to change it to lame attempt or something? – user80551 – 2014-05-06T17:50:26.350

http://cs.stackexchange.com/questions/14456/factorial-algorithm-more-efficient-than-naive-multiplication – Guy Sirton – 2014-05-06T18:02:49.937

@NiklasB. C/C++ don't even have a bigint class at all. In fact python's bigints are quite fast compared to other bigints implementations (I believe F# is about 3 times slower from a comment I receive some day ago...) If you want to include languages that can load/import a bigint implementation, so can python. In summary: I don't see the point of your comment. – Bakuriu – 2014-05-06T19:11:33.427

@Bakuriu Sure, but Ruby, Haskell or other languages that use GMP by default will beat it hands down. Although I admit that my comment was lacking a proper point :) Nevermind – Niklas B. – 2014-05-06T19:12:32.033

1@NiklasB. Are you sure? Because on my machine Haskell is about tens of times slower at computing factorials... In fact I get a stackoverflow with the factorial of 1000000. Python's built-in math.factorial computes it without problems in about 12 seconds. – Bakuriu – 2014-05-06T19:28:47.813

1@Bakuriu Interesting, I can reproduce that. Seems like Python 3 is about 10 times faster than Python 2.7, nice. I will definitely shut my mouth now :) Not sure why Haskell is so slow actually – Niklas B. – 2014-05-06T20:48:35.370

Oh god you seriously implemented this with a reduce and a lambda? Really really inefficient even for python. – Claudiu – 2014-05-07T02:53:18.897

@Claudiu See my previous comment to Niklas B. , I just used the simplest method that came to my mind and didn't bother optimizing. Optimizing the code is part of the challenge. – user80551 – 2014-05-07T05:23:03.897

Could the down-voter please explain what's wrong with the question? – user80551 – 2014-05-07T09:04:01.550

How can someone measure the correct score if they're using a much more powerful CPU than you? – phuclv – 2014-06-15T10:17:42.587

I really wished that someone would provide a creative solution with bitshifts and lots of recursion. – Simon Kuang – 2014-07-28T00:07:50.557

Answers

C++ with GMP, score = 55.55 (10,000,000 / 180,000)

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <vector>
#include <iostream>
#include <queue>
#include <gmpxx.h>

int main(int argc, char *argv[]) {
  uint64_t n = atoi(argv[1]);

  // Iterate through 1..n.  Strip off powers of 2.  Multiply
  // remainders together into <= 64 bit chunks.
  uint64_t twos = 0;
  std::vector<uint64_t> terms;
  uint64_t m = 1;
  for(uint64_t i = 1; i <= n; i++) {
    uint64_t j = __builtin_ctzll(i);
    twos += j;
    uint64_t k = i >> j;
    if(__builtin_clzll(m) + __builtin_clzll(k) >= 64) {
      m *= k;
    } else {
      terms.push_back(m);
      m = k;
    }
  }
  if(m != 1) terms.push_back(m);

  // convert to gmp
  // why isn't there a 64-bit constructor?
  std::queue<mpz_class> gmpterms;
  for(int i = 0; i < terms.size(); i++) {
    mpz_class x = (uint32_t)(terms[i] >> 32);
    x <<= 32;
    x += (uint32_t)terms[i];
    gmpterms.push(x);
  }

  // pop two from the bottom, multiply them, push on the end.
  while(gmpterms.size() > 1) {
    mpz_class a = gmpterms.front();
    gmpterms.pop();
    mpz_class b = gmpterms.front();
    gmpterms.pop();
    gmpterms.push(a * b);
  }

  mpz_class r = gmpterms.front();
  r <<= twos;
  //std::cout << r << std::endl;
}

Keith Randall

Posted 2014-05-06T11:34:39.803

Reputation: 19 865

Python 2.7

42.575 = ( 6,812,000 / 160,000 ) approx

Code:

import gmpy2

def fac1(n):
    m=lambda(L):([]if len(L)%2==0 else[L.pop()])+map(lambda(l):l[0]*l[1],zip(L[1::2],L[-2::-2]))
    L=map(gmpy2.mpz,xrange(1,n+1))
    Number = (len(L)-1).bit_length()
    while Number:Number-=1;L=m(L)
    return L[0]

def fac2(n):
    global E; E=0
    def f(i):
        global E; E+=i//2
        return[]if i==1 else f(i//2)+range(3,i,2)+[[1,i][i%2]]
    m=lambda(L):([]if len(L)%2==0 else[L.pop()])+map(lambda(l):l[0]*l[1],zip(L[1::2],L[-2::-2]))
    L=map(gmpy2.mpz,f(n))
    N=(len(L)-1).bit_length()
    while N: N-=1;L=m(L)
    return L[0]<<E

Test:

import time

start = time.time()
baseline(160000)
print time.time()-start

start = time.time()
fac1(6811000)
print time.time()-start

start = time.time()
fac2(6812000)
print time.time()-start

start = time.time()
gmpy2.fac(26000000)
print time.time()-start

Output:

10.0069999695
10.0729999542
10.0360000134
9.98699998856

How it works:

Bigger multiplications take more time, thus we want to do as many small multiplications as possible. This is especially true in Python where for numbers less that 2^64 we use hardware arithmetic, and above that we use software. So, in m(L), we start with a list L; if it's odd length we remove one number from consideration to make it even again. Then we multiply element 1 with element -2, element 3 with -4, etc, so that

m([1,2,3,4,5,6,7,8]) = [2*7, 4*5, 6*3, 8*1] = [14, 20, 18, 8]
m([10,12,6]) = [360,112]
m([120,6]) = [40320]

This approach ensures we're using hardware arithmetic for as long as possible, following which we switch onto the efficient gmc arithmetic library.

In fac2, we take a more classic divide and conquer approach as well, where we split out every multiple of 2 and bitshift them at the end for a trivial performance boost. I've included it here because it's usually around 0.5% faster than fac1.

Golfed Version of `fac1` (because I can), 220B

import gmpy2
def f(n):
    m=lambda(L):([]if len(L)%2==0 else[L.pop()])+map(lambda(l):l[0]*l[1],zip(L[1::2],L[-2::-2]))
    L=map(gmpy2.mpz,xrange(1,n+1));N=(len(L)-1).bit_length()
    while N:N-=1;L=m(L)
return L[0]

alexander-brett

Posted 2014-05-06T11:34:39.803

Reputation: 1 485

1If the GMP backend includes a bitshift function then you can keep the numbers even smaller by dividing each number in the list by 2 until it's even and then doing a single shift at the end. – Peter Taylor – 2014-05-06T15:58:52.453

Where did you get gmpy2 from? $ python Python 2.7.3 (default, Feb 27 2014, 19:58:35) [GCC 4.6.3] on linux2 Type "help", "copyright", "credits" or "license" for more information.

from gmpy2 import mpz

Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: No module named gmpy2

– user80551 – 2014-05-06T17:57:52.010

@user80551: https://code.google.com/p/gmpy/ (the top google search result) has installers for many different platforms.

– alexander-brett – 2014-05-06T19:35:38.603

For the golfed version, couldn't you do while len(L): ... instead of while len(L)>1: ...? – user80551 – 2014-05-07T09:10:59.840

No: the function inside that loop will never take the list below length 1 and anyhow we need the first element! – alexander-brett – 2014-05-07T09:12:57.247

Java - 125.15 (21,400,000 / 171,000)

Also shamelessly copied from Peter Luschny's Github repo (thanks @semi-extrinsic) and licensed under the MIT license, this uses the "prime factorization nested squaring" algorithm as proposed by Albert Schönhage et al. (according to Luschny's factorial algorithms description page).

I slightly adapted the algorithm to use Java's BigInteger and to not use a lookup table for n < 20.

Compiled with gcj, which uses GMP for its BigInteger implementation, and ran on Linux 3.12.4 (Gentoo), on a Core i7 4700MQ at 2.40GHz

import java.math.BigInteger;

public class PrimeSieveFactorialSchoenhage {

    private static int[] primeList, multiList;

    public static BigInteger factorial(int n) {
        int log2n = 31 - Integer.numberOfLeadingZeros(n);
        int piN = log2n < 2 ? 1 : 2 + (15 * n) / (8 * (log2n - 1));

        primeList = new int[piN];
        multiList = new int[piN];

        int len = primeFactors(n);
        return nestedSquare(len).shiftLeft(n - Integer.bitCount(n));
    }

    private static BigInteger nestedSquare(int len) {
        if (len == 0) {
            return BigInteger.ONE;
        }

        int i = 0, mult = multiList[0];

        while (mult > 1) {
            if ((mult & 1) == 1) { // is mult odd ?
                primeList[len++] = primeList[i];
            }

            multiList[i++] = mult / 2;
            mult = multiList[i];
        }
        BigInteger ns = nestedSquare(i);
        if (len <= i) {
            return ns.multiply(ns);
        }

        return product(primeList, i, len - i).multiply(ns.multiply(ns));
    }

    private static BigInteger product(int[] a, int start, int length) {
        if (length == 0) {
            return BigInteger.ONE;
        }

        int len = (length + 1) / 2;
        long[] b = new long[len];

        int i, j, k;

        for (k = 0, i = start, j = start + length - 1; i < j; i++, k++, j--) {
            b[k] = a[i] * (long) a[j];
        }

        if (i == j) {
            b[k++] = a[j];
        }

        return recProduct(b, 0, k - 1);
    }

    private static BigInteger recProduct(long[] s, int n, int m) {
        if (n > m) {
            return BigInteger.ONE;
        }
        if (n == m) {
            return BigInteger.valueOf(s[n]);
        }
        int k = (n + m) >> 1;
        return recProduct(s, n, k).multiply(recProduct(s, k + 1, m));
    }

    private static int primeFactors(int n) {
        int[] primes = new int[n < 17 ? 6 : (int) Math.floor(n / (Math.log(n) - 1.5))];
        int numPrimes = makePrimeList(n, primes);

        int maxBound = n / 2, count = 0;

        int start = indexOf(primes, 2, 0, numPrimes - 1);
        int end = indexOf(primes, n, start, numPrimes);

        for (int i = start; i < end; i++) {
            int prime = primes[i];
            int m = prime > maxBound ? 1 : 0;

            if (prime <= maxBound) {
                int q = n;
                while (q >= prime) {
                    m += q /= prime;
                }
            }

            primeList[count] = prime;
            multiList[count++] = m;
        }
        return count;
    }

    private static int indexOf(final int[] data, int value, int low, int high) {
        while (low < high) {
            int mid = (low + high) >>> 1;

            if (data[mid] < value) {
                low = mid + 1;
            } else {
                high = mid;
            }
        }

        if (low >= data.length) {
            return low;
        }

        if (data[low] == value) {
            low++;
        }

        return low;
    }

    private static int makePrimeList(int n, int[] prime) {
        boolean[] composite = new boolean[n / 3];

        sieveOfEratosthenes(composite);

        boolean toggle = false;
        int p = 5, i = 0, j = 2;

        prime[0] = 2;
        prime[1] = 3;

        while (p <= n) {
            if (!composite[i++]) {
                prime[j++] = p;
            }
            // -- never mind, it's ok.
            p += (toggle = !toggle) ? 2 : 4;
        }

        return j; // number of primes
    }

    private static void sieveOfEratosthenes(final boolean[] composite) {
        int d1 = 8;
        int d2 = 8;
        int p1 = 3;
        int p2 = 7;
        int s1 = 7;
        int s2 = 3;
        int n = 0;
        int len = composite.length;
        boolean toggle = false;

        while (s1 < len) { // -- scan sieve
            if (!composite[n++]) { // -- if a prime is found, cancel its multiples
                int inc = p1 + p2;

                for (int k = s1; k < len; k += inc) {
                    composite[k] = true;
                }

                for (int k = s1 + s2; k < len; k += inc) {
                    composite[k] = true;
                }
            }

            if (toggle = !toggle) { // Never mind, it's ok.
                s1 += d2;
                d1 += 16;
                p1 += 2;
                p2 += 2;
                s2 = p2;
            } else {
                s1 += d1;
                d2 += 8;
                p1 += 2;
                p2 += 6;
                s2 = p1;
            }
        }
    }

    public static void main(String[] args) {
        int n = Integer.parseInt(args[0]);
        long nanos = System.nanoTime();
        BigInteger fact = factorial(n);
        nanos = System.nanoTime() - nanos;
        // Commented out because it takes ages to print
        //System.out.println(fact);
        System.out.println(nanos / 1e9);
    }
}

14mRh4X0r

Posted 2014-05-06T11:34:39.803

Reputation: 348

Compiled with gcj -O3 --main=PrimeSieveFactorialSchoenhage PrimeSieveFactorialSchoenhage.java -o pf_nest_square_fact – 14mRh4X0r – 2014-05-12T15:28:54.343

Python 3, n = 100000

A simple algorithm change was all that was needed to bump the sample code up by 10000.

import time

def factorial(n):
    result = 1
    while n > 0:
        result *= n
        n = n - 1
    return result

start = time.clock()
answer = factorial(100000)
end = time.clock()

print(answer)
print("Time:", end - start, "sec")

Obviously not the most creative answer, but there's really only one way to do a factorial....

Doorknob

Posted 2014-05-06T11:34:39.803

Reputation: 68 138

Please give the score, see my edit. The bump would probably be because your machine is better than mine. – user80551 – 2014-05-06T17:46:49.823

Perl + C, n = about 3 million

Here I'm using the Math::BigInt::GMP library available on CPAN, which provides a massive speed boost for Perl's core Math::BigInt objects.

use v5.14;
use Time::HiRes 'time';
use Math::BigInt only => 'GMP';

sub factorial { Math::BigInt::->new(@_)->bfac }

my $start  = time;
my $answer = factorial( 3_000_000 );
my $end    = time;

say $answer;
say "Time: ", $end - $start, " sec";

Bear in mind that my computer is probably quite a bit slower than yours. Using your original Python script, I can only calculate factorial(40000) in 10 seconds; factorial(90000) takes a lot longer. (I hit Ctrl+C after a minute.) On your hardware, using Math::BigInt::GMP, you may well be able to calculate the factorial of 5 million or more in under 10 seconds.

One thing you may notice is that although the factorial is calculated incredibly quickly, printing out the result is very slow, taking about three times longer than the original calculation. This is because GMP internally uses a binary rather than decimal representation, and printing it out requires binary to decimal conversion.

tobyink

Posted 2014-05-06T11:34:39.803

Reputation: 1 233

I think GMP counts as an external resource. (Although it certainly makes things a hell of a lot easier than implementing prime factorization and Schönhage-Strassen multiplication from scratch.)

– r3mainer – 2014-05-06T13:14:24.490

3I was assuming that "external resource" referred to looking up solutions from a pre-computed set of answers in a database, or a web service, etc. – tobyink – 2014-05-06T13:29:18.883

Squeamish: libraries don't normally count as external resources unless they have a function which falls under the boring loopholes rule. – alexander-brett – 2014-05-06T17:01:27.550

1Tobyink: can you explain what your program does? it looks like you're just using a built-in function (bfac?) – alexander-brett – 2014-05-06T17:27:19.453

Yup. This answer is invalid, since it uses the factorial method of Math::BigInt

– 14mRh4X0r – 2014-05-09T10:34:19.697

Firstly, that rule was added after I'd written my answer. Secondly, Math::BigInt isn't a "built-in factorial function". Math::BigInt is bundled with Perl, but not built into it any more than Net::FTP, IO::Compress::Gzip, or Digest::MD5. – tobyink – 2014-05-09T11:06:13.280

Tobyink: we're supposed to be writing a factorial function, not using an existing one... using a library function falls under standard loopholes (see other discussions) – alexander-brett – 2014-05-12T08:24:12.013

C++ (x86_64-specific) - 3.0 (390000/130000)

(easily portable to x86-32, porting to other architectures implies a significant speed loss)

Here's my own micro-implementation of long arithmetic.
The calculation itself takes 10 seconds, and while the output is in easily printable form (see the operator<< overload), it takes some more time to print it.

#include <vector>
#include <iostream>
#include <stdint.h>
#include <ctime>

typedef uint64_t digit;
typedef std::vector<digit> number;

std::ostream &operator<<(std::ostream &s, const number &x)
{
    std::vector<char> o;
    size_t size = x.size() * 21;
    o.resize(size);
    size_t lud = 0;
    for(number::const_reverse_iterator i = x.rbegin(), end = x.rend(); i != end; i++)
    {
        digit carry = 0;
        int j;
        for(j = 0; j <= lud || carry; j++)
        {
            digit r = o[j] * (1LL << 32) + carry;
            o[j] = r % 10;
            carry = r / 10;
        }
        lud = j;
        carry = 0;
        for(j = 0; j <= lud || carry; j++)
        {
            digit r = o[j] * (1LL << 32) + carry;
            o[j] = r % 10;
            carry = r / 10;
        }
        lud = j;
        carry = *i;
        for(j = 0; carry; j++)
        {
            digit r = o[j] + (carry % 10);
            carry /= 10;
            carry += r / 10;
            o[j] = r % 10;
        }
        if(j > lud)
            lud = j;
    }
    for(int j = lud; j--;)
        s.put(o[j] + '0');
    return s;
}

inline uint64_t dmul(uint64_t x, uint64_t y, uint64_t &carry)
{
    asm("mulq %2" : "+a"(x), "=d"(carry) : "r"(y));
    return x;
}
inline digit dadd(digit x, digit y, digit &carry)
{
    asm("movq $0, %1; addq %2, %0; adcq %1, %1" : "+r"(x), "=r"(carry), "+r"(y));
    return x;
}

void multiply(number &x, digit y)
{
    x.resize(x.size() + 2);
    digit carry = 0;
    for(number::iterator i = x.begin(), end = x.end(); i != end; i++)
    {
        digit nc, res = dmul(*i, y, nc);
        *i = dadd(res, carry, carry);
        carry += nc;
    }
    size_t sz = x.size();
    for(number::const_reverse_iterator i = x.rbegin(), end = x.rend(); i != end; i++)
    {
        if(*i)
            break;
        sz--;
    }
    x.resize(sz);
}

int main()
{
    const int r = 390000;
    clock_t start = clock();
    number n;
    digit mult = 1;
    n.push_back(1);
    for(digit a = 2; a <= r; a++)
    {
        digit carry, m = dmul(mult, a, carry);
        if(carry)
        {
            multiply(n, mult);
            mult = a;
        }
        else
            mult = m;
    }
    multiply(n, mult);
    std::cout << "Took: " << (clock() - start)/((double)CLOCKS_PER_SEC) << std::endl;
    std::cout << n << std::endl;
}

mniip

Posted 2014-05-06T11:34:39.803

Reputation: 9 396

Check your score. You need to run the question's Python 2.7 program on your computer. For my computer, I compiled your program with g++ -O2 factorial.cc -o factorial and it scores 3.90 = 382000 / 98000. – kernigh – 2014-05-11T20:11:25.827

Weird, I got 3.9 and you got 3.0 for this program. I guess your faster computer is a penalty. Perhaps your program loses its advantage over Python as r increases. If so, and you can do a higher r in 10 seconds, then your score goes down. – kernigh – 2014-05-12T21:30:01.177

C#: 0,48 (77,000 / 160,000)

I'm not happy with this.

Is C# that slow?

But here is my entry anyway.

static void Main(string[] args)
    {
        Console.WriteLine("Enter N for fatorial:");
        int n = Convert.ToInt32(Console.ReadLine());

        Stopwatch s = Stopwatch.StartNew();


        BigInteger result = 1;
        while (0 <-- n) result *= n;

        s.Stop();

        Console.WriteLine("Output: {0} ", result);

        Console.WriteLine("Completed in {0}", s.Elapsed);

    }

When n = 77000 it takes 00:00:09:8708952 to calculate.

I'm running in Release mode, outside Visual Studio, using a Core i3-2330M @2.2GHz.

Edit: Since i'm not doing anything intelligent, I accept that result. Maybe the .NET Framework 4.5 is addind some overhead (or BigInteger isn't that fast).

Ricardo Pieper

Posted 2014-05-06T11:34:39.803

Reputation: 131

Please give the score and not just n – user80551 – 2014-05-06T18:10:46.893

1You could use zero approached by operator to make it prettier (like start with n = ... + 1 then do while (0 <-- n) result *= n;) – Cthulhu – 2014-05-06T18:50:59.820

1BigInteger for .NET probably did not implement algorithms for multiplying larger numbers, like Karatsuba or Toom-3. If so, this is a good example of how Python is faster. – kernigh – 2014-05-11T20:20:27.760

Python 2.7
5.94 = 1'200'000/202'000

def fast_fac(n):
    def prod(start, fin):
            if fin - start <= 50:
                    return reduce(lambda x,y: x*y, xrange(start, fin+1), 1)
            else:
                    mid = (start+fin) / 2
                    return prod(start, mid) * prod(mid+1, fin)
    return prod(1, n)

Makes use of relative ease of multiplication of many groups of small numbers and then multiplying them compared to large number of multiplyings involving huge number.

Cthulhu

Posted 2014-05-06T11:34:39.803

Reputation: 111

bc, score=0.19

What the heck, here's my contender for "How much can you slowly multiply?"

bc is "An arbitrary precision calculator language", but unfortunately rather slow:

n=read()
for(f=i=1;i<=n;i++)f*=i
f
quit

In about 10 seconds on my mid 2012 MacBook Pro (2.3 GHz Intel Core i7) the reference python answer can calculate 122000!, but this bc script can only calculate 23600!.

Conversely 10000! takes 1.5s with the python reference script, but the bc script takes 50s.

Oh dear.

Digital Trauma

Posted 2014-05-06T11:34:39.803

Reputation: 64 644

1OpenBSD bc(1) is faster. Your program scores 0.29 = 28000/98000. There is no read(), so I ran time sed 's/read()/28000/' factorial.bc | bc. – kernigh – 2014-05-11T15:33:32.157

Bash: score = 0.001206 (181/150000)

I stole the math functions from Rosettacode - Long multiplication I didn't analyzed nor tried to optimize.
You are free to change the algorithm or to try a different strings split method.

#!/bin/bash


add() { # arbitrary-precision addition
  if (( ${#1} < ${#2} )); then
    local a="$2" b="$1" sum= carry=0
  else
    local a="$1" b="$2" sum= carry=0
  fi

  while (( ${#a} )); do
    local -i d1="${a##${a%?}}" d2="10#0${b##${b%?}}" s=carry+d1+d2
    sum="${s##${s%?}}$sum"
    carry="10#0${s%?}"
    a="${a%?}" b="${b%?}"
  done
  echo "$sum"
}

multiply() { # arbitrary-precision multiplication
  if (( ${#1} < ${#2} )); then
    local a="$2" b="$1" product=0
  else
    local a="$1" b="$2" product=0
  fi

  local zeroes=
  while (( ${#b} )); do
    local m1="$a"
    local m2="${b##${b%?}}"
    local partial=$zeroes 
    local -i carry=0
    while (( ${#m1} )); do 
      local -i d="${m1##${m1%?}}"
      m1="${m1%?}"
      local -i p=d*m2+carry
      partial="${p##${p%?}}$partial"
      carry="10#0${p%?}"
    done
    partial="${carry#0}$partial"
    product="$(add "$product" "$partial")"
    zeroes=0$zeroes
    b="${b%?}"
  done
  echo "$product"
}

# 'timerun' function
trap 'echo $((i -1)) $f; exit'  USR1  
(sleep 9.9; kill -USR1 $$)&

declare -i i 
f=1
for ((i=1; i< 10000 ; i++ ))   # 10000 is verry optimistic
do
    f=$(multiply $f $i)
done

Emmanuel

Posted 2014-05-06T11:34:39.803

Reputation: 259

1Please add the score and not just the highest n – user80551 – 2014-05-07T18:21:22.673

@user80551 it's done – Emmanuel – 2014-05-07T22:05:34.053

Python 3, advanced algo by Peter Luschny: 8.25x (1 280 000/155 000)

Shamelessly copied from Peter Luschny,
http://www.luschny.de/math/factorial/FastFactorialFunctions.htm,
who provides this code under the "Creative Commons Attribution-ShareAlike 3.0" license.

This is actually a quite advanced algorithm, using something called the "swinging factorial" and a list of primes. I suspect it could be even faster if it did like many of the other answers and performed most of the multiplications with 32 bit integers.

#! /usr/bin/python3
import time
import bisect 

def Primes(n) : 
  primes = [2, 3] 
  lim, tog = n // 3, False 
  composite = [False for i in range(lim)] 

  d1 = 8; d2 = 8; p1 = 3; p2 = 7; s = 7; s2 = 3; m = -1 

  while s < lim :             # --  scan the sieve 
      m += 1                  # --  if a prime is found 
      if not composite[m] :   # --  cancel its multiples 
          inc = p1 + p2 
          for k in range(s,      lim, inc) : composite[k] = True 
          for k in range(s + s2, lim, inc) : composite[k] = True 

          tog = not tog 
          if tog: s += d2; d1 += 16; p1 += 2; p2 += 2; s2 = p2 
          else:   s += d1; d2 +=  8; p1 += 2; p2 += 6; s2 = p1 

  k, p, tog = 0, 5, False 
  while p <= n : 
      if not composite[k] : primes.append(p) 
      k += 1; 
      tog = not tog 
      p += 2 if tog else 4 

  return primes 

def isqrt(x): 
  ''' 
  Writing your own square root function
  ''' 
  if x < 0: raise ValueError('square root not defined for negative numbers') 
  n = int(x) 
  if n == 0: return 0 
  a, b = divmod(n.bit_length(), 2) 
  x = 2**(a + b) 
  while True: 
      y = (x + n // x) // 2 
      if y >= x: return x 
      x = y 

def product(s, n, m): 
  if n > m: return 1 
  if n == m: return s[n] 
  k = (n + m) // 2 
  return product(s, n, k) * product(s, k + 1, m) 

def factorialPS(n): 

  small_swing = [1,1,1,3,3,15,5,35,35,315,63,693,231,3003,429,6435,6435, 
          109395,12155,230945,46189,969969,88179,2028117,676039,16900975, 
          1300075,35102025,5014575,145422675,9694845,300540195,300540195] 

  def swing(m, primes): 
      if m < 33: return small_swing[m] 

      s = bisect.bisect_left(primes, 1 + isqrt(m)) 
      d = bisect.bisect_left(primes, 1 + m // 3) 
      e = bisect.bisect_left(primes, 1 + m // 2) 
      g = bisect.bisect_left(primes, 1 + m) 

      factors = primes[e:g] 
      factors += filter(lambda x: (m // x) & 1 == 1, primes[s:d]) 
      for prime in primes[1:s]:   
          p, q = 1, m 
          while True: 
              q //= prime 
              if q == 0: break 
              if q & 1 == 1: 
                  p *= prime 
          if p > 1: factors.append(p) 

      return product(factors, 0, len(factors) - 1) 

  def odd_factorial(n, primes): 
      if n < 2: return 1 
      return (odd_factorial(n // 2, primes)**2) * swing(n, primes) 

  def eval(n): 
      if n < 0: 
          raise ValueError('factorial not defined for negative numbers') 

      if n == 0: return 1 
      if n < 20: return product(range(2, n + 1), 0, n-2) 

      N, bits = n, n 
      while N != 0: 
          bits -= N & 1 
          N >>= 1 

      primes = Primes(n) 
      return odd_factorial(n, primes) * 2**bits 

  return eval(n)

start = time.time()
answer = factorialPS(1280000) 
print(time.time()-start)

semi-extrinsic

Posted 2014-05-06T11:34:39.803

Reputation: 641

Java - 10.9

n = 885000

Mergesort-y.

import java.math.BigInteger;

public class Factorials {

    public static BigInteger fac;

    public static BigInteger two = BigInteger.valueOf(2);

    static BigInteger mul(BigInteger start, BigInteger end) {
        if(start.equals(end)) {
            return start;
        } else {
            BigInteger mid = start.add(end.subtract(start).divide(Factorials.two));
            return Factorials.mul(start, mid).multiply(Factorials.mul(mid.add(BigInteger.ONE), end));
        }
    }

    public static void main(String[] args) {
        Factorials.fac = BigInteger.valueOf(Integer.parseInt(args[0]));
        long t = System.nanoTime();
        BigInteger result = mul(BigInteger.ONE, fac);
        t = System.nanoTime() - t;
        System.out.print(String.valueOf(((float) t) / 1000000000)); //result.toString()+" @ "+
    }
}

BigIntegers are slow.

Recommendations for arbitrary-precision high-speed Java integer libraries? :P

cjfaure

Posted 2014-05-06T11:34:39.803

Reputation: 4 213

Mergesort-y It's called divide-and-conquer. – johnchen902 – 2018-05-30T05:52:23.310

Can I steal your code to make it multithreaded? – Simon Kuang – 2014-05-11T06:47:36.810

@SimonKuang Go ahead. :P Multi-threaded entries aren't allowed here, though. Also, you might want to use a more efficient BigInteger implementation. – cjfaure – 2014-05-11T10:13:34.247

tcl, 13757

My answer is to check the limits of tcl.

The first line is only to set an input parameter:

set n 13757

The others are the algorithm itself:

set r 2
for {set i 3} {$i <= $n} {incr i} {set r [expr {$r*$i}]}   
puts $r

I tested my code on http://rextester.com/live/WEL36956; If I make n bigger, I get a SIGKILL; may n can get bigger on a local tcl interpreter, which I do not have.

sergiol

Posted 2014-05-06T11:34:39.803

Reputation: 3 055

Python 3: 280000 / 168000

Time running your program: between 9.87585953253 and 10.3046453994. Time running my program: about 10.35296977897559.

import time

def factorial(n):
    f = 1
    while n > 1:
        hn = n >> 1
        f = f * 2**hn * double_factorial(n) #dfl[hn + (n & 1) - 1]
        n = hn
    return f
def double_factorial(n):
    #dfl = [1]
    p = 1
    l = 3
    mh = n
    while l <= n:
        p *= l
        l += 2
        #dfl.append(p)
    return p

start = time.clock()
factorial(280000)
end = time.clock()

print(end - start)

I read this answer on cs.SE and decided to try to implement it in Python. However, I accidentally discovered that n! = (⌊n / 2⌋)! * 2**(⌊n / 2⌋) * n!! (note: !! is the double factorial). So I converted that to a non-recursive form.

The comments show my attempt to avoid recomputing the double factorial, but I discovered that storing every value was too memory-costly that it caused my computer to run even slower. I can improve this by only storing what is needed.

Strangely, I implemented the naive straight multiplication in Python 3, and it does better than your program: n = 169000 in 10 seconds.:

def factorial(n):
    p=1
    for i in range(n):
        p*=i+1
    return p

Justin

Posted 2014-05-06T11:34:39.803

Reputation: 19 757

Ruby 2.1

score = 1.80 = 176_000 / 98_000

EDIT: improved from ~~1.35 = 132_000 / 98_000~~

I took ideas from the GMP factorial algorithm. This program uses the standard library to generate prime numbers. Ruby is a bad choice because multiplication seems slower in Ruby than in Python.

My program in Ruby 2.1: score = 1.80 = 176_000 / 98_000
Trivial algorithm in Python 2.7: score = 1 = 98_000 / 98_000
Trivial algorithm in Ruby 2.1: score = 0.878 = 86_000 / 98_000

Yes, my binary of ruby 2.1.0p0 (2013-12-25 revision 44422) [x86_64-openbsd] links against GMP. Ruby 2.1 added a feature to use GMP for large multiplication, but it still seems slower than Python 2.7.

require 'benchmark'
require 'optparse'
require 'prime'

def factorial(n)
  # calculate primes up to n, drop the 2
  @odd_primes = Prime.each(n).drop(1)

  # count prime factors of factorial(n)
  @factors = Hash.new(0)
  factorial_recurse(n)

  shift = @factors.delete(2) || 0
  @factors.inject(1) {|product, (base, exp)|
    product * base**exp
  } << shift
end

def factorial_recurse(n)
  return if n < 2

  # collect prime factors of 2 * 4 * 6 * .. * n
  #  = (2 * 2 * 2 * .. * 2) * (1 * 2 * 3 * .. * exp)
  #  = 2**exp * factorial(exp) where exp = floor(n/2)
  exp = n >> 1
  factorial_recurse(exp)
  @factors[2] += exp

  # collect prime factors 3 * 5 * 7 * ... * n
  for prime in @odd_primes
    break if prime > n
    exp = 0
    # count occurences of prime, prime**2, prime**3, .. n
    prime_power = prime
    until prime_power > n
      # floor(n / prime_power) occurences in 1 * 2 * .. * n,
      # but only ceil(count / 2) occurences in 3 * 5 * .. * n
      @factors[prime] += (n / prime_power + 1) >> 1
      prime_power *= prime
    end
  end
end

# usage: factorial.rb [-ct] [number]
cflag = tflag = false
OptionParser.new {|opts|
  opts.on('-c', 'Check for bugs') { cflag = true }
  opts.on('-t', 'Use trivial algorithm') { tflag = true }
  opts.parse!
}
$*[1] and fail 'too many arguments'
n = Integer($*[0] || 176_000)

if cflag
  factorial(n) == (1..n).reduce(1, :*) or
    fail "bad program: factorial(#{n}) is wrong"
  puts "ok"
  exit
end

# measure processor time to calculate factorial
f = nil
if tflag
  time = Benchmark.measure { f = (1..n).reduce(1, :*) }
else
  time = Benchmark.measure { f = factorial(n) }
end
puts f
puts "Time #{time.total} sec"

kernigh

Posted 2014-05-06T11:34:39.803

Reputation: 2 615

Julia - Score = 15.194

Utilising the exact same approach as that of the reference program... that is,

f(n)=reduce(*,1:big(n))

So it uses reduce, the basic binary multiplication operation, and a range (in this case, using big(n) to force the calculation to be done in BigInt rather than Int64). From this, I get

julia> @time K=f(2340000);
elapsed time: 9.991324093 seconds (814552840 bytes allocated)

On my computer, with reference program running with input of 154000, I get Time: 10.041181 sec output (run using python ./test.py, where test.py is the name of the file containing the reference code)

Glen O

Posted 2014-05-06T11:34:39.803

Reputation: 2 548

How much can you quickly multiply?

Rules

My program

Answers

C++ with GMP, score = 55.55 (10,000,000 / 180,000)

Python 2.7

42.575 = ( 6,812,000 / 160,000 ) approx

Code:

Test:

Output:

How it works:

Golfed Version of fac1 (because I can), 220B

Java - 125.15 (21,400,000 / 171,000)

Python 3, n = 100000

Perl + C, n = about 3 million

C++ (x86_64-specific) - 3.0 (390000/130000)

C#: 0,48 (77,000 / 160,000)

bc, score=0.19

Bash: score = 0.001206 (181/150000)

Python 3, advanced algo by Peter Luschny: 8.25x (1 280 000/155 000)

Java - 10.9

n = 885000

Recommendations for arbitrary-precision high-speed Java integer libraries? :P

tcl, 13757

Python 3: 280000 / 168000

Ruby 2.1

score = 1.80 = 176_000 / 98_000

Julia - Score = 15.194

Golfed Version of `fac1` (because I can), 220B