## How much can you quickly multiply?


With the recent Python bashing, here's an attempt to show Python's strengths. Your challenge is to write a program that calculates the factorial of as high a number n as possible within 10 seconds.

## Rules

• You must calculate an exact integer solution. Since the factorial can be far larger than what fits in a 64-bit unsigned integer, you may use strings if your language does not support large integers.
• Standard loopholes are forbidden. In particular, you cannot use any external resources.
• Only the calculation part (this includes time for any workarounds using strings) counts toward the total time, which should be under 10 seconds on average.
• You must store the output in an easily printable form (as printing takes time) (see my program below): a string, a variable, a character array, etc.

EDIT:

• Your program must give the correct output for all n: 1 <= n <= (your highest n)
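
The rules above amount to a small harness: time only the computation, keep the result in a printable variable, and report the elapsed seconds. A minimal sketch (the function body and the n = 2000 input are illustrative only, not part of the rules):

```python
import time

def factorial(n):
    # stand-in computation; contestants substitute their own algorithm
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

start = time.time()
answer = factorial(2000)          # only this part is timed
elapsed = time.time() - start

# 'answer' is already in an easily printable form (a Python int);
# printing it is deliberately left outside the timed region
print("Time:", elapsed, "sec")
```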

EDIT2:

## My program

from __future__ import print_function
import time

def factorial( n ):
    return reduce( ( lambda x , y : x * y ) , xrange( 1 , n + 1 ) , 1 )

start = time.clock()
answer = factorial( 90000 )
end = time.clock()

print ( "Time:" , end - start , "sec" )

Highest score wins. For the record, my code can manage n = 90000 in about 9.89 seconds on a Pentium 4 3.0 GHz

EDIT: Can everyone please add the score rather than just the highest n. Just the highest n has no meaning by itself, as it depends on your hardware. It's impossible to have an objective winning criterion otherwise. ali0sha's answer does this correctly.

We have a winner. I didn't accept the java answer https://codegolf.stackexchange.com/a/26974/8766 as it kind of skirts close to http://meta.codegolf.stackexchange.com/a/1080/8766

You can use operator.mul instead of the lambda function – gnibbler – 2014-05-06T11:57:59.473
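
Following gnibbler's suggestion, the lambda can be swapped for operator.mul; a Python 3 compatible sketch of the same reference function:

```python
import operator
from functools import reduce  # reduce is a builtin only in Python 2

def factorial(n):
    # operator.mul runs at C level, avoiding a Python-level lambda call per element
    return reduce(operator.mul, range(1, n + 1), 1)
```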

You should probably add that print([1, 2, 6, 10, 50, 120, ...][n]) is invalid. – Doorknob – 2014-05-06T12:09:26.463

Bit surprised this works, but assuming I read the rules correctly this MATLAB solution would be nice: factorial(Inf), returns Inf in a fraction of a second. – Dennis Jaheruddin – 2014-05-06T12:40:35.777

Is my assumption that you are looking for an answer in [python] correct? – Justin – 2014-05-06T12:44:26.267

@Doorknob That fits in the standard loopholes. – Justin – 2014-05-06T12:44:43.553

@DennisJaheruddin, it's a bit of a stretch to refer to "Inf" as an "exact integer solution". – tobyink – 2014-05-06T13:12:27.833

How is bigint math a "strength" of Python? It doesn't even use GMP. – Niklas B. – 2014-05-06T13:35:47.197

@Quincunx No, any language is allowed. – user80551 – 2014-05-06T17:22:51.027

@NiklasB. "It doesn't even use GMP."--I just used the simplest code so that others can make it more efficient – user80551 – 2014-05-06T17:37:50.833

@user80551 I was referring to your statement "here's an attempt to show Python's strengths", because this is clearly not a task where Python can shine (unless you use bindings to non-standard libraries). Still a valid challenge, of course :) – Niklas B. – 2014-05-06T17:41:52.553

@NiklasB. Hence the word attempt. Do you want me to change it to lame attempt or something? – user80551 – 2014-05-06T17:50:26.350

@NiklasB. C/C++ don't even have a bigint class at all. In fact Python's bigints are quite fast compared to other bigint implementations (I believe F# is about 3 times slower, from a comment I received some days ago...). If you want to include languages that can load/import a bigint implementation, so can Python. In summary: I don't see the point of your comment. – Bakuriu – 2014-05-06T19:11:33.427

@Bakuriu Sure, but Ruby, Haskell or other languages that use GMP by default will beat it hands down. Although I admit that my comment was lacking a proper point :) Nevermind – Niklas B. – 2014-05-06T19:12:32.033

@NiklasB. Are you sure? Because on my machine Haskell is about tens of times slower at computing factorials... In fact I get a stack overflow with the factorial of 1000000. Python's built-in math.factorial computes it without problems in about 12 seconds. – Bakuriu – 2014-05-06T19:28:47.813

@Bakuriu Interesting, I can reproduce that. Seems like Python 3 is about 10 times faster than Python 2.7, nice. I will definitely shut my mouth now :) Not sure why Haskell is so slow actually – Niklas B. – 2014-05-06T20:48:35.370

Oh god you seriously implemented this with a reduce and a lambda? Really really inefficient even for python. – Claudiu – 2014-05-07T02:53:18.897

@Claudiu See my previous comment to Niklas B. , I just used the simplest method that came to my mind and didn't bother optimizing. Optimizing the code is part of the challenge. – user80551 – 2014-05-07T05:23:03.897

Could the down-voter please explain what's wrong with the question? – user80551 – 2014-05-07T09:04:01.550

How can someone measure the correct score if they're using a much more powerful CPU than you? – phuclv – 2014-06-15T10:17:42.587

I really wished that someone would provide a creative solution with bitshifts and lots of recursion. – Simon Kuang – 2014-07-28T00:07:50.557


## C++ with GMP, score = 55.55 (10,000,000 / 180,000)

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <vector>
#include <iostream>
#include <queue>
#include <gmpxx.h>

int main(int argc, char *argv[]) {
    uint64_t n = atoi(argv[1]);

    // Iterate through 1..n.  Strip off powers of 2.  Multiply
    // remainders together into <= 64 bit chunks.
    uint64_t twos = 0;
    std::vector<uint64_t> terms;
    uint64_t m = 1;
    for(uint64_t i = 1; i <= n; i++) {
        uint64_t j = __builtin_ctzll(i);
        twos += j;
        uint64_t k = i >> j;
        if(__builtin_clzll(m) + __builtin_clzll(k) >= 64) {
            m *= k;
        } else {
            terms.push_back(m);
            m = k;
        }
    }
    if(m != 1) terms.push_back(m);

    // convert to gmp
    // why isn't there a 64-bit constructor?
    std::queue<mpz_class> gmpterms;
    for(size_t i = 0; i < terms.size(); i++) {
        mpz_class x = (uint32_t)(terms[i] >> 32);
        x <<= 32;
        x += (uint32_t)terms[i];
        gmpterms.push(x);
    }

    // pop two from the bottom, multiply them, push on the end.
    while(gmpterms.size() > 1) {
        mpz_class a = gmpterms.front();
        gmpterms.pop();
        mpz_class b = gmpterms.front();
        gmpterms.pop();
        gmpterms.push(a * b);
    }

    mpz_class r = gmpterms.front();
    r <<= twos;
    //std::cout << r << std::endl;
}
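
The power-of-two stripping used above rests on the identity n! = 2^T · Π(i >> ctz(i)), where T = Σ ctz(i) over i = 1..n. A quick Python check of that identity (a sketch mirroring the C++ loop, not the answer's code):

```python
def stripped_factorial(n):
    # remove trailing zero bits from every i, accumulate their count,
    # multiply the odd parts, and shift the twos back in at the end
    twos = 0
    odd_product = 1
    for i in range(1, n + 1):
        j = (i & -i).bit_length() - 1  # trailing zero count, like __builtin_ctzll
        twos += j
        odd_product *= i >> j          # odd part of i
    return odd_product << twos
```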


# Python 2 with gmpy2, score = 42.575 (6,812,000 / 160,000) approx

### Code:

import gmpy2

def fac1(n):
    m=lambda(L):([]if len(L)%2==0 else[L.pop()])+map(lambda(l):l[0]*l[1],zip(L[1::2],L[-2::-2]))
    L=map(gmpy2.mpz,xrange(1,n+1))
    Number = (len(L)-1).bit_length()
    while Number:Number-=1;L=m(L)
    return L[0]

def fac2(n):
    global E; E=0
    def f(i):
        global E; E+=i//2
        return[]if i==1 else f(i//2)+range(3,i,2)+[[1,i][i%2]]
    m=lambda(L):([]if len(L)%2==0 else[L.pop()])+map(lambda(l):l[0]*l[1],zip(L[1::2],L[-2::-2]))
    L=map(gmpy2.mpz,f(n))
    N=(len(L)-1).bit_length()
    while N: N-=1;L=m(L)
    return L[0]<<E

### Test:

import time

# 'baseline' is the question's reduce-based reference factorial
start = time.time()
baseline(160000)
print time.time()-start

start = time.time()
fac1(6811000)
print time.time()-start

start = time.time()
fac2(6812000)
print time.time()-start

start = time.time()
gmpy2.fac(26000000)
print time.time()-start

10.0069999695
10.0729999542
10.0360000134
9.98699998856

### How it works:

Bigger multiplications take more time, so we want to do as many small multiplications as possible. This is especially true in Python, where numbers below 2^64 use hardware arithmetic and anything larger uses software arithmetic. So, in m(L), we start with a list L; if it has odd length, we remove one number from consideration to make it even again. Then we multiply element 1 with element -2, element 3 with element -4, etc., so that

m([1,2,3,4,5,6,7,8]) = [2*7, 4*5, 6*3, 8*1] = [14, 20, 18, 8]
m([14, 20, 18, 8])   = [20*18, 8*14]        = [360, 112]
m([360, 112])        = [112*360]            = [40320]

This approach ensures we're using hardware arithmetic for as long as possible, after which we switch over to the efficient GMP arithmetic library.
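
The pairing step can be tried on plain Python ints (no gmpy2) to confirm that the running product of the list is preserved at every pass; this sketch rewrites m without the Python 2 tuple-parameter lambda syntax:

```python
def m(L):
    # an odd-length list drops its last element straight into the result
    head = [] if len(L) % 2 == 0 else [L.pop()]
    # pair element 1 with -2, element 3 with -4, ... and multiply each pair
    return head + [a * b for a, b in zip(L[1::2], L[-2::-2])]

L = list(range(1, 9))
while len(L) > 1:
    L = m(L)
# L is now [40320], i.e. [8!]
```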

In fac2, we also take a more classic divide-and-conquer approach, splitting out every factor of 2 and bit-shifting them back in at the end for a small performance boost. I've included it here because it's usually around 0.5% faster than fac1.
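
The identity fac2 relies on can be checked with plain ints: f(n) lists the odd cofactors of n! while E counts the stripped factors of two, so multiplying out the list and shifting by E must reproduce n!. A sketch (not the answer's gmpy2 code):

```python
E = 0

def f(i):
    # odd cofactors of i!: the odd numbers up to i, plus (recursively) the odd
    # parts of the even numbers; each even number contributes one count to E
    global E
    E += i // 2
    if i == 1:
        return []
    return f(i // 2) + list(range(3, i, 2)) + [[1, i][i % 2]]

def fac2_check(n):
    # product of the odd cofactors, with the powers of two shifted back in
    global E
    E = 0
    product = 1
    for x in f(n):
        product *= x
    return product << E
```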

### Golfed Version of fac1 (because I can), 220B

import gmpy2
def f(n):
    m=lambda(L):([]if len(L)%2==0 else[L.pop()])+map(lambda(l):l[0]*l[1],zip(L[1::2],L[-2::-2]))
    L=map(gmpy2.mpz,xrange(1,n+1));N=(len(L)-1).bit_length()
    while N:N-=1;L=m(L)
    return L[0]

If the GMP backend includes a bitshift function then you can keep the numbers even smaller by dividing each number in the list by 2 until it's even and then doing a single shift at the end. – Peter Taylor – 2014-05-06T15:58:52.453

Where did you get gmpy2 from? $python Python 2.7.3 (default, Feb 27 2014, 19:58:35) [GCC 4.6.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. from gmpy2 import mpz Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: No module named gmpy2 – user80551 – 2014-05-06T17:57:52.010

@user80551: https://code.google.com/p/gmpy/ (the top google search result) has installers for many different platforms. – alexander-brett – 2014-05-06T19:35:38.603

For the golfed version, couldn't you do while len(L): ... instead of while len(L)>1: ...? – user80551 – 2014-05-07T09:10:59.840

No: the function inside that loop will never take the list below length 1, and anyhow we need the first element! – alexander-brett – 2014-05-07T09:12:57.247

# Java - 125.15 (21,400,000 / 171,000)

Also shamelessly copied from Peter Luschny's Github repo (thanks @semi-extrinsic) and licensed under the MIT license, this uses the "prime factorization nested squaring" algorithm as proposed by Albert Schönhage et al. (according to Luschny's factorial algorithms description page). I slightly adapted the algorithm to use Java's BigInteger and to not use a lookup table for n < 20.

Compiled with gcj, which uses GMP for its BigInteger implementation, and run on Linux 3.12.4 (Gentoo), on a Core i7 4700MQ at 2.40GHz.

import java.math.BigInteger;

public class PrimeSieveFactorialSchoenhage {
    private static int[] primeList, multiList;

    public static BigInteger factorial(int n) {
        int log2n = 31 - Integer.numberOfLeadingZeros(n);
        int piN = log2n < 2 ? 1 : 2 + (15 * n) / (8 * (log2n - 1));
        primeList = new int[piN];
        multiList = new int[piN];
        int len = primeFactors(n);
        return nestedSquare(len).shiftLeft(n - Integer.bitCount(n));
    }

    private static BigInteger nestedSquare(int len) {
        if (len == 0) {
            return BigInteger.ONE;
        }
        int i = 0, mult = multiList[0];
        while (mult > 1) {
            if ((mult & 1) == 1) { // is mult odd?
                primeList[len++] = primeList[i];
            }
            multiList[i++] = mult / 2;
            mult = multiList[i];
        }
        BigInteger ns = nestedSquare(i);
        if (len <= i) {
            return ns.multiply(ns);
        }
        return product(primeList, i, len - i).multiply(ns.multiply(ns));
    }

    private static BigInteger product(int[] a, int start, int length) {
        if (length == 0) {
            return BigInteger.ONE;
        }
        int len = (length + 1) / 2;
        long[] b = new long[len];
        int i, j, k;
        for (k = 0, i = start, j = start + length - 1; i < j; i++, k++, j--) {
            b[k] = a[i] * (long) a[j];
        }
        if (i == j) {
            b[k++] = a[j];
        }
        return recProduct(b, 0, k - 1);
    }

    private static BigInteger recProduct(long[] s, int n, int m) {
        if (n > m) {
            return BigInteger.ONE;
        }
        if (n == m) {
            return BigInteger.valueOf(s[n]);
        }
        int k = (n + m) >> 1;
        return recProduct(s, n, k).multiply(recProduct(s, k + 1, m));
    }

    private static int primeFactors(int n) {
        int[] primes = new int[n < 17 ? 6 : (int) Math.floor(n / (Math.log(n) - 1.5))];
        int numPrimes = makePrimeList(n, primes);
        int maxBound = n / 2, count = 0;
        int start = indexOf(primes, 2, 0, numPrimes - 1);
        int end = indexOf(primes, n, start, numPrimes);
        for (int i = start; i < end; i++) {
            int prime = primes[i];
            int m = prime > maxBound ? 1 : 0;
            if (prime <= maxBound) {
                int q = n;
                while (q >= prime) {
                    m += q /= prime;
                }
            }
            primeList[count] = prime;
            multiList[count++] = m;
        }
        return count;
    }

    private static int indexOf(final int[] data, int value, int low, int high) {
        while (low < high) {
            int mid = (low + high) >>> 1;
            if (data[mid] < value) {
                low = mid + 1;
            } else {
                high = mid;
            }
        }
        if (low >= data.length) {
            return low;
        }
        if (data[low] == value) {
            low++;
        }
        return low;
    }

    private static int makePrimeList(int n, int[] prime) {
        boolean[] composite = new boolean[n / 3];
        sieveOfEratosthenes(composite);
        boolean toggle = false;
        int p = 5, i = 0, j = 2;
        prime[0] = 2;
        prime[1] = 3;
        while (p <= n) {
            if (!composite[i++]) {
                prime[j++] = p;
            }
            p += (toggle = !toggle) ? 2 : 4;
        }
        return j; // number of primes
    }

    private static void sieveOfEratosthenes(final boolean[] composite) {
        int d1 = 8;
        int d2 = 8;
        int p1 = 3;
        int p2 = 7;
        int s1 = 7;
        int s2 = 3;
        int n = 0;
        int len = composite.length;
        boolean toggle = false;
        while (s1 < len) { // -- scan sieve
            if (!composite[n++]) { // -- if a prime is found, cancel its multiples
                int inc = p1 + p2;
                for (int k = s1; k < len; k += inc) {
                    composite[k] = true;
                }
                for (int k = s1 + s2; k < len; k += inc) {
                    composite[k] = true;
                }
            }
            if (toggle = !toggle) {
                s1 += d2; d1 += 16; p1 += 2; p2 += 2; s2 = p2;
            } else {
                s1 += d1; d2 += 8; p1 += 2; p2 += 6; s2 = p1;
            }
        }
    }

    public static void main(String[] args) {
        int n = Integer.parseInt(args[0]);
        long nanos = System.nanoTime();
        BigInteger fact = factorial(n);
        nanos = System.nanoTime() - nanos;
        // Commented out because it takes ages to print
        //System.out.println(fact);
        System.out.println(nanos / 1e9);
    }
}

Compiled with gcj -O3 --main=PrimeSieveFactorialSchoenhage PrimeSieveFactorialSchoenhage.java -o pf_nest_square_fact – 14mRh4X0r – 2014-05-12T15:28:54.343

# Python 3, n = 100000

A simple algorithm change was all that was needed to bump the sample code up by 10000.

import time

def factorial(n):
    result = 1
    while n > 0:
        result *= n
        n = n - 1
    return result

start = time.clock()
answer = factorial(100000)
end = time.clock()

print(answer)
print("Time:", end - start, "sec")

Obviously not the most creative answer, but there's really only one way to do a factorial....

Please give the score, see my edit. The bump would probably be because your machine is better than mine. – user80551 – 2014-05-06T17:46:49.823

## Perl + C, n = about 3 million

Here I'm using the Math::BigInt::GMP library available on CPAN, which provides a massive speed boost for Perl's core Math::BigInt objects.

use v5.14;
use Time::HiRes 'time';
use Math::BigInt only => 'GMP';

sub factorial { Math::BigInt::->new(@_)->bfac }

my $start  = time;
my $answer = factorial( 3_000_000 );
my $end    = time;

say $answer;
say "Time: ", $end - $start, " sec";

Bear in mind that my computer is probably quite a bit slower than yours. Using your original Python script, I can only calculate factorial(40000) in 10 seconds; factorial(90000) takes a lot longer. (I hit Ctrl+C after a minute.) On your hardware, using Math::BigInt::GMP, you may well be able to calculate the factorial of 5 million or more in under 10 seconds.

One thing you may notice is that although the factorial is calculated incredibly quickly, printing out the result is very slow, taking about three times longer than the original calculation. This is because GMP internally uses a binary rather than decimal representation, so printing requires a binary-to-decimal conversion.

I think GMP counts as an external resource. (Although it certainly makes things a hell of a lot easier than implementing prime factorization and Schönhage-Strassen multiplication from scratch.) – r3mainer – 2014-05-06T13:14:24.490

I was assuming that "external resource" referred to looking up solutions from a pre-computed set of answers in a database, or a web service, etc. – tobyink – 2014-05-06T13:29:18.883

Squeamish: libraries don't normally count as external resources unless they have a function which falls under the boring loopholes rule. – alexander-brett – 2014-05-06T17:01:27.550

Tobyink: can you explain what your program does? It looks like you're just using a built-in function (bfac?) – alexander-brett – 2014-05-06T17:27:19.453

Yup. This answer is invalid, since it uses the factorial method of Math::BigInt – 14mRh4X0r – 2014-05-09T10:34:19.697

Firstly, that rule was added after I'd written my answer. Secondly, Math::BigInt isn't a "built-in factorial function". Math::BigInt is bundled with Perl, but not built into it any more than Net::FTP, IO::Compress::Gzip, or Digest::MD5. – tobyink – 2014-05-09T11:06:13.280

Tobyink: we're supposed to be writing a factorial function, not using an existing one... using a library function falls under standard loopholes (see other discussions) – alexander-brett – 2014-05-12T08:24:12.013

# C++ (x86_64-specific) - 3.0 (390000/130000)

(easily portable to x86-32; porting to other architectures implies a significant speed loss)

Here's my own micro-implementation of long arithmetic. The calculation itself takes 10 seconds, and while the output is in easily printable form (see the operator<< overload), it takes some more time to print it.

#include <vector>
#include <iostream>
#include <stdint.h>
#include <ctime>

typedef uint64_t digit;
typedef std::vector<digit> number;

std::ostream &operator<<(std::ostream &s, const number &x)
{
    std::vector<char> o;
    size_t size = x.size() * 21;
    o.resize(size);
    size_t lud = 0;
    for(number::const_reverse_iterator i = x.rbegin(), end = x.rend(); i != end; i++)
    {
        digit carry = 0;
        int j;
        for(j = 0; j <= lud || carry; j++)
        {
            digit r = o[j] * (1LL << 32) + carry;
            o[j] = r % 10;
            carry = r / 10;
        }
        lud = j;
        carry = 0;
        for(j = 0; j <= lud || carry; j++)
        {
            digit r = o[j] * (1LL << 32) + carry;
            o[j] = r % 10;
            carry = r / 10;
        }
        lud = j;
        carry = *i;
        for(j = 0; carry; j++)
        {
            digit r = o[j] + (carry % 10);
            carry /= 10;
            carry += r / 10;
            o[j] = r % 10;
        }
        if(j > lud) lud = j;
    }
    for(int j = lud; j--;)
        s.put(o[j] + '0');
    return s;
}

inline uint64_t dmul(uint64_t x, uint64_t y, uint64_t &carry)
{
    asm("mulq %2" : "+a"(x), "=d"(carry) : "r"(y));
    return x;
}

inline digit dadd(digit x, digit y, digit &carry)
{
    asm("movq $0, %1; addq %2, %0; adcq %1, %1" : "+r"(x), "=r"(carry), "+r"(y));
    return x;
}

void multiply(number &x, digit y)
{
    x.resize(x.size() + 2);
    digit carry = 0;
    for(number::iterator i = x.begin(), end = x.end(); i != end; i++)
    {
        digit nc, res = dmul(*i, y, nc);
        digit ac;
        *i = dadd(res, carry, ac);  // store the low word, remember the carry-out
        carry = nc + ac;            // propagate the high word plus the carry-out
    }
    size_t sz = x.size();
    for(number::const_reverse_iterator i = x.rbegin(), end = x.rend(); i != end; i++)
    {
        if(*i)
            break;
        sz--;
    }
    x.resize(sz);
}

int main()
{
    const int r = 390000;
    clock_t start = clock();
    number n;
    digit mult = 1;
    n.push_back(1);
    for(digit a = 2; a <= r; a++)
    {
        digit carry, m = dmul(mult, a, carry);
        if(carry)
        {
            multiply(n, mult);
            mult = a;
        }
        else
            mult = m;
    }
    multiply(n, mult);
    std::cout << "Took: " << (clock() - start)/((double)CLOCKS_PER_SEC) << std::endl;
    std::cout << n << std::endl;
}

Check your score. You need to run the question's Python 2.7 program on your computer. For my computer, I compiled your program with g++ -O2 factorial.cc -o factorial and it scores 3.90 = 382000 / 98000. – kernigh – 2014-05-11T20:11:25.827

Weird, I got 3.9 and you got 3.0 for this program. I guess your faster computer is a penalty. Perhaps your program loses its advantage over Python as r increases. If so, and you can do a higher r in 10 seconds, then your score goes down. – kernigh – 2014-05-12T21:30:01.177


# C#: 0.48 (77,000 / 160,000)

I'm not happy with this.

Is C# that slow?

But here is my entry anyway.

static void Main(string[] args)
{
    Console.WriteLine("Enter N for factorial:");
    int n = int.Parse(Console.ReadLine()) + 1;

    Stopwatch s = Stopwatch.StartNew();

    BigInteger result = 1;
    while (0 <-- n) result *= n;

    s.Stop();

    Console.WriteLine("Output: {0} ", result);

    Console.WriteLine("Completed in {0}", s.Elapsed);
}

When n = 77000 it takes 00:00:09.8708952 to calculate.

I'm running in Release mode, outside Visual Studio, using a Core i3-2330M @2.2GHz.

Edit: Since I'm not doing anything intelligent, I accept that result. Maybe the .NET Framework 4.5 is adding some overhead (or BigInteger isn't that fast).

Please give the score and not just n – user80551 – 2014-05-06T18:10:46.893

You could use the zero-approached-by operator to make it prettier (like start with n = ... + 1 then do while (0 <-- n) result *= n;) – Cthulhu – 2014-05-06T18:50:59.820

BigInteger for .NET probably did not implement algorithms for multiplying larger numbers, like Karatsuba or Toom-3. If so, this is a good example of how Python is faster. – kernigh – 2014-05-11T20:20:27.760


# Python 2.7, score = 5.94 (1,200,000 / 202,000)

def fast_fac(n):
    def prod(start, fin):
        if fin - start <= 50:
            return reduce(lambda x,y: x*y, xrange(start, fin+1), 1)
        else:
            mid = (start+fin) / 2
            return prod(start, mid) * prod(mid+1, fin)
    return prod(1, n)

This exploits the fact that multiplying many small numbers into balanced partial products, and then combining those, is much cheaper than a long chain of multiplications that each involve one huge number.
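
A quick sanity check of the divide-and-conquer product against known values (reduce/xrange rewritten for Python 3; a sketch of the answer's approach, not a replacement for it):

```python
from functools import reduce

def fast_fac(n):
    def prod(start, fin):
        # short ranges are multiplied directly; longer ranges are split so the
        # two operands of each big multiplication stay comparable in size
        if fin - start <= 50:
            return reduce(lambda x, y: x * y, range(start, fin + 1), 1)
        mid = (start + fin) // 2
        return prod(start, mid) * prod(mid + 1, fin)
    return prod(1, n)
```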


# bc, score=0.19

What the heck, here's my contender for "How much can you slowly multiply?"

bc is "An arbitrary precision calculator language", but unfortunately rather slow:

n=read()
for(f=i=1;i<=n;i++)f*=i
f
quit

In about 10 seconds on my mid 2012 MacBook Pro (2.3 GHz Intel Core i7) the reference python answer can calculate 122000!, but this bc script can only calculate 23600!.

Conversely 10000! takes 1.5s with the python reference script, but the bc script takes 50s.

Oh dear.

OpenBSD bc(1) is faster. Your program scores 0.29 = 28000/98000. There is no read(), so I ran time sed 's/read()/28000/' factorial.bc | bc. – kernigh – 2014-05-11T15:33:32.157


## Bash: score = 0.001206 (181/150000)

I stole the math functions from Rosettacode - Long multiplication. I didn't analyze them or try to optimize.
You are free to change the algorithm or to try a different string-splitting method.

#!/bin/bash

add() { # arbitrary-precision addition
    if (( ${#1} < ${#2} )); then
        local a="$2" b="$1" sum= carry=0
    else
        local a="$1" b="$2" sum= carry=0
    fi

    while (( ${#a} )); do
        local -i d1="${a##${a%?}}" d2="10#0${b##${b%?}}" s=carry+d1+d2
        sum="${s##${s%?}}$sum"
        carry="10#0${s%?}" a="${a%?}" b="${b%?}"
    done
    echo "$sum"
}

multiply() { # arbitrary-precision multiplication
    if (( ${#1} < ${#2} )); then
        local a="$2" b="$1" product=0
    else
        local a="$1" b="$2" product=0
    fi

    local zeroes=
    while (( ${#b} )); do
        local m1="$a"
        local m2="${b##${b%?}}"
        local partial=$zeroes
        local -i carry=0
        while (( ${#m1} )); do
            local -i d="${m1##${m1%?}}"
            m1="${m1%?}"
            local -i p=d*m2+carry
            partial="${p##${p%?}}$partial"
            carry="10#0${p%?}"
        done
        partial="${carry#0}$partial"
        product="$(add "$product" "$partial")"
        zeroes=0$zeroes b="${b%?}"
    done
    echo "$product"
}

# 'timerun': print the last complete result when time runs out
trap 'echo $((i - 1)) $f; exit' USR1
(sleep 9.9; kill -USR1 $$) &

declare -i i
f=1
for (( i = 1; i < 10000; i++ ))  # 10000 is very optimistic
do
    f=$(multiply $f $i)
done

Please add the score and not just the highest n – user80551 – 2014-05-07T18:21:22.673

@user80551 it's done – Emmanuel – 2014-05-07T22:05:34.053


# Python 3, advanced algorithm by Peter Luschny: 8.25 (1,280,000 / 155,000)

Shamelessly copied from Peter Luschny,
http://www.luschny.de/math/factorial/FastFactorialFunctions.htm,

This is actually a quite advanced algorithm, using something called the "swinging factorial" and a list of primes. I suspect it could be even faster if it did like many of the other answers and performed most of the multiplications with 32 bit integers.
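
The core identity behind the algorithm is n! = (⌊n/2⌋!)² · σ(n), where σ(n) = n! / (⌊n/2⌋!)² is the swinging factorial. The answer's code works with the odd part of the factorial and restores the powers of two at the end; this sketch checks the underlying recursion on the full factorial with brute-force arithmetic (independent of the prime-based code below):

```python
def swing(n):
    # swinging factorial sigma(n) = n! / (floor(n/2)!)**2, computed naively
    num = den = 1
    for i in range(1, n + 1):
        num *= i
    for i in range(1, n // 2 + 1):
        den *= i
    return num // (den * den)

def fact_via_swing(n):
    # the recursion the fast algorithm exploits: n! = (floor(n/2)!)**2 * sigma(n)
    if n < 2:
        return 1
    half = fact_via_swing(n // 2)
    return half * half * swing(n)
```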

#! /usr/bin/python3
import time
import bisect

def Primes(n) :
    primes = [2, 3]
    lim, tog = n // 3, False
    composite = [False for i in range(lim)]

    d1 = 8; d2 = 8; p1 = 3; p2 = 7; s = 7; s2 = 3; m = -1

    while s < lim :             # --  scan the sieve
        m += 1                  # --  if a prime is found
        if not composite[m] :   # --  cancel its multiples
            inc = p1 + p2
            for k in range(s,      lim, inc) : composite[k] = True
            for k in range(s + s2, lim, inc) : composite[k] = True

        tog = not tog
        if tog: s += d2; d1 += 16; p1 += 2; p2 += 2; s2 = p2
        else:   s += d1; d2 +=  8; p1 += 2; p2 += 6; s2 = p1

    k, p, tog = 0, 5, False
    while p <= n :
        if not composite[k] : primes.append(p)
        k += 1
        tog = not tog
        p += 2 if tog else 4

    return primes

def isqrt(x):
    '''
    Writing your own square root function
    '''
    if x < 0: raise ValueError('square root not defined for negative numbers')
    n = int(x)
    if n == 0: return 0
    a, b = divmod(n.bit_length(), 2)
    x = 2**(a + b)
    while True:
        y = (x + n // x) // 2
        if y >= x: return x
        x = y

def product(s, n, m):
    if n > m: return 1
    if n == m: return s[n]
    k = (n + m) // 2
    return product(s, n, k) * product(s, k + 1, m)

def factorialPS(n):

    small_swing = [1,1,1,3,3,15,5,35,35,315,63,693,231,3003,429,6435,6435,
        109395,12155,230945,46189,969969,88179,2028117,676039,16900975,
        1300075,35102025,5014575,145422675,9694845,300540195,300540195]

    def swing(m, primes):
        if m < 33: return small_swing[m]

        s = bisect.bisect_left(primes, 1 + isqrt(m))
        d = bisect.bisect_left(primes, 1 + m // 3)
        e = bisect.bisect_left(primes, 1 + m // 2)
        g = bisect.bisect_left(primes, 1 + m)

        factors = primes[e:g]
        factors += filter(lambda x: (m // x) & 1 == 1, primes[s:d])
        for prime in primes[1:s]:
            p, q = 1, m
            while True:
                q //= prime
                if q == 0: break
                if q & 1 == 1:
                    p *= prime
            if p > 1: factors.append(p)

        return product(factors, 0, len(factors) - 1)

    def odd_factorial(n, primes):
        if n < 2: return 1
        return (odd_factorial(n // 2, primes)**2) * swing(n, primes)

    def eval(n):
        if n < 0:
            raise ValueError('factorial not defined for negative numbers')

        if n == 0: return 1
        if n < 20: return product(range(2, n + 1), 0, n - 2)

        N, bits = n, n
        while N != 0:
            bits -= N & 1
            N >>= 1

        primes = Primes(n)
        return odd_factorial(n, primes) * 2**bits

    return eval(n)

start = time.time()
factorialPS(1280000)
print(time.time()-start)


# Java - 10.9

### n = 885000

Mergesort-y.

import java.math.BigInteger;

public class Factorials {

    public static BigInteger fac;

    public static BigInteger two = BigInteger.valueOf(2);

    static BigInteger mul(BigInteger start, BigInteger end) {
        if (start.equals(end)) {
            return start;
        } else {
            // split the range in half and multiply both halves (divide and conquer)
            BigInteger mid = start.add(end).divide(two);
            return mul(start, mid).multiply(mul(mid.add(BigInteger.ONE), end));
        }
    }

    public static void main(String[] args) {
        Factorials.fac = BigInteger.valueOf(Integer.parseInt(args[0]));
        long t = System.nanoTime();
        BigInteger result = mul(BigInteger.ONE, fac);
        t = System.nanoTime() - t;
        System.out.print(String.valueOf(((float) t) / 1000000000)); //result.toString()+" @ "+
    }
}

BigIntegers are slow.

### Recommendations for arbitrary-precision high-speed Java integer libraries? :P

"Mergesort-y"? It's called divide-and-conquer. – johnchen902 – 2018-05-30T05:52:23.310

Can I steal your code to make it multithreaded? – Simon Kuang – 2014-05-11T06:47:36.810

@SimonKuang Go ahead. :P Multi-threaded entries aren't allowed here, though. Also, you might want to use a more efficient BigInteger implementation. – cjfaure – 2014-05-11T10:13:34.247


# tcl, 13757

My answer is to check the limits of tcl.

The first line is only to set an input parameter:

set n 13757

The others are the algorithm itself:

set r 2
for {set i 3} {$i <= $n} {incr i} {set r [expr {$r*$i}]}
puts $r

I tested my code on http://rextester.com/live/WEL36956; if I make n bigger, I get a SIGKILL. Maybe n can get bigger on a local tcl interpreter, which I do not have.

# Python 3: 280000 / 168000

Time running your program: between 9.87585953253 and 10.3046453994. Time running my program: about 10.35296977897559.

import time

def factorial(n):
    f = 1
    while n > 1:
        hn = n >> 1
        f = f * 2**hn * double_factorial(n) #dfl[hn + (n & 1) - 1]
        n = hn
    return f

def double_factorial(n):
    #dfl = [1]
    p = 1
    l = 3
    mh = n
    while l <= n:
        p *= l
        l += 2
        #dfl.append(p)
    return p

start = time.clock()
factorial(280000)
end = time.clock()
print(end - start)

I read this answer on cs.SE and decided to try to implement it in Python. However, I accidentally discovered that n! = (⌊n / 2⌋)! * 2**(⌊n / 2⌋) * n!! (note: !! is the double factorial). So I converted that to a non-recursive form. The comments show my attempt to avoid recomputing the double factorial, but I discovered that storing every value was so memory-costly that it caused my computer to run even slower. I can improve this by only storing what is needed.

Strangely, I implemented the naive straight multiplication in Python 3, and it does better than your program: n = 169000 in 10 seconds:

def factorial(n):
    p=1
    for i in range(n):
        p*=i+1
    return p

# Ruby 2.1, score = 1.80 = 176_000 / 98_000

EDIT: improved from 1.35 = 132_000 / 98_000

I took ideas from the GMP factorial algorithm. This program uses the standard library to generate prime numbers. Ruby is a bad choice because multiplication seems slower in Ruby than in Python.

1. My program in Ruby 2.1: score = 1.80 = 176_000 / 98_000
2. Trivial algorithm in Python 2.7: score = 1 = 98_000 / 98_000
3. Trivial algorithm in Ruby 2.1: score = 0.878 = 86_000 / 98_000

Yes, my binary of ruby 2.1.0p0 (2013-12-25 revision 44422) [x86_64-openbsd] links against GMP. Ruby 2.1 added a feature to use GMP for large multiplication, but it still seems slower than Python 2.7.

require 'benchmark'
require 'optparse'
require 'prime'

def factorial(n)
  # calculate primes up to n, drop the 2
  @odd_primes = Prime.each(n).drop(1)

  # count prime factors of factorial(n)
  @factors = Hash.new(0)
  factorial_recurse(n)

  shift = @factors.delete(2) || 0
  @factors.inject(1) {|product, (base, exp)|
    product * base**exp } << shift
end

def factorial_recurse(n)
  return if n < 2

  # collect prime factors of 2 * 4 * 6 * .. * n
  #  = (2 * 2 * 2 * .. * 2) * (1 * 2 * 3 * .. * exp)
  #  = 2**exp * factorial(exp) where exp = floor(n/2)
  exp = n >> 1
  factorial_recurse(exp)
  @factors[2] += exp

  # collect prime factors 3 * 5 * 7 * ... * n
  for prime in @odd_primes
    break if prime > n
    exp = 0
    # count occurences of prime, prime**2, prime**3, .. n
    prime_power = prime
    until prime_power > n
      # floor(n / prime_power) occurences in 1 * 2 * .. * n,
      # but only ceil(count / 2) occurences in 3 * 5 * .. * n
      @factors[prime] += (n / prime_power + 1) >> 1
      prime_power *= prime
    end
  end
end

# usage: factorial.rb [-ct] [number]
cflag = tflag = false
OptionParser.new {|opts|
  opts.on('-c', 'Check for bugs') { cflag = true }
  opts.on('-t', 'Use trivial algorithm') { tflag = true }
  opts.parse!
}
$*[1] and fail 'too many arguments'
n = Integer(\$*[0] || 176_000)

if cflag
  factorial(n) == (1..n).reduce(1, :*) or
    fail "bad program: factorial(#{n}) is wrong"
  puts "ok"
  exit
end

# measure processor time to calculate factorial
f = nil
if tflag
  time = Benchmark.measure { f = (1..n).reduce(1, :*) }
else
  time = Benchmark.measure { f = factorial(n) }
end
puts f
puts "Time #{time.total} sec"


# Julia - Score = 15.194

Utilising the exact same approach as that of the reference program... that is,

f(n)=reduce(*,1:big(n))

So it uses reduce, the basic binary multiplication operation, and a range (in this case, using big(n) to force the calculation to be done in BigInt rather than Int64). From this, I get

julia> @time K=f(2340000);
elapsed time: 9.991324093 seconds (814552840 bytes allocated)

On my computer, the reference program with an input of 154000 reports Time: 10.041181 sec (run using python ./test.py, where test.py is the name of the file containing the reference code).