Write the fastest Fibonacci

10

3

This is yet another challenge about the Fibonacci numbers.

The goal is to compute the 20'000'000th Fibonacii number as fast as possible. The decimal output is about 4 MiB large; it starts with:

28543982899108793710435526490684533031144309848579

The MD5 sum of the output is

fa831ff5dd57a830792d8ded4c24c2cb

You have to submit a program that calculates the number while running and puts the result to stdout. The fastest program, as measured on my own machine, wins.

Here are some additional rules:

  • You have to submit the source code and a binary runnable on an x64 Linux
  • The source code must be shorter than 1 MiB, in case of assembly it is also acceptable if only the binary is that small.
  • You must not include the number to be computed in your binary, even in a disguised fashion. The number has to be calculated at runtime.
  • My computer has two cores; you are allowed to use parallelism

I took a small implementation from the Internet which runs in about 4.5 seconds. It should not be very difficult to beat this, assuming that you have a good algorithm.

FUZxxl

Posted 2011-07-16T16:51:42.787

Reputation: 9 656

Related: first 1000 digits of Fib(10**9). Some fast implementations, including one using a Fib(2n) and 2n+1 in terms of Fib(n),n+1 recurrence that runs near-instantly. Also including my x86 machine code answer, featuring lots of brute force, no clever math :P.

– Peter Cordes – 2017-07-25T09:55:59.240

If you have high float precision, it's just Fibo(x) = (phi^x)/sqrt(5) an O(log(n)) operation – JBernardo – 2011-07-16T17:39:15.553

@JBernardo If you think this can beat 4.5 sec, please implement it like that for an x64 Linux. I'd live to see your code. – FUZxxl – 2011-07-16T17:52:05.210

The operation is quite simple. It's all about precision – JBernardo – 2011-07-16T18:05:23.370

@Jbernardo The spec says that you have to output the exact number. (Though I won't check every single digit, just a hash). If you think you can code it like that, just do so. But I doubt, that that will be faster than 4.5 secs. – FUZxxl – 2011-07-16T18:11:23.397

1Dude, anything like Sage that has indeterminate float precision will run that thing in less thant 1/10th of second. It's just a simple expression as phi = (1+sqrt(5))/2 – JBernardo – 2011-07-16T18:14:23.473

@FUZxxl: hash collision FTW! :D You can use diff to be 100% certain. – boothby – 2011-07-16T18:53:41.710

@Bothby I don't know how mlong it takes for you to get a collision for this hash, but IMHP it takes longer than 4.5 secs. – FUZxxl – 2011-07-16T19:19:11.560

4Can we output the number in hex? – Keith Randall – 2011-07-18T00:25:12.817

2@Keith Nope. That's part of the spec. – FUZxxl – 2011-07-18T10:16:42.377

3Since it's to be measured on your CPU, we might as well have some more information about it, couldn't we? Intel or AMD? Size of the L1 and instruction caches? Instruction set extensions? – J B – 2011-07-19T13:37:49.393

@J B My cpuinfo

– FUZxxl – 2011-07-19T14:43:36.897

2As I compute it, your start string and MD5 are for the 20'000'000th number, not the mere 2'000'000th. – J B – 2011-07-19T17:07:30.473

@J B, I just noticed that, too -- and FUZxxl's solution is obviously computing the 20,000,000th. – boothby – 2011-07-19T17:13:10.020

@boothby oops.... I should really think longer before postin. – FUZxxl – 2011-07-19T17:19:46.243

1@JBernardo, I implemented an arbitrary-precision solution, and it gets utterly killed by my double-and-add solution. That, in turn, is embarrassingly slow compared to Sage. I looked into the Sage source; it calls Pari. I looked at the Pari source... and the algorithm is insane. I patently refuse to re-implement it and take credit for it. – boothby – 2011-07-20T00:06:32.997

@boothby link to the algo in the pari source? – Sparr – 2014-11-20T04:54:02.727

Answers

4

C with GMP, 3.6s

Gods, but GMP makes code ugly. With a Karatsuba-style trick, I managed to cut down to 2 multiplies per doubling step. Now that I'm reading FUZxxl's solution, I'm not the first to have the idea. I've got a couple more tricks up my sleeve... maybe I'll try 'em out later on.

#include <gmp.h>
#include <stdio.h>

#define DBL mpz_mul_2exp(u,a,1);mpz_mul_2exp(v,b,1);mpz_add(u,u,b);mpz_sub(v,a,v);mpz_mul(b,u,b);mpz_mul(a,v,a);mpz_add(a,b,a);
#define ADD mpz_add(a,a,b);mpz_swap(a,b);

int main(){
    mpz_t a,b,u,v;
    mpz_init(a);mpz_set_ui(a,0);
    mpz_init(b);mpz_set_ui(b,1);
    mpz_init(u);
    mpz_init(v);

    DBL
    DBL
    DBL ADD
    DBL ADD
    DBL
    DBL
    DBL
    DBL ADD
    DBL
    DBL
    DBL ADD
    DBL
    DBL ADD
    DBL ADD
    DBL
    DBL ADD
    DBL
    DBL
    DBL
    DBL
    DBL
    DBL
    DBL
    DBL /*Comment this line out for F(10M)*/

    mpz_out_str(stdout,10,b);
    printf("\n");
}

Built with gcc -O3 m.c -o m -lgmp.

boothby

Posted 2011-07-16T16:51:42.787

Reputation: 9 038

LOL. Apart from a identifier naming, that's exactly my solution :) – J B – 2011-07-19T20:37:35.207

@J B: FIRST! :D – boothby – 2011-07-19T20:42:08.980

Keep it ;) The next trick up my sleeve will benefit from Haskell more than from C. – J B – 2011-07-19T20:46:13.567

First trick up my sleeve bumped in a GHC bug. Drat. I'll have to fall back to the second one, which isn't remotely as fun to implement, so it'll take time and motivation. – J B – 2011-07-19T22:10:05.987

3.6 secs on my machine. – FUZxxl – 2011-07-19T22:11:14.427

11

Sage

Hmm, you seem to assume that the fastest is going to be a compiled program. No binary for you!

print fibonacci(2000000)

On my machine, it takes 0.10 cpu seconds, 0.15 wall seconds.

edit: timed on the console, instead of the notebook

boothby

Posted 2011-07-16T16:51:42.787

Reputation: 9 038

1My idea was not to know, how fast your CAS can do this, but rather how fast you can code this by yourself. – FUZxxl – 2011-07-19T14:55:45.243

11For the record, I just put this up to be a smartass; you didn't say not to use builtins. – boothby – 2011-07-19T20:19:48.367

5

Haskell

This is my own try, although I did not wrote the algorithm by myself. I rather copied it from haskell.org and adapted it to use Data.Vector with its famous stream fusion:

import Data.Vector as V
import Data.Bits

main :: IO ()
main = print $ fib 20000000

fib :: Int -> Integer
fib n = snd . V.foldl' fib' (1,0) . V.dropWhile not $ V.map (testBit n) $ V.enumFromStepN (s-1) (-1) s
    where
        s = bitSize n
        fib' (f,g) p
            | p         = (f*(f+2*g),ss)
            | otherwise = (ss,g*(2*f-g))
            where ss = f*f+g*g

This takes around 4.5 seconds when compiled with GHC 7.0.3 and the following flags:

ghc -O3 -fllvm fib.hs

FUZxxl

Posted 2011-07-16T16:51:42.787

Reputation: 9 656

Weird... I needed to change 20000000 to 40000000 to get it to print the expected number. – J B – 2011-07-19T19:47:21.503

Gotcha. Should be enumFromStepN (s-1) instead of enumFromStepN s – J B – 2011-07-19T19:53:06.960

@J B Sorry for all this confusion. I initially tested the program with different values to get a reasonably big number and saved the output into different files. But some how I confused them. I have updated the number to match the desired result. – FUZxxl – 2011-07-19T19:53:14.600

@boothby No, I did not change the desired fibonacci number, but rather the reference output, which was wrong. – FUZxxl – 2011-07-19T20:07:04.313

Side note: it's about 1.5s on my machine, but neither LLVM not Data.Vector seem to bring any significant advantage. – J B – 2011-07-19T20:44:35.440

4

COW

 MoO moO MoO mOo MOO OOM MMM moO moO
 MMM mOo mOo moO MMM mOo MMM moO moO
 MOO MOo mOo MoO moO moo mOo mOo moo

Moo! (Takes a while. Drink some milk...)

Timtech

Posted 2011-07-16T16:51:42.787

Reputation: 12 038

1Note: Although this really does work, it will probably never reach 20,000,000... – Timtech – 2014-01-13T22:19:34.273

2

Mathematica, interpreted:

First@Timing[Fibonacci[2 10^6]]

Timed:

0.032 secs on my poor man's laptop.

And of course, no binary.

Dr. belisarius

Posted 2011-07-16T16:51:42.787

Reputation: 5 345

Doesn't print to stdout. – boothby – 2011-07-18T19:36:10.263

@boothby Wrong. It writes to standard output if you use the command line interface. See for example http://stackoverflow.com/questions/6542537/batch-input-and-output-in-mathematica/6542653#6542653

– Dr. belisarius – 2011-07-18T19:44:45.863

Nope, I'm using the commandline interface, version 6.0. Even using -batchoutput, it only prints timing info and not the Fibonacci number. – boothby – 2011-07-18T20:36:59.930

Sorry, can't reproduce since I don't have mathematica. – FUZxxl – 2011-07-19T14:49:00.690

5curl 'http://www.wolframalpha.com/input/?i=Fibonacci%5B2+10^6%5D' | grep 'Decimal approximation:' | sed ... It runs in constant time with respect to the speed of your Internet connection. ;-) – ESultanik – 2011-07-25T17:06:03.600

@belisarius what a lame solution. Why don't you code it yourself? – Arlen – 2011-08-05T03:39:14.097

CLI != executable binary – Ingo Bürk – 2014-11-20T06:48:21.363

2

Ocaml, 0.856s on my laptop

Requires the zarith library. I used Big_int but it's dog slow compared to zarith. It took 10 minutes with the same code! Most of the time was spent printing the damn number (9½ minutes or so)!

module M = Map.Make
  (struct
    type t = int
    let compare = compare
   end)

let double b = Z.shift_left b 1
let ( +. ) b1 b2 = Z.add b1 b2
let ( *. ) b1 b2 = Z.mul b1 b2

let cache = ref M.empty 
let rec fib_log n =
  if n = 0
  then Z.zero
  else if n = 1
  then Z.one
  else if n mod 2 = 0
  then
    let f_n_half = fib_log_cached (n/2)
    and f_n_half_minus_one = fib_log_cached (n/2-1)
    in f_n_half *. (f_n_half +. double f_n_half_minus_one)
  else
    let f_n_half = fib_log_cached (n/2)
    and f_n_half_plus_one = fib_log_cached (n/2+1)
    in (f_n_half *. f_n_half) +.
    (f_n_half_plus_one *. f_n_half_plus_one)
and fib_log_cached n =
    try M.find n !cache
    with Not_found ->
      let res = fib_log n
      in cache := M.add n res !cache;
      res

let () =
  let res = fib_log 20_000_000 in
  Z.print res; print_newline ()

I can't believe how much a difference the library made!

ReyCharles

Posted 2011-07-16T16:51:42.787

Reputation: 525

1For comparison @boothby's solution takes 0.875s to run on my laptop. It seems the difference is neglible. Also, apparently my laptop is fast :o – ReyCharles – 2012-10-12T13:28:56.680

1

Haskell

On my system, this runs almost as fast as FUZxxl's answer (~18 seconds instead of ~17 seconds).

main = print $ fst $ fib2 20000000

-- | fib2: Compute (fib n, fib (n+1)).
--
-- Having two adjacent Fibonacci numbers lets us
-- traverse up or down the series efficiently.
fib2 :: Int -> (Integer, Integer)

-- Guard against negative n.
fib2 n | n < 0 = error "fib2: negative index"

-- Start with a few base cases.
fib2 0 = (0, 1)
fib2 1 = (1, 1)
fib2 2 = (1, 2)
fib2 3 = (2, 3)

-- For larger numbers, derive fib2 n from fib2 (n `div` 2)
-- This takes advantage of the following identity:
--
--    fib(n) = fib(k)*fib(n-k-1) + fib(k+1)*fib(n-k)
--             where n > k
--               and k ≥ 0.
--
fib2 n =
    let (a, b) = fib2 (n `div` 2)
     in if even n
        then ((b-a)*a + a*b, a*a + b*b)
        else (a*a + b*b, a*b + b*(a+b))

Joey Adams

Posted 2011-07-16T16:51:42.787

Reputation: 9 929

I ran this in ghci. I was pretty impressed. Haskell is great for these types of mathematical code problems. – Undreren – 2012-10-12T08:03:37.107

Nice. I love Haskell. – Arlen – 2011-08-05T03:39:57.030

1

Java: 8 seconds to compute, 18 seconds to write

public static BigInteger fibonacci1(int n) {
    if (n < 0) explode("non-negative please");
    short charPos = 32;
    boolean[] buf = new boolean[32];
    do {
        buf[--charPos] = (n & 1) == 1;
        n >>>= 1;
    } while (n != 0);
    BigInteger a = BigInteger.ZERO;
    BigInteger b = BigInteger.ONE;
    BigInteger temp;
    do {
        if (buf[charPos++]) {
            temp = b.multiply(b).add(a.multiply(a));
            b = b.multiply(a.shiftLeft(1).add(b));
            a = temp;
        } else {
            temp = b.multiply(b).add(a.multiply(a));
            a = a.multiply(b.shiftLeft(1).subtract(a));
            b = temp;
        }
    } while (charPos < 32);
    return a;
}

public static void main(String[] args) {
    BigInteger f;
    f = fibonacci1(20000000);
    // about 8 seconds
    System.out.println(f.toString());
    // about 18 seconds
}

averykhoo

Posted 2011-07-16T16:51:42.787

Reputation: 101

1

C, naive algorithm

Was curious, and I hadn't used gmp before... so:

#include <stdio.h>
#include <stdlib.h>
#include <gmp.h>

int main(int argc, char *argv[]){
    int n = (argc>1)?atoi(argv[1]):0;

    mpz_t temp,prev,result;
    mpz_init(temp);
    mpz_init_set_ui(prev, 0);
    mpz_init_set_ui(result, 1);

    for(int i = 2; i <= n; i++) {
        mpz_add(temp, result, prev);
        mpz_swap(temp, result);
        mpz_swap(temp, prev);
    }

    printf("fib(%d) = %s\n", n, mpz_get_str (NULL, 10, result));

    return 0;
}

fib(1 million) takes about 7secs... so this algorithm won't win the race.

baby-rabbit

Posted 2011-07-16T16:51:42.787

Reputation: 1 623

1

I implemented the matrix multiplication method (from sicp, http://sicp.org.ua/sicp/Exercise1-19) in SBCL but it takes about 30 seconds to finish. I ported it to C using GMP, and it returns the correct result in about 1.36 seconds on my machine. It's about as fast as boothby's answer.

#include <gmp.h>
#include <stdio.h>

int main()
{
  int n = 20000000;

  mpz_t a, b, p, q, psq, qsq, twopq, bq, aq, ap, bp;
  int count = n;

  mpz_init_set_si(a, 1);
  mpz_init_set_si(b, 0);
  mpz_init_set_si(p, 0);
  mpz_init_set_si(q, 1);
  mpz_init(psq);
  mpz_init(qsq);
  mpz_init(twopq);
  mpz_init(bq);
  mpz_init(aq);
  mpz_init(ap);
  mpz_init(bp);

  while(count > 0)
    {
      if ((count % 2) == 0)
        {
          mpz_mul(psq, p, p);
          mpz_mul(qsq, q, q);
          mpz_mul(twopq, p, q);
          mpz_mul_si(twopq, twopq, 2);

          mpz_add(p, psq, qsq);    // p -> (p * p) + (q * q)
          mpz_add(q, twopq, qsq);  // q -> (2 * p * q) + (q * q) 
          count/=2;
        }

      else
       {
          mpz_mul(bq, b, q);
          mpz_mul(aq, a, q);
          mpz_mul(ap, a, p);
          mpz_mul(bp, b, p);

          mpz_add(a, bq, aq);      // a -> (b * q) + (a * q)
          mpz_add(a, a, ap);       //              + (a * p)

          mpz_add(b, bp, aq);      // b -> (b * p) + (a * q)

          count--;
       }

    }

  gmp_printf("%Zd\n", b);
  return 0;
}

Erik Haliewicz

Posted 2011-07-16T16:51:42.787

Reputation: 401

0

Go

It's embarrasingly slow. On my computer it takes a little less than 3 minutes. It's only 120 recursive calls, though (after adding the cache). Note that this may use a lot of memory (like 1.4 GiB)!

package main

import (
    "math/big"
    "fmt"
)

var cache = make(map[int64] *big.Int)

func fib_log_cache(n int64) *big.Int {
    if res, ok := cache[n]; ok {
        return res
    }
    res := fib_log(n)
    cache[n] = res
    return res
}

func fib_log(n int64) *big.Int {
    if n <= 1 {
        return big.NewInt(n)
    }

    if n % 2 == 0 {
        f_n_half := fib_log_cache(n/2)
        f_n_half_minus_one := fib_log_cache(n/2-1)
        res := new(big.Int).Lsh(f_n_half_minus_one, 1)
        res.Add(f_n_half, res)
        res.Mul(f_n_half, res)
        return res
    }
    f_n_half := fib_log_cache(n/2)
    f_n_half_plus_one := fib_log_cache(n/2+1)
    res := new(big.Int).Mul(f_n_half_plus_one, f_n_half_plus_one)
    tmp := new(big.Int).Mul(f_n_half, f_n_half)
    res.Add(res, tmp)
    return res
}

func main() {
    fmt.Println(fib_log(20000000))
}

ReyCharles

Posted 2011-07-16T16:51:42.787

Reputation: 525

I tried parallelizing it (before adding the cache) using go routines and it started using 19 GiB of memory :/ – ReyCharles – 2012-10-12T10:32:38.127

-4

pseudo code (I don't know what you guys are using)

product = 1
multiplier = 3 // 3 is fibonacci sequence, but this can be any number, 
      // generating an infinite amount of sequences
y = 28 // the 2^x-1 term, so 2^28-1=1,284,455,535th term
for (int i = 1; int < y; i++) {
  product= sum*multiplier-1
  multiplier= multiplier^2-2
}
multiplier=multiplier-product // 2^28+1 1,284,455,537th 

It took my computer 56 hours to do those two terms. My computer is kind of crappy. I'll have the number in a text file on October 22nd. 1.2 gigs is a bit big to be sharing on my connection.

Thomas Olson

Posted 2011-07-16T16:51:42.787

Reputation: 1

1I'm confused by your answer. Pseudocode? And yet you have timings? Post the code! Language doesn't matter! – boothby – 2012-10-12T05:06:38.327

That, and the output is only supposed to be 4 million or so digits... – Wug – 2012-10-12T21:12:09.323