Fastest code to find the next prime

17

11

The problem is as follows.

Input: An integer n

Output: The smallest prime bigger than n.

The challenge is to give the fastest code possible to do this. I will test the code on values starting at size roughly 10^8 10^200 and doubling in size until it takes more than one minute 10 seconds on my computer.

The winning code will find the next prime for the largest input size.

By way of comparison, a simple sieve written in python is able to find the next prime bigger than 10^8 in about 20 seconds.

The requirement that I can test it on my 4GB RAM ubuntu computer is strict. All code must be free (in both senses) and if it uses libraries they must also be free and easily installable. Any false primes reported will immediately disqualify the submission.

I will award separate commendations for the winners in each programming language too if the code is entirely written in that language without the use of external libraries. I will also keep a running table of the fastest times as the competition goes on so people can see how they are doing.

Table so far

  • Python. An astonishing 357 digit prime 343239883006530485749095039954069660863471765007165270469723172959277159169882802606127982033072727748864815569574042901856099399985832190628701414555752857600000000000000000000000000000000000000002872284792758930912601189043411951050852357613658978971208596097634095500808832510259693761982135208603287199546795000697807728609476163156438356035166156820611 was the final number under 10 seconds using the code supplied by primo. Will anyone beat this first entry?

felipa

Posted 2013-02-16T10:34:02.897

Reputation: 895

1I think the OP has failed to update his tests of the answers... – mbomb007 – 2016-02-26T22:49:46.403

1

Almost an exact duplicate of Code-Challenge:The Nearest Prime

– Peter Taylor – 2013-02-16T11:00:52.773

@PeterTaylor That question is about time complexity I think. This is about practical speed in seconds. I think those two things can be quite different. – felipa – 2013-02-16T11:03:47.390

Sure, if you stick to small test cases. But since no-one bothered to implement AKS for the other question, you're going to get the same answers. – Peter Taylor – 2013-02-16T11:12:30.570

@PeterTaylor There is already an amazing new answer here. I will have to test with huge test cases now! – felipa – 2013-02-16T11:25:35.667

For some value of new. He does say that it was an answer from another question: I think the space of questions about primes is pretty much mined out now. – Peter Taylor – 2013-02-18T10:13:41.470

@PeterTaylor You are right. I suppose what will be new here is the comparison of how large a prime you can find it practice. – felipa – 2013-02-18T14:51:46.320

3

@PeterTaylor allow me to disagree. Eventually, 90% of a site's traffic should come from search engines. A google search for fast semiprime factorization and Multiple Polynomial Quadratic Sieve return the original problem I've taken my code from at place #2 and #4 respectively. I imagine at some point, this problem will rank fairly high for fast next prime function as well.

– primo – 2013-02-18T15:04:22.890

@primo, that's the generic text, and not really applicable to codegolf. This site isn't a question-and-answer site in the general style of stackexchange. Having said that, I'm not sure quite what you're disagreeing with; if you want to start a discussion on meta about the line between duplicate and not, I'll chip in. – Peter Taylor – 2013-02-19T13:40:42.127

@PeterTaylor what I'm disagreeing with is position that a new question should be closed on the sole condition that a similar question has already been asked. Particularly if the 'duplicate' question has no accepted answer, and is already several years old. – primo – 2013-02-20T07:10:34.860

It might be fun to compare the results with math software. For instance, in Mathematica, use NextPrime[n] to find the next prime after n. On my laptop, NextPrime[10^1000] took around 1 second to compute. – M.S. Dousti – 2014-08-27T23:32:12.747

Answers

21

Python ~451 digits

This is part of the library I wrote for a semiprime factorization problem, with unnecessary functions removed. It uses the Baillie-PSW primality test, which is technically a probabilistic test, but to date, there are no known pseudoprimes – and there's even a cash reward if you're able to find one (or for supplying a proof that none exist).

Edit: I hadn't realized that Python has built-in modular exponentiation. Replacing my own for the built-in results in a performance boost of about 33%.

my_math.py

# legendre symbol (a|m)
# note: returns m-1 if a is a non-residue, instead of -1
def legendre(a, m):
  return pow(a, (m-1) >> 1, m)

# strong probable prime
def is_sprp(n, b=2):
  d = n-1
  s = 0
  while d&1 == 0:
    s += 1
    d >>= 1

  x = pow(b, d, n)
  if x == 1 or x == n-1:
    return True

  for r in range(1, s):
    x = (x * x)%n
    if x == 1:
      return False
    elif x == n-1:
      return True

  return False

# lucas probable prime
# assumes D = 1 (mod 4), (D|n) = -1
def is_lucas_prp(n, D):
  P = 1
  Q = (1-D) >> 2

  # n+1 = 2**r*s where s is odd
  s = n+1
  r = 0
  while s&1 == 0:
    r += 1
    s >>= 1

  # calculate the bit reversal of (odd) s
  # e.g. 19 (10011) <=> 25 (11001)
  t = 0
  while s > 0:
    if s&1:
      t += 1
      s -= 1
    else:
      t <<= 1
      s >>= 1

  # use the same bit reversal process to calculate the sth Lucas number
  # keep track of q = Q**n as we go
  U = 0
  V = 2
  q = 1
  # mod_inv(2, n)
  inv_2 = (n+1) >> 1
  while t > 0:
    if t&1 == 1:
      # U, V of n+1
      U, V = ((U + V) * inv_2)%n, ((D*U + V) * inv_2)%n
      q = (q * Q)%n
      t -= 1
    else:
      # U, V of n*2
      U, V = (U * V)%n, (V * V - 2 * q)%n
      q = (q * q)%n
      t >>= 1

  # double s until we have the 2**r*sth Lucas number
  while r > 0:
      U, V = (U * V)%n, (V * V - 2 * q)%n
      q = (q * q)%n
      r -= 1

  # primality check
  # if n is prime, n divides the n+1st Lucas number, given the assumptions
  return U == 0

# primes less than 212
small_primes = set([
    2,  3,  5,  7, 11, 13, 17, 19, 23, 29,
   31, 37, 41, 43, 47, 53, 59, 61, 67, 71,
   73, 79, 83, 89, 97,101,103,107,109,113,
  127,131,137,139,149,151,157,163,167,173,
  179,181,191,193,197,199,211])

# pre-calced sieve of eratosthenes for n = 2, 3, 5, 7
indices = [
    1, 11, 13, 17, 19, 23, 29, 31, 37, 41,
   43, 47, 53, 59, 61, 67, 71, 73, 79, 83,
   89, 97,101,103,107,109,113,121,127,131,
  137,139,143,149,151,157,163,167,169,173,
  179,181,187,191,193,197,199,209]

# distances between sieve values
offsets = [
  10, 2, 4, 2, 4, 6, 2, 6, 4, 2, 4, 6,
   6, 2, 6, 4, 2, 6, 4, 6, 8, 4, 2, 4,
   2, 4, 8, 6, 4, 6, 2, 4, 6, 2, 6, 6,
   4, 2, 4, 6, 2, 6, 4, 2, 4, 2,10, 2]

max_int = 2147483647

# an 'almost certain' primality check
def is_prime(n):
  if n < 212:
    return n in small_primes

  for p in small_primes:
    if n%p == 0:
      return False

  # if n is a 32-bit integer, perform full trial division
  if n <= max_int:
    i = 211
    while i*i < n:
      for o in offsets:
        i += o
        if n%i == 0:
          return False
    return True

  # Baillie-PSW
  # this is technically a probabalistic test, but there are no known pseudoprimes
  if not is_sprp(n): return False
  a = 5
  s = 2
  while legendre(a, n) != n-1:
    s = -s
    a = s-a
  return is_lucas_prp(n, a)

# next prime strictly larger than n
def next_prime(n):
  if n < 2:
    return 2
  # first odd larger than n
  n = (n + 1) | 1
  if n < 212:
    while True:
      if n in small_primes:
        return n
      n += 2

  # find our position in the sieve rotation via binary search
  x = int(n%210)
  s = 0
  e = 47
  m = 24
  while m != e:
    if indices[m] < x:
      s = m
      m = (s + e + 1) >> 1
    else:
      e = m
      m = (s + e) >> 1

  i = int(n + (indices[m] - x))
  # adjust offsets
  offs = offsets[m:]+offsets[:m]
  while True:
    for o in offs:
      if is_prime(i):
        return i
      i += o

A sample test script:

from time import clock
from my_math import *

n = i = 317**79
while True:
  i *= 317
  time1 = clock()
  n, o = next_prime(i), n
  span = clock()-time1
  if span > 10:
    break
  print(len(str(n)), span)
print(o)

A factor of 317 was chosen, because it's approximately the square root of 10000, adding roughly 2.5 digits per iteration (and because doubling was too slow to sit through). Output shows the current number of digits, and the time taken.

Sample results:

201 0.13121248650317288
203 0.059535499623555505
206 0.9157767258129175
208 0.2583420518529589
211 0.15367400046653978
213 0.32343915218274955
216 1.3962866788935466
218 0.5986165839513125
221 0.973842206202185
223 2.346910291671148
...
428 0.932809896229827
431 4.345940056627313
433 9.511724255457068
436 6.089835998709333
438 1.3793498894412721
441 4.290633027381972
443 3.5102506044762833
446 3.1629148397352083
448 3.364759208223404
451 7.34668009481652
1551197868099891386459896063244381932060770425565921999885096817830297496627504652115239001983985153119775350914638552307445919773021758654815641382344720913548160379485681746575245251059529720935264144339378936233043585239478807971817857394193701584822359805681429741446927344534491412763713568490429195862973508863067230162660278070962484418979417980291904500349345162151774412157280412235743457342694749679453616265540134456421369622519723266737913

All code is now python 3 compatible.

primo

Posted 2013-02-16T10:34:02.897

Reputation: 30 891

That is astonishingly fast! I will run it properly with doubling size in a few days (and a deterministic primality test) and put the biggest number in the table. I suspect you may already be the winner though. – felipa – 2013-02-16T11:08:43.760

1FWIW, in Sage, next_prime((2^520)*(10^200)) about 15 seconds on my machine, so at first blush this is quite impressive. However... next_prime((2^520)*(10^200),proof=False) takes 0.4 seconds because it only checks for pseudoprimality. Your claim "there are no known pseudoprimes" is vanishingly convincing as the number of bits goes over 64. For 357 digits, I'm not even remotely convinced by a lack of counterexamples. – boothby – 2013-02-19T03:36:56.840

@boothby it's worth noting that this is the very same method used by Maple. That the method was published 33 years ago, and there still no known pseudoprimes speaks for its degree of accuracy.

– primo – 2013-02-19T13:09:16.437

1This is why I use Sage. "Not known to fail" is really actually not the same as "known to work". Suppose there was one false pseudoprime under 400 digits. It would take trillions of years to find it -- but it'd still be there, foiling any attempt to prove 'pseudoprime = prime'. I will always downvote "solutions" that use probabalistic methods with zero guarantee. Monte Carlo? Sure thing. "It's prime 'cause a wizard told me it probably was"? Nope. – boothby – 2013-02-19T17:10:21.110

@boothby Is sage running AKS to check? – felipa – 2013-02-19T20:50:07.883

@felipa Where pseudoprimality is known sufficient, Sage uses PARI's nextprime -- otherwise, it repeatedly increments and uses PARI's isprime. The ultimate fallback there is isprimeAPRCL. AKS is not competitive until ln(ln(ln(n))) > 12... about 10^(70k) digits. AKS-PL is competitive around a googol digits. That is... never in our lifetimes. – boothby – 2013-02-19T22:23:30.257

@boothby What is isprimeAPRCL implementing? And do you mean 12 for AKS? I thought they got it down to 6 or is that something else? – felipa – 2013-02-20T15:53:06.630

@boothby I am guessing http://en.wikipedia.org/wiki/Adleman%E2%80%93Pomerance%E2%80%93Rumely_primality_test .

– felipa – 2013-02-20T17:54:03.967

@felipa, 12 is AKS proper, AKS-PL brings it down to 6 (whence the googol digits). Correct on APRCL. Let's stop commenting on Primo's answer, shall we? This is getting rather long. ;) – boothby – 2013-02-20T18:09:30.863

1@boothby You need to add an answer so we can comment under it :) – felipa – 2013-02-20T19:17:42.487

6

C++ with GMP: 567 digits

Uses the Miller-Rabin implementation in GMP. It might return a false positive, but good luck actually hitting one with probability 2^-200.

#include <gmp.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/time.h>

double time() {
  struct timeval t;
  gettimeofday(&t, NULL);
  return t.tv_usec  * 1e-6 + t.tv_sec;
}

int main(int argc, char *argv[]) {
  mpz_t n, m;
  mpz_init_set_ui(n, 10);
  mpz_pow_ui(n, n, 200);
  mpz_init(m);
  for (int i = 0; true; i++, mpz_mul_ui(n, n, 2)) {
    double start = time();
    for (mpz_add_ui(m, n, 1); !mpz_millerrabin(m, 100); mpz_add_ui(m, m, 2)) ;
    double t = time() - start;
    gmp_printf("%d %Zd %f\n", i, m, t);
    if (t > 10.0) break;
  }
}

Finds the prime 10^200 * 2^1216 + 361 (567 digits) before running over time on my slow laptop.

Keith Randall

Posted 2013-02-16T10:34:02.897

Reputation: 19 865

3

Perl with GMP module, 1300 digits

Using my module Math::Prime::Util and its GMP back end. The exact crossover point will depend on your computer and whether you have the latest GMP library. All code is free (the modules are on github and CPAN, and GMP is freely available). I've run them on AWS's Ubuntu as well as a desktop Ubuntu (and Fedora, and AIX, and NetBSD, etc.).

The core code is in C and C+GMP. next_prime from MPU sees the number is too large and forwards it to the GMP back end (or pure Perl code if the back end isn't installed). That stringiifies and converts to a mpz, and will convert the result back into the input object type or Math::BigInt. next_prime itself does:

  • a mod 30 wheel
  • keeps track of the remainder mod 23# so it can do native modulos for primes up to 23
  • probable prime test on things that pass these.

The probable prime test is:

  • check for tiny divisors using mpz_gcd_ui (in 64-bit two of these check up to 101)
  • check for small divisors using singly-calculated large primorials. This is either primes up to 10k or 40k depending on the input size.
  • for values larger than 2^1600, performs further trial division using a treesieve. This could be done more efficiently.
  • finally, ES BPSW is done (Miller-Rabin test with base 2 followed by an extra strong Lucas test).

Everything prior to the ES BPSW is just optimization, which of course we want for next_prime. next_prime is also implemented in Perl using the Math::BigInt module (in core with optional Pari and GMP back ends). That does AES BPSW (like Pari) but isn't as optimized.

I have thought about the merits of a partial-sieve-based version, using a range of, for instance, 2 merits. I'm just not sure if this would really be better, as most of the time we'd be doing unnecessary sieving as the gap was small, and sometimes for a large gap we'd have to repeat it multiple times.

The library implements ECPP (including certificates) so we could run a proof on the result, but 1200 digits is really too large for the tiny default set of included polynomials (there is a method for downloading larger sets -- proofs would take a bit under 15 minutes which is a bit faster than Pari's APR-CL but a bit slower than WraithX's mpz_aprcl). One downside of ECPP vs. APR-CL is that it has more time variance so there is a good chance of it exceeding 10 seconds on some number before the average time gets there. With a proof I think we're limited to something in the 400 digit range unless we allow multi-threaded software.

#!/usr/bin/env perl
use warnings;
use strict;
use Math::Prime::Util ":all";
use Math::Prime::Util::GMP;  # Barf if the backend isn't installed
use Time::HiRes qw(gettimeofday tv_interval);
use Math::GMP;

my $n = Math::GMP->new(10) ** 200;
while (1) {
  my $start = [gettimeofday];
  my $np = next_prime($n);
  my $sec = tv_interval($start);
  my $len = length($n);
  die "next_prime $len = +",$np-$n," in $sec seconds\n" if $sec > 10;
  warn "  next_prime $len = +",$np-$n," in $sec seconds\n";
  $n *= 10;
}

I decided to try with the same sequence used by primo. It got to 1191 digits, as that is where we hit a gap of 18138. I also tested primo's code using the latest my_math.py. It gets to 630 digits with the 10^e sequence and 641 with his sequence. Very impressive for compact all-Python code without lots of pretests.

DanaJ

Posted 2013-02-16T10:34:02.897

Reputation: 466

I still can't get over how fast this module is. It's resparked my interest in perl as a number-crunching tool. I'm currently rewriting Math::GMP in a way that isn't quite so wasteful with mpz reference creation/destruction. – primo – 2014-08-08T05:02:02.510

The real work is all in C+GMP, so it could all work in Python as well. Python has some serious advantages over Perl 5 for big numbers, that I wish could be solved. Math::GMPz, by the way, is faster than Math::GMP and has basically the entire mpz API exposed, though more brittle and a bit weird to call sometimes. Fixing some things in Math::GMP is on my todo list behind too many other things.

Re MPU, I've thought about inverting the development and make it into two C libraries, then have the Perl module just use that. It would help get it used elsewhere. – DanaJ – 2014-08-08T15:26:48.687

I'm making good progress. The following loop runs over 10 times as fast, solely due to better reference management: $x = new Math::GMP(0); $x += 3 for 1..1000000. I'll post to cpan when I'm finished; you'll be one of the first to know ;) – primo – 2014-08-08T18:32:08.550