Find largest prime which is still a prime after digit deletion

Over at https://math.stackexchange.com/questions/33094/deleting-any-digit-yields-a-prime-is-there-a-name-for-this the following question is asked. How many primes are there that remain prime after you delete any one of its digits? For example 719 is such a prime as you get 71, 19 and 79. While this question is unresolved I thought it make a nice coding challenge.

Task. Give the largest prime you can find that remains a prime after you delete any one of its digits. You should also provide the code that finds it.

Score. The value of the prime you give.

You can use any programming language and libraries you like as long as they are free.

To start things off, 99444901133 is the largest given on the linked page.

Time limit. I will accept the largest correct answer given exactly one week after the first correct answer larger than 99444901133 is given in an answer.

Scores so far.

Python (primo)

4444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444000000000000000000000000000000000000000000000000000000000000000000000000000000001111111111111111111111111111111

J (randomra) (This answer started the one week timer on 21st Feb 2013.)

222223333333

motl7

Posted 2013-02-21T14:27:06.763

Reputation: 191

9901444133 (a deletion of one 9) isn't prime (7 x 1414492019). Your previous example was correct, though. – primo – 2013-02-21T17:43:54.323

@primo Thanks, fixed. That was an odd typo of mine. – motl7 – 2013-02-21T17:49:30.060

1If there is a largest one - as the analysis seems to indicate, I wonder how you could go about a proof when you think you've found it. – gnibbler – 2013-02-22T22:47:18.330

1What about other bases? In base 2, I could not find anything higher than 11 (2r1011), 11 also in base 3 (3r102), 262151 in base 4 (4r1000000013), 17 in base 5 (5r32), 37 in base 7 (7r52), 47 in base 9 (9r52). – aka.nice – 2013-04-20T15:15:02.613

Answers

274 digits

4444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444000000000000000000000000000000000000000000000000000000000000000000000000000000001111111111111111111111111111111

This took about 20 hours of CPU time to find, and about 2 minutes per prime to prove. In contrast, the 84 digit solution can be found in around 3 minutes.

84 digits

444444444444444444444444444444444444444444444444441111111113333333333333333333333333

77777777999999999999999777777777 (32 digits)
66666666666666622222222222222333 (32 digits)
647777777777777777777777777 (27 digits)
44444441333333333333 (20 digits)
999996677777777777 (18 digits)
167777777777777 (15 digits)

I recommend this tool if you want to confirm primality: D. Alpern's ECM Applet

Also using a repdigit approach, which seems to be the approach most likely to find large values. The following script algorithmically skips over most numbers or truncations which will result in multiples of 2, 3, 5 and now 11 c/o PeterTaylor (his contribution increased the efficiency by approximately 50%).

from my_math import is_prime

sets = [
 (set('147'), set('0147369'), set('1379')),
 (set('369'), set('147'), set('1379')),
 (set('369'), set('0369'), set('17')),
 (set('258'), set('0258369'), set('39')),
 (set('369'), set('258'), set('39'))]

div2or5 = set('024568')

for n in range(3, 100):
 for sa, sb, sc in sets:
  for a in sa:
   for b in sb-set([a]):
    bm1 = int(b in div2or5)
    for c in sc-set([b]):
     if int(a+b+c)%11 == 0: continue
     for na in xrange(1, n-1, 1+(n&1)):
      eb = n - na
      for nb in xrange(1, eb-bm1, 1+(~eb&1)):
       nc = eb - nb
       if not is_prime(long(a*(na-1) + b*nb + c*nc)):
        continue
       if not is_prime(long(a*na + b*(nb-1) + c*nc)):
        continue
       if not is_prime(long(a*na + b*nb + c*(nc-1))):
        continue
       if not is_prime(long(a*na + b*nb + c*nc)):
        continue
       print a*na + b*nb + c*nc

my_math.py can be found here: http://codepad.org/KtXsydxK
Alternatively, you could also use the gmpy.is_prime function: GMPY Project

Some small speed improvements as a result of profiling. The primality check for the longest of the four candidates has been moved to the end, xrange replaces range, and long replaces int type casts. int seems to have unnecessary overhead if the evaluated expression results in a long.

Divisibility Rules

Let N be a postitive integer of the form a...ab...bc...c, where a, b and c are repeated digits.

By 2 and 5
- To avoid divisibility by 2 and 5, c may not be in the set [0, 2, 4, 5, 6, 8]. Additionally, if b is a member of this set, the length of c may be no less than 2.

By 3
- If N = 1 (mod 3), then N may not contain any of [1, 4, 7], as removing any of these would trivially result in a multiple of 3. Likewise for N = 2 (mod 3) and [2, 5, 8]. This implementation uses a slightly weakened form of this: if N contains one of [1, 4, 7], it may not contain any of [2, 5, 8], and vice versa. Additionally, N may not consist solely of [0, 3, 6, 9]. This is largely an equivalent statement, but it does allow for some trivial cases, for example a, b, and c each being repeated a multiple of 3 times.

By 11
- As PeterTaylor notes, if N is of the form aabbcc...xxyyzz, that is it consists only of digits repeated an even number of times, it is trivially divisible by 11: a0b0c...x0y0z. This observation eliminates half of the search space. If N is of odd length, then the length of a, b and c must all be odd as well (75% search space reduction), and if N is of even length, then only one of a, b or c may be even in length (25% search space reduction).
- Conjecture: if abc is a multiple of 11, for example 407, then all odd repetitions of a, b and c will also be multiples of 11. This falls out of the scope of the above divisibility by 11 rule; in fact, only odd repetitions are among those which are explicitly allowed. I don't have a proof for this, but systematic testing was unable to find a counter-example. Compare: 444077777, 44444000777, 44444440000077777777777, etc. ~~Anyone may feel free to prove or disprove this conjecture.~~ aditsu has since demonstrated this to be correct.

Other Forms

2 sets of repeated digits
Numbers of the form that randomra was pursuing, a...ab...b, seem to be much more rare. There are only 7 solutions less than 10¹⁷⁰⁰, the largest of which is 12 digits in length.

4 sets of repeated digits
Numbers of this form, a...ab...bc...cd...d, appear to be more densely distributed than those I was searching for. There are 69 solutions less than 10¹⁰⁰, compared to the 32 using 3 sets of repeated digits. Those between 10¹¹ and 10¹⁰⁰ are as follows:

190000007777
700000011119
955666663333
47444444441111
66666622222399
280000000033333
1111333333334999
1111333333377779
1199999999900111
3355555666999999
2222233333000099
55555922222222233333
444444440004449999999
3366666633333333377777
3333333333999888883333
4441111113333333333311111
2222222293333333333333999999
999999999339999999977777777777
22222226666666222222222299999999
333333333333333333339944444444444999999999
559999999999933333333333339999999999999999
3333333333333333333111111111111666666666611111
11111111333330000000000000111111111111111111111
777777777770000000000000000000033333339999999999999999999999999
3333333333333333333333333333333333333333333333336666666977777777777777
666666666666666666611111113333337777777777777777777777777777777777777777
3333333333333333333888889999999999999999999999999999999999999999999999999933333333

There's a simple heuristic argument as to why this should be the case. For each digital length, there is a number of repeated sets (i.e. 3 repeated sets, or 4 repeated sets, etc.) for which the expected number of solutions will be the highest. The transition occurs when the number of additional possible solutions, taken as a ratio, outweighs the probability that the additional number to be checked is prime. Given the exponential nature of the possibilities to check, and the logarithmic nature of prime number distribution, this happens relatively quickly.

If, for example, we wanted to find a 300 digit solution, checking 4 sets of repeated digits would be far more likely to produce a solution than 3 sets, and 5 sets would be more likely still. However, with the computing power that I have at my disposal, finding a solution much larger than 100 digits with 4 sets would be outside of my capacity, let alone 5 or 6.

primo

Posted 2013-02-21T14:27:06.763

Reputation: 30 891

3Any solution of the form d^x e^y f^z requires at least two of the sequence lengths to be odd to avoid divisibility by 11. I don't know whether is_prime will reject multiples of 11 quickly enough to make this not worth explicitly taking into account. – Peter Taylor – 2013-02-22T13:43:24.273

I don't have the gmp source in front of me, but it very likely begins with trial division over small primes. Still, (na&1)+(nb&1)+(nc&1) > 1 is simple enough that it should be faster. Wait a minute, this can short-curcuit full branches! If na is even, and nb + nc is odd, then one of [nb, nc] must necessarily be even, and you can just skip to the next na. – primo – 2013-02-22T14:55:35.660

Be careful if you are using gmpy.is_prime(). Beyond a certain point it is probabilistic, so you need to check it returns a 2. 1 means it's only probably a prime – gnibbler – 2013-02-22T22:50:46.470

Looks like my_math also becomes probablistic, but doesn't make a distinction between certain primes and probably primes. – gnibbler – 2013-02-22T22:54:55.407

+1 Very impressive. Would you explain how it works? Does set('369') mean the set of {3,6,9}? What are sa, sb, sc? (I assume they are the three subsets of sets but I'm puzzled how the compiler knows this. – DavidC – 2013-02-27T17:02:23.193

@DavidCarraher set('369') is equivalent to set(['3', '6', '9']), as strings are sequence types. sets is an array of tuples, with sa the first set of the tuple, sb the second set, and sc the third. The set distribution is based on the observation that if n = 1 (mod 3), then n may contain none of 1, 4, 7, and likewise for n = 2 (mod 3) and 2, 5, 8 (as removing one of these would trivially produce a multiple of 3). This is weakened to be: if n contains a 1, 4 or 7, it may not contain a 2, 5 or 8, and vice versa, which is largely an equivalent statement. – primo – 2013-02-27T17:31:08.160

I would +1 again if I could. – Shmiddty – 2013-02-27T21:11:08.937

I'm puzzled that the proof that 444444444444444444444444444444444444444444444444441111111113333333333 33333333333333 is a "robust prime" takes 2 minutes. It takes less than .002 s with Mathematica. Surely you realize that you only have to verify that 4 numbers are all primes? – DavidC – 2013-02-28T00:12:12.013

The 84 digit prime take a negligible amount of time to prove (using APRT-CLE). It's the 274 digit prime that takes about 2 minutes for each of the four. – primo – 2013-02-28T05:30:50.170

4A direct and exact test for divisibility by 11 is to add all the digits in even positions and subtract all the digits in odd positions (or vice versa) and check if the result is a multiple of 11. As a corollary (but can also be deduced directly), you can reduce all sequences of 2+ identical digits to 0 or 1 digits (taking the sequence length % 2). 44444440000077777777777 thus reduces to 407; 4+7-0=11. 444444444444444444444444444444444444444444444444441111111113333333333333333333333333 reduces to 13. – aditsu quit because SE is EVIL – 2013-02-28T11:40:34.640

Also, for divisibility by 7 you can similarly reduce sequences of digits by taking the sequence length % 6 (and that works for 11, 13 and 37 too at the same time). 444444444444444444444444444444444444444444444444441111111113333333333333333333333333 -> 441113 – aditsu quit because SE is EVIL – 2013-02-28T11:51:32.157

The 274 digit prime took .04 s to verify as robust; i.e. it verified that the original and the 3 clipped numbers are all primes. That's quite a difference from 8 min. Looks like the PrimeQ you are using is slow. (Which is no criticism of your brilliant work.) – DavidC – 2013-02-28T14:27:56.963

@DavidCarraher well, the App is written in Java ;) – primo – 2013-02-28T14:29:07.547

1"robust" != proven. The difference is unimportant to some, crucial to others. PrimeQ in Mathematica is a BPSW variant plus an extra M-R with base 3, so of course that will only take a couple milliseconds. Pari/GP proves the 274 digit number using APR-CL in about 3 seconds on a 5 year old computer, and single-core open source ECPP takes about 2 seconds. No surprise it takes longer for Java, but it isn't a big deal. I had my Perl translation of this do BPSW on all 4, then a proof on all 4 only if they all passed the cheap tests. – DanaJ – 2014-08-04T20:12:51.020

222223333333 (12 digits)

Here I only searched aa..aabb..bb format up to 100 digits. Only other hits are 23 37 53 73 113 311.

J code (cleaned up) (sorry, no explanation):

a=.>,{,~<>:i.100
b=.>,{,~<i.10
num=.".@(1&":)@#~
p=.(*/"1@:((1&p:)@num) (]-"1(0,=@i.@#)))"1 1
]res=./:~~.,b (p#num)"1 1/ a

randomra

Posted 2013-02-21T14:27:06.763

Reputation: 19 909

An exhaustive search of this form up to 1560 digits (and counting) reveals nothing larger than this 12 digit solution. – primo – 2013-03-01T07:28:42.710

Edit: Someone already did a waaay deeper analysis than I did here.

Not a solution but a rough estimation on the number of n-digit solutions.

Estimated number of solutions

Generating J code

   ops=: 'title ','Estimated number of solutions by digits',';xcaption ','digits',';ycaption ','log10 #'
   ops plot 10^.((%^.)%(2&(%~)@^.@(%&10))^(10&^.))(10&^(2+i.100))

randomra

Posted 2013-02-21T14:27:06.763

Reputation: 19 909

Thanks. The y axis is a little confusing. Do you really mean 10^-100 as the estimated number of solutions with roughly 86 digits? – motl7 – 2013-02-21T17:51:55.160

Yes. If there are finite number of solutions it is believeble. Although based on the existing data this estimation is a bit off because repeating digits create correlation between the numbers with one less digit.

– randomra – 2013-02-21T17:58:40.580

Someone already did a waaay deeper analysis than I.

– randomra – 2013-02-21T18:06:35.863

Is the y-axis the proportion of numbers with x digits that are solutions? That is the number of solutions divided by 10^(#digits)? It can't be the number as that looks like 4, 11 etc. and log of that is almost always above 1. – motl7 – 2013-02-21T18:10:55.817

Javascript (Brute Force)

Has not yet found a higher number

http://jsfiddle.net/79FDr/4/

Without a bigint library, javascript is limited to integers <= 2^53.

Since it's Javascript, the browser will complain if we don't release the execution thread for the UI to update, as a result, I decided to track where the algorithm is in its progression in the UI.

function isPrime(n){
    return n==2||(n>1&&n%2!=0&&(function(){
        for(var i=3,max=Math.sqrt(n);i<=max;i+=2)if(n%i==0)return false;
        return true;
    })());
};

var o=$("#o"), m=Math.pow(2,53),S=$("#s");

(function loop(n){
    var s = n.toString(),t,p=true,i=l=s.length,h={};
    if(isPrime(n)){
        while(--i){
            t=s.substring(0,i-1) + s.substring(i,l); // cut out a digit
            if(!h[t]){   // keep a hash of numbers tested so we don't end up testing 
                h[t]=1;  // the same number multiple times
                if(!isPrime(+t)){p=false;break;}
            }
        }
        if(p)
            o.append($("<span>"+n+"</span>"));
    }
    S.text(n);
    if(n+2 < m)setTimeout(function(){
        loop(n+2);
    },1);
})(99444901133);

Shmiddty

Posted 2013-02-21T14:27:06.763

Reputation: 1 209

@Schmiddty There are big int libraries for js but this brute force method seems doomed. – motl7 – 2013-02-22T09:16:02.953

1@motl7 Agreed, left it running all night, and no answers have been found. – Shmiddty – 2013-02-22T16:38:48.617

A link to an analysis of the problem was posted, but I thought it was missing a few things. Let's look at numbers of m digits, consisting of k sequences of 1 or more identical digits. It was shown that if we split digits into the groups { 0, 3, 6, 9 }, { 1, 4, 7 }, and { 2, 5, 8 }, a solution cannot contain digits from both the second and third group, and it must contain 3n + 2 digits from one of these groups. At least two of the k sequences must have an odd number of digits. Out of the digits { 1, 4, 7 } only 1 and 7 can be the lowest digit. None of { 2, 5, 8 } can be the lowest digit. So there are either four (1, 3, 7, 9) or two (3, 9) choices for the lowest digit, six choices of digits for the other sequences of equal digits (7 digits but not the same as the previous sequence) except on average about 5 1/7 choices for the highest digit (0 is excluded giving only 5 choices, except when the second largest sequence contained zeroes).

How many candidates are there? We have m digits split in k sequences of at least 1 digit. There are (m - k + 1) over (k - 1) ways to choose the lengths of these sequences, which is about (m - 1.5k + 2)^(k - 1) / (k - 1)!. There are either 2 or 4 choices for the lowest digit, six in total. There are six choices for the other digits, except 36/7 choices for the highest digit; the total is (6/7) * 6^k. There are 2^k ways to pick whether the length of a sequence is even or odd; k + 1 of these are excluded because none or only one are odd; we multiply the number of choices by (1 - (k + 1) / 2^k), which is 1/4 when k = 2, 1/2 when k = 3, 11/16 when k = 4 etc. The number of digits from the set { 1, 4, 7 } or { 2, 5, 8 } must be 3n + 2, so the number of choices are divided by 3.

Multiplying all these numbers, the number of candidates is

(m - 1.5k + 2)^(k - 1) / (k - 1)! * (6/7) * 6^k * (1 - (k + 1) / 2^k) / 3

(m - 1.5k + 2)^(k - 1) / (k - 1)! * (2/7) * 6^k * (1 - (k + 1) / 2^k)

The candidate itself and k numbers which are created by removing a digit must all be primes. The probability that a random integer around N is prime is about 1 / ln N. The probability for a random m digit number is about 1 / (m ln 10). However, the numbers here are not random. They have all been picked to be not divisible by 2, 3, or 5. 8 out of any 30 consecutive integers are not divisible by 2, 3, or 5. Therefore, the probability of being a prime is (30 / 8) / (m ln 10) or about 1.6286 / m.

The expected number of solutions is about

(m - 1.5k + 2)^(k - 1) / (k - 1)! * (2/7) * 6^k * (1 - (k + 1) / 2^k) * (1.6286 / m)^(k + 1)

or for large m about

(1 - (1.5k - 2) / m)^(k - 1) / (k - 1)! * 0.465 * 9.772^k * (1 - (k + 1) / 2^k) / m^2

For k = 2, 3, 4, ... we get the following:

k = 2: 11.1 * (1 - 1/m) / m^2
k = 3: 108 * (1 - 2.5/m)^2 / m^2 
k = 4: 486 * (1 - 4/m)^3 / m^2


k = 10: 10,065 * (1 - 13/m)^9 / m^2

From k = 10 onward, the number gets smaller again.

gnasher729

Posted 2013-02-21T14:27:06.763

Reputation: 11

5Welcome to PPCG! This is an excellent analysis; however, we look for answers to be legitimate responses to the question. In other words, code. Unfortunately, that leaves scant space in our structure for commentary-only posts, which are relegated to the post comments. However, I would hate to see such a thorough effort be relegated to our slush pile, so I'd like to hint that if you added a computer program designed to answer the challenge requirements to your post, it would be more likely to be kept around. – Jonathan Van Matre – 2014-03-13T23:32:57.323

Also, I strongly recommend you check out our sister sites: https://math.stackexchange.com and http://mathoverflow.net/

– Jonathan Van Matre – 2014-03-13T23:35:30.157