## False Positives on an Integer Lattice

12

 User            Language      Score
=========================================
Ell             C++11         293,619,555
feersum         C++11         100,993,667
Ell             C++11          78,824,732
Geobits         Java           27,817,255
Ell             Python         27,797,402
Peter Taylor    Java                2,468
<reference>     Julia                 530


## Background

When working on a 2-D grid of integer coordinates, you sometimes want to know whether two vectors (with integer components) have the same magnitude. Of course, in Euclidean geometry the magnitude of a vector (x,y) is given by

√(x² + y²)


So a naive implementation might compute this value for both vectors and compare the results. Not only does that incur an unnecessary square root computation, it also causes trouble with floating point inaccuracies, which might yield false positives: vectors whose magnitudes are different, but where the significant digits in the floating point representation are all identical.

For the purposes of this challenge, we define a false positive as a pair of coordinate pairs (a,b) and (c,d) for which:

• Their squared magnitude is different when represented as 64-bit unsigned integers.
• Their magnitude is identical when represented as 64-bit binary floating point number and computed via a 64-bit square root (as per IEEE 754).

As an example, using 16-bit representations (instead of 64), the smallest1 pair of vectors which yields a false positive is

(25,20) and (32,0)


Their squared squared magnitudes are 1025 and 1024. Taking the square root yields

32.01562118716424 and 32.0


But in 16-bit floats both of these get truncated to 32.0.

Likewise, the smallest2 pair yielding a false positive for 32-bit representations is

(1659,1220) and (1951,659)


1"Smallest" as measured by their 16-bit floating point magnitude.
2"Smallest" as measured by their 32-bit floating point magnitude.

Lastly, here are a handful of valid 64-bit cases:

 (51594363,51594339) and (54792160,48184783)
(54356775,54353746) and (54620742,54088476)
(54197313,46971217) and (51758889,49645356)
(67102042,  956863) and (67108864,       6) *


* The last case is one of several with the smallest possible magnitude for 64-bit false positives.

## The Challenge

In less than 10,000 bytes of code, using a single thread, you're to find as many false positives for 64-bit (binary) floating point numbers in the coordinate range 0 ≤ y ≤ x (that is, only within the first octant of the Euclidian plane) such that x² + y² ≤ 253 within 10 minutes. If two submissions tie for the same number of pairs, the tie breaker is the actual time taken to find the last of those pairs.

Your program must not use more than 4 GB of memory at any time (for practical reasons).

It must be possible to run your program in two modes: one which outputs every pair as it finds it, and one which only outputs the number of found pairs at the end. The first will be used to verify the validity of your pairs (by looking at some sample of outputs) and the second will be used for actually timing your submission. Note that the printing must be the only difference. In particular, the counting program may not hardcode the number of pairs it could find. It must still perform the same loop that would be used to print all the numbers, and only omit the printing itself!

I will test all submissions on my Windows 8 laptop, so please ask in the comments if you want to use some not-too-common language.

Note that pairs must not be counted twice under switching of the first and second coordinate pair.

Note also that I will run your process via a Ruby controller, which will kill your process if it hasn't finished after 10 minutes. Make sure to output the number of pairs found by then. You can either keep track of time yourself and print the result just before the 10 minutes elapse, or you just output the number of pairs found sporadically, and I will take the last such number as your score.

As a side comment, it's possible to simultaneously determine whether or not an integer is a perfect square and also compute its precise square root efficiently. The following algorithm is 5x faster than hardware square root on my system (comparing 64-bit unsigned integers to 80-bit long double): http://math.stackexchange.com/questions/41337/efficient-way-to-determine-if-a-number-is-perfect-square/878338#878338

– Todd Lehman – 2014-09-12T19:36:55.063

5

## C++, 275,000,000+

We'll refer to pairs whose magnitude is accurately representable, such as (x, 0), as honest pairs and to all other pairs as dishonest pairs of magnitude m, where m is the pair's wrongly-reported magnitude. The first program in the previous post used a set of tightly related couples of honest- and dishonest-pairs:
(x, 0) and (x, 1), respectively, for large enough x. The second program used the same set of dishonest pairs but extended the set of honest pairs by looking for all honest pairs of integral magnitude. The program doesn't terminate within ten minutes, but it finds the vast majority of its results very early on, which means that most of the runtime goes to waste. Instead of keep looking for ever-less-frequent honest pairs, this program uses the spare time to do the next logical thing: extending the set of dishonest pairs.

From the previous post we know that for all large-enough integers r, sqrt(r2 + 1) = r, where sqrt is the floating-point square root function. Our plan of attack is to find pairs P = (x, y) such that x2 + y2 = r2 + 1 for some large-enough integer r. That's simple enough to do, but naively looking for individual such pairs is too slow to be interesting. We want to find these pairs in bulk, just like we did for honest pairs in the previous program.

Let {v, w} be an orthonormal pair of vectors. For all real scalars r, ||r v + w||2 = r2 + 1. In 2, this is a direct result of the Pythagorean theorem:

We're looking for vectors v and w such that there exists an integer r for which x and y are also integers. As a side note, note that the set of dishonest pairs we used in the previous two programs was simply a special case of this, where {v, w} was the standard basis of 2; this time we wish to find a more general solution. This is where Pythagorean triplets (integers triplets (a, b, c) satisfying a2 + b2 = c2, which we used in the previous program) make their comeback.

Let (a, b, c) be a Pythagorean triplet. The vectors v = (b/c, a/c) and w = (-a/c, b/c) (and also
w = (a/c, -b/c)) are orthonormal, as is easy to verify. As it turns out, for any choice of Pythagorean triplet, there exists an integer r such that x and y are integers. To prove this, and to effectively find r and P, we need a little number/group theory; I'm going to spare the details. Either way, suppose we have our integral r, x and y. We're still short of a few things: we need r to be large enough and we want a fast method to derive many more similar pairs from this one. Fortunately, there's a simple way to accomplish this.

Note that the projection of P onto v is r v, hence r = P · v = (x, y) · (b/c, a/c) = xb/c + ya/c, all this to say that xb + ya = rc. As a result, for all integers n, (x + bn)2 + (y + an)2 = (x2 + y2) + 2(xb + ya)n + (a2 + b2)n2 = (r2 + 1) + 2(rc)n + (c2)n2 = (r + cn)2 + 1. In other words, the squared magnitude of pairs of the form
(x + bn, y + an) is (r + cn)2 + 1, which is exactly the kind of pairs we're looking for! For large enough n, these are dishonest pairs of magnitude r + cn.

It's always nice to look at a concrete example. If we take the Pythagorean triplet (3, 4, 5), then at r = 2 we have P = (1, 2) (you can check that (1, 2) · (4/5, 3/5) = 2 and, clearly, 12 + 22 = 22 + 1.) Adding 5 to r and (4, 3) to P takes us to r' = 2 + 5 = 7 and P' = (1 + 4, 2 + 3) = (5, 5). Lo and behold, 52 + 52 = 72 + 1. The next coordinates are r'' = 12 and P'' = (9, 8), and again, 92 + 82 = 122 + 1, and so on, and so on ...

Once r is large enough, we start getting dishonest pairs with magnitude increments of 5. That's roughly 27,797,402 / 5 dishonest pairs.

So now we have plenty of integral-magnitude dishonest pairs. We can easily couple them with the honest pairs of the first program to form false-positives, and with due care we can also use the honest pairs of the second program. This is basically what this program does. Like the previous program, it too finds most of its results very early on---it gets to 200,000,000 false positives within a few seconds---and then slows down considerably.

Compile with g++ flspos.cpp -oflspos -std=c++11 -msse2 -mfpmath=sse -O3. To verify the results, add -DVERIFY (this will be notably slower.)

Run with flspos. Any command-line argument for verbose mode.

#include <cstdio>
#define _USE_MATH_DEFINES
#undef __STRICT_ANSI__
#include <cmath>
#include <cfloat>
#include <vector>
#include <iterator>
#include <algorithm>

using namespace std;

/* Make sure we actually work with 64-bit precision */
#if defined(VERIFY) && FLT_EVAL_METHOD != 0 && FLT_EVAL_METHOD != 1
#   error "invalid FLT_EVAL_METHOD (did you forget -msse2 -mfpmath=sse'?)"
#endif

template <typename T> struct widen;
template <> struct widen<int> { typedef long long type; };

template <typename T>
inline typename widen<T>::type mul(T x, T y) {
return typename widen<T>::type(x) * typename widen<T>::type(y);
}
template <typename T> inline T div_ceil(T a, T b) { return (a + b - 1) / b; }
template <typename T> inline typename widen<T>::type sq(T x) { return mul(x, x); }
template <typename T>
T gcd(T a, T b) { while (b) { T t = a; a = b; b = t % b; } return a; }
template <typename T>
inline typename widen<T>::type lcm(T a, T b) { return mul(a, b) / gcd(a, b); }
template <typename T>
T div_mod_n(T a, T b, T n) {
if (b == 0) return a == 0 ? 0 : -1;
const T n_over_b = n / b, n_mod_b = n % b;
for (T m = 0; m < n; m += n_over_b + 1) {
if (a % b == 0) return m + a / b;
a -= b - n_mod_b;
if (a < 0) a += n;
}
return -1;
}

template <typename T> struct pythagorean_triplet { T a, b, c; };
template <typename T>
struct pythagorean_triplet_generator {
typedef pythagorean_triplet<T> result_type;
private:
typedef typename widen<T>::type WT;
result_type p_triplet;
WT p_c2b2;
public:
pythagorean_triplet_generator(const result_type& triplet = {3, 4, 5}) :
p_triplet(triplet), p_c2b2(sq(triplet.c) - sq(triplet.b))
{}
const result_type& operator*() const { return p_triplet; }
const result_type* operator->() const { return &p_triplet; }
pythagorean_triplet_generator& operator++() {
do {
if (++p_triplet.b == p_triplet.c) {
++p_triplet.c;
p_triplet.b = ceil(p_triplet.c * M_SQRT1_2);
p_c2b2 = sq(p_triplet.c) - sq(p_triplet.b);
} else
p_c2b2 -= 2 * p_triplet.b - 1;
p_triplet.a = sqrt(p_c2b2);
} while (sq(p_triplet.a) != p_c2b2 || gcd(p_triplet.b, p_triplet.a) != 1);
return *this;
}
result_type operator()() { result_type t = **this; ++*this; return t; }
};

int main(int argc, const char* argv[]) {
const bool verbose = argc > 1;

const int min = 1 << 26;
const int max = sqrt(1ll << 53);

const size_t small_triplet_count = 1000;
vector<pythagorean_triplet<int>> small_triplets;
small_triplets.reserve(small_triplet_count);
generate_n(
back_inserter(small_triplets),
small_triplet_count,
pythagorean_triplet_generator<int>()
);

int found = 0;
auto add = [&] (int x1, int y1, int x2, int y2) {
#ifdef VERIFY
auto n1 = sq(x1) + sq(y1), n2 = sq(x2) + sq(y2);
if (x1 < y1 || x2 < y2 || x1 > max || x2 > max ||
n1 == n2 || sqrt(n1) != sqrt(n2)
) {
fprintf(stderr, "Wrong false-positive: (%d, %d) (%d, %d)\n",
x1, y1, x2, y2);
return;
}
#endif
if (verbose) printf("(%d, %d) (%d, %d)\n", x1, y1, x2, y2);
++found;
};

int output_counter = 0;
for (int x = min; x <= max; ++x) add(x, 0,    x, 1);
for (pythagorean_triplet_generator<int> i; i->c <= max; ++i) {
const auto& t1 = *i;

for (int n = div_ceil(min, t1.c); n <= max / t1.c; ++n)
add(n * t1.b, n * t1.a,    n * t1.c, 1);

auto find_false_positives = [&] (int r, int x, int y) {
{
int n = div_ceil(min - r, t1.c);
int min_r = r + n * t1.c;
int max_n = n + (max - min_r) / t1.c;
for (; n <= max_n; ++n)
add(r + n * t1.c, 0,    x + n * t1.b, y + n * t1.a);
}
for (const auto t2 : small_triplets) {
int m = div_mod_n((t2.c - r % t2.c) % t2.c, t1.c % t2.c, t2.c);
if (m < 0) continue;
int sr = r + m * t1.c;
int c = lcm(t1.c, t2.c);
int min_n = div_ceil(min - sr, c);
int min_r = sr + min_n * c;
if (min_r > max) continue;
int x1 = x + m * t1.b, y1 = y + m * t1.a;
int x2 = t2.b * (sr / t2.c), y2 = t2.a * (sr / t2.c);
int a1 = t1.a * (c / t1.c), b1 = t1.b * (c / t1.c);
int a2 = t2.a * (c / t2.c), b2 = t2.b * (c / t2.c);
int max_n = min_n + (max - min_r) / c;
int max_r = sr + max_n * c;
for (int n = min_n; n <= max_n; ++n) {
x2 + n * b2, y2 + n * a2,
x1 + n * b1, y1 + n * a1
);
}
}
};
{
int m = div_mod_n((t1.a - t1.c % t1.a) % t1.a, t1.b % t1.a, t1.a);
find_false_positives(
/* r = */ (mul(m, t1.c) + t1.b) / t1.a,
/* x = */ (mul(m, t1.b) + t1.c) / t1.a,
/* y = */ m
);
} {
int m = div_mod_n((t1.b - t1.c % t1.b) % t1.b, t1.a, t1.b);
find_false_positives(
/* r = */ (mul(m, t1.c) + t1.a) / t1.b,
/* x = */ m,
/* y = */ (mul(m, t1.a) + t1.c) / t1.b
);
}

if (output_counter++ % 50 == 0)
printf("%d\n", found), fflush(stdout);
}
printf("%d\n", found);
}


Nice! :) I got 293,619,555 on my machine and updated the leaderboard. – Martin Ender – 2014-09-23T12:02:37.450

8

## Python, 27,797,402

Just to set the bar a little higher...

from sys import argv
verbose = len(argv) > 1
found = 0
for x in xrange(67108864, 94906266):
found += 1
if verbose:
print "(%d, 0) (%d, 1)" % (x, x)
print found


It's easy to verify that for all 67,108,864 <= x <= 94,906,265 = floor(sqrt(253)) the pairs (x, 0) and (x, 1) are false positives.

Why it works: 67,108,864 = 226. Therfore, all numbers x in the above range are of the form 226 + x' for some 0 <= x' < 226. For all positive e, (x + e)2 = x2 + 2xe + e2 = x2 + 227e + 2x'e + e2. If we want to have
(x + e)2 = x2 + 1 we need at least 227e <= 1, that is, e <= 2-27 However, since the mantissa of double-precision floating point numbers is 52-bit wide, the smallest e such that x + e > x is e = 226 - 52 = 2-26. In other words, the smallest representable number greater than x is x + 2-26 while the result of sqrt(x2 + 1) is at most x + 2-27. Since the default IEEE-754 rounding mode is round-to-nearest; ties-to-even, it will always round to x and never to x + 2-26 (where the tie-break is really only relevant for x = 67,108,864, if at all. Any larger number will round to x regardless).

## C++, 75,000,000+

Recall that 32 + 42 = 52. What this means is that the point (4, 3) lies on the circle of radius 5 centered around the origin. Actually, for all integers n, (4n, 3n) lies on such circle of radius 5n. For large enough n (namely, such that 5n >= 226), we already know a false-positive for all points on this circle: (5n, 1). Great! That's another 27,797,402 / 5 free false-positive pairs right there! But why stop here? (3, 4, 5) is not the only such triplet.

This program looks for all positive integer triplets (a, b, c) such that a2 + b2 = c2, and counts false-positives in this fashion. It gets to 70,000,000 false-positives pretty fast but then slows down considerably as the numbers grow.

Compile with g++ flspos.cpp -oflspos -std=c++11 -msse2 -mfpmath=sse -O3. To verify the results, add -DVERIFY (this will be notably slower.)

Run with flspos. Any command-line argument for verbose mode.

#include <cstdio>
#include <cmath>
#include <cfloat>

using namespace std;

/* Make sure we actually work with 64-bit precision */
#if defined(VERIFY) && FLT_EVAL_METHOD != 0 && FLT_EVAL_METHOD != 1
#   error "invalid FLT_EVAL_METHOD (did you forget -msse2 -mfpmath=sse'?)"
#endif

template <typename T> inline long long sqr(T x) { return 1ll * x * x; }
template <typename T>
T gcd(T a, T b) { while (b) { T t = a; a = b; b = t % b; } return a; }

int main(int argc, const char* argv[]) {
const bool verbose = argc > 1;

const int min = 1 << 26;
const int max = sqrt(1ll << 53);

int found = 0;
auto add = [=, &found] (int x1, int y1, int x2, int y2) {
#ifdef VERIFY
auto n1 = sqr(x1) + sqr(y1), n2 = sqr(x2) + sqr(y2);
if (n1 == n2 || sqrt(n1) != sqrt(n2)) {
fprintf(stderr, "Wrong false-positive: (%d, %d) (%d, %d)\n",
x1, y1, x2, y2);
return;
}
#endif
if (verbose) printf("(%d, %d) (%d, %d)\n", x1, x2, y1, y2);
++found;
};

for (int x = min; x <= max; ++x) add(x, 0,    x, 1);

for (int a = 1; a < max; ++a) {
auto a2b2 = sqr(a) + 1;
for (int b = 1; b <= a; a2b2 += 2 * b + 1, ++b) {
int c = sqrt(a2b2);
if (a2b2 == sqr(c) && gcd(a, b) == 1) {
int max_c = max / c;
for (int n = (min + c - 1) / c; n <= max_c; ++n)
add(n * a, n * b,    n * c, 1);
}
}

if (a % 512 == 0) printf("%d\n", found), fflush(stdout);
}

printf("%d\n", found);
}


Yeesh, that's an effective strategy. I had thought that the 2**53 boundary was chosen to rule this out, but I guess not. – xnor – 2014-09-11T23:51:15.317

Funny how every number in this range works without a single instance of the square roots of x^2 and x^2 + 1 falling on different sides of an integer + 1/2. – feersum – 2014-09-12T02:00:28.533

@xnor The boundary was chosen in order for the squared magnitude to be exactly representable in 64-bit floats. – Martin Ender – 2014-09-12T07:02:05.807

Hey, it works, who cares how? ;) Do you mean that the program should count in a dummy loop, or actually verify the results? – Ell – 2014-09-12T11:03:14.293

@MartinButtner Oh, I see. It looks like the lower bound is that amount divided by the square root of 2. I understand heuristically why such numbers should work, but I'm also curious why every single one works. – xnor – 2014-09-12T18:18:32.043

I don't think anyone would object if you posted that new submission as an individual answer. I think it would deserve independent upvotes. (Also, I only noticed it because your edit bumped the question, which gave it another upvote. If you posted as an answer, I would have got a notification.) – Martin Ender – 2014-09-13T16:20:25.123

Also, are you sure the new isn't generating any duplicates? – Martin Ender – 2014-09-13T17:24:48.690

@MartinBüttner: Yes. Note the test that gcd(a, b) = 1. All the points are on lines of different slopes that intersect at the origin. – Ell – 2014-09-13T18:42:48.543

@Ell Ah, nifty. Thanks for the clarification! – Martin Ender – 2014-09-13T18:43:54.613

@MartinBüttner: I have an interesting extension to this that gets to a quarter billion. Is it still worth posting? – Ell – 2014-09-21T12:05:46.887

@Ell Sure, I'll keep testing submissions and updating the leaderboard (and accepted answer) when they come in. – Martin Ender – 2014-09-21T12:07:56.343

4

# C++11 - 100,993,667

### EDIT: New program.

The old one used too much memory. This one halves the memory usage by using a giant vector array instead of a hash table. Also it removes the random thread cruft.

   /* by feersum  2014/9
http://codegolf.stackexchange.com/questions/37627/false-positives-on-an-integer-lattice */
#include <iostream>
#include <cmath>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <vector>
using namespace std;
#define ul unsigned long long

#define K const

#define INS(A)   { bool already = false; \
for(auto e = res[A.p[0][0]].end(), it = res[A.p[0][0]].begin(); it != e; ++it) \
if(A.p[0][1] == it->y1 && A.p[1][0] == it->x2 && A.p[1][1] == it->y2) { \
break; } \
if(!already) res[A.p[0][0]].push_back( {A.p[0][1], A.p[1][0], A.p[1][1]} ), ++n; }

#define XMAXMIN (1<<26)

struct ints3 {
int y1, x2, y2;
};

struct pparm {
int a,b,c,d;
int E[4];
pparm(int A,int B,int C, int D):
E{B*B+D*D,A*B+C*D,A*A+C*C+2*(B+D),A+C}
{
a=A;b=B;c=C;d=D;
}

};

struct ans {
int p[2][2];

};
ostream&operator<<(ostream&o, ans&a)
{
o<<'('<<a.p[0][0]<<','<<a.p[0][1]<<"),("<<a.p[1][0]<<','<<a.p[1][1]<<')'<<endl;
return o;
}

vector<ints3> res[XMAXMIN];

bool print;
int n;

void gen(K pparm&p1, K pparm&p2)
{
#ifdef DBUG
for(int i=0;i<2;i++){
K pparm&p=i?p2:p1;
cout<<' '<<p.a<<' '<<p.b<<' '<<p.c<<' '<<p.d<<' ';}
cout<<endl;
#endif

for(ul x = 0; ; ++x) {
ans a;
ul s[2];
for(int i = 0; i < 2; i++) {
K pparm &p = i?p2:p1;
int *pt = a.p[i];
pt[0] = p.b+x*(p.a+x);
pt[1] = p.d+x*(p.c+x);
s[i] = (ul)pt[0]*pt[0] + (ul)pt[1]*pt[1];
}
if(*s >> 53)
break;
if(s[1] - s[0] != 1)
exit(4);

if(sqrt(s[0]) == sqrt(s[1])) {
for(int i = 0; i < 2; i++)
if(a.p[i][0] > a.p[i][1])
swap(a.p[i][0], a.p[i][1]);
if(a.p[0][0] > a.p[0][1])
for(int i = 0; i < 2; i++)
swap(a.p[0][i], a.p[1][i]);
INS(a)
}
}
}

int main(int ac, char**av)
{
for(int i = 1; i < ac; i++) {
print |= !strcmp(av[1], "-P");
}

#define JMAX 43000000
for(ul j = 0; j < JMAX; j++) {
pparm p1(-~j,j,~-j,0),p2(j,1,j,j);
gen(p1,p2);
if(!print && !(j%1024))
#ifdef DBUG
cout<<j<<' ',
#endif
cout<<n<<endl;

}
if(print)
for(vector<ints3>& v: res)
for(ints3& i: v)
printf("(%d,%d),(%d,%d)\n", &v - res, i.y1, i.x2, i.y2);

return 0;
}


Run with a -P argument to print out the points instead of the number thereof.

For me it takes under 2 minutes in the counting mode and about 5 minutes with printing directed to a file (~4 GB), so it did not quite become I/O limited.

My original program was neat, but I dropped most of it since it only could produce on the order of 10^5 results. What it did is to look for parameterizations of the form (x^2+Ax+B, x^2+Cx+D), (x^2+ax+b, x^2+cx+d) such that for any x, (x^2+Ax+B)^2 + (x^2+Cx+D)^2 = (x^2+ax+b)^2 + (x^2+cx+d)^2 + 1. When it found such a set of parameters {a,b,c,d,A,B,C,D} it proceeded to check all of the x values under the maximum. While looking at my debug output from this program, I noticed a certain parameterization of the parameterization of the parameterization that allowed me to produce a lot of numbers easily. I elected not to print out Ell's numbers since I had plenty of my own. Hopefully now someone won't print out both of our sets of numbers and claim to be the winner :)

 /* by feersum  2014/9
http://codegolf.stackexchange.com/questions/37627/false-positives-on-an-integer-lattice */
#include <iostream>
#include <cmath>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <unordered_set>
using namespace std;
#define ul unsigned long long

#define h(S) unordered_##S##set
#define P 2977953206964783763LL
#define K const

#define EQ(T, F)bool operator==(K T&o)K{return!memcmp(F,o.F,sizeof(F));}

struct pparm {
int a,b,c,d;
int E[4];
pparm(int A,int B,int C, int D):
E{B*B+D*D,A*B+C*D,A*A+C*C+2*(B+D),A+C}
{
a=A;b=B;c=C;d=D;
}
EQ(pparm,E)
};

struct ans {
int p[2][2];
EQ(ans,p)
};
ostream&operator<<(ostream&o, ans&a)
{
o<<'('<<a.p[0][0]<<','<<a.p[0][1]<<"),("<<a.p[1][0]<<','<<a.p[1][1]<<')'<<endl;
return o;
}

#define HASH(N,T,F) \
struct N { \
size_t operator() (K T&p) K { \
size_t h = 0; \
for(int i = 4; i--; ) \
h=h*P+((int*)p.F)[i]; \
return h; \
}};

#define INS(r, a) { \
bool new1 = r.insert(a).second; \
n += new1; \
if(print && new1) \
cout<<a; }

HASH(HA,ans,p)

bool print;
int n;

void gen(h()<ans,HA>&r, K pparm&p1, K pparm&p2)
{
#ifdef DBUG
for(int i=0;i<2;i++){
K pparm&p=i?p2:p1;
cout<<' '<<p.a<<' '<<p.b<<' '<<p.c<<' '<<p.d<<' ';}
cout<<endl;
#endif

for(ul x = 0; ; ++x) {
ans a;
ul s[2];
for(int i = 0; i < 2; i++) {
K pparm &p = i?p2:p1;
int *pt = a.p[i];
pt[0] = p.b+x*(p.a+x);
pt[1] = p.d+x*(p.c+x);
s[i] = (ul)pt[0]*pt[0] + (ul)pt[1]*pt[1];
}
if(*s >> 53)
break;
if(s[1] - s[0] != 1)
exit(4);

if(sqrt(s[0]) == sqrt(s[1])) {
for(int i = 0; i < 2; i++)
if(a.p[i][0] > a.p[i][1])
swap(a.p[i][0], a.p[i][1]);
INS(r,a)
}
}
//if(!print) cout<<n<<endl;
}

void endit()
{
exit(0);
}

int main(int ac, char**av)
{
bool kill = false;
for(int i = 1; i < ac; i++) {
print |= ac>1 && !stricmp(av[1], "-P");
kill |= !stricmp(av[i], "-K");
}

if(kill)

h()<ans, HA> res;
res.reserve(1<<27);

#define JMAX 43000000
for(ul j = 0; j < JMAX; j++) {
pparm p1(-~j,j,~-j,0),p2(j,1,j,j);
gen(res,p1,p2);
if(!print && !(j%1024))
#ifdef DBUG
cout<<j<<' ',
#endif
cout<<n<<endl;

}
exit(0);
}


I'm getting a bunch of compiler errors: http://pastebin.com/enNcY9fx Any clue what's going on?

– Martin Ender – 2014-09-12T18:30:28.513

@Martin No idea... I copied my post into a file, compiled on a Windows 8 Laptop with identical switches. Works fine for me. What version of gcc do you have? – feersum – 2014-09-13T02:41:33.933

Btw if they cause errors, you can simply delete all the thread-related bits which are completely unnecessary. They only do something if you use a "-K" option which is not needed. – feersum – 2014-09-13T03:06:20.877

g++ (GCC) 4.8.1. Okay, I removed the thread bits, but it's still not recognising stricmp for some reason. – Martin Ender – 2014-09-13T08:40:03.627

Funny, I have the same version. I found someone else reporting the absence of stricmp. They found that instead strcmp worked, which is fine here (case-sensitive rather than insensitive comparison).

– feersum – 2014-09-13T08:55:46.943

Okay, that compiles. But it seems to print only 0s without command-line argument and nothing at all with -P. – Martin Ender – 2014-09-13T09:00:06.203

Okay, it's the O3 switch that breaks the output. – Martin Ender – 2014-09-13T09:07:00.637

Your compiler is obviously inhabited by evil spirits, as I tested multiple runs with -O3...:P – feersum – 2014-09-13T09:13:20.817

– Martin Ender – 2014-09-13T09:20:48.523

@MartinBüttner how does this work? When I click on your link I see the preceding discussion but it tells me I am not logged in – feersum – 2014-09-13T09:36:23.780

I downloaded MinGW-W64 now, and finally got it to run! – Martin Ender – 2014-09-17T19:35:24.120

Did you remove the printing of the pairs though? – Martin Ender – 2014-09-17T19:44:32.440

@MartinBüttner whoops, looks like I must have gotten overzealous while deleting stuff. I made an edit restoring the -P functionality. – feersum – 2014-09-17T21:03:49.293

1I've got too many other things going on at the moment, so I'll tell you my idea to improve your approach. With radius-squared near the top end of the range, you can also get collisions between radius-squared which differ by 2. – Peter Taylor – 2014-09-17T21:07:24.627

1

## Java, Bresenham-esque circle scan

Heuristically I expect to get more collisions by starting at the wider end of the annulus. I expected to get some improvement by doing one scan for each collision, recording values for which surplus is between 0 and r2max - r2 inclusive, but in my testing that proved slower than this version. Similarly attempts to use a single int[] buffer rather than lots of creation of two-element arrays and lists. Performance optimisation is a strange beast indeed.

Run with a command-line argument for output of the pairs, and without for simple counts.

import java.util.*;

public class CodeGolf37627 {
public static void main(String[] args) {
final int M = 144;
boolean[] possible = new boolean[M];
for (int i = 0; i <= M/2; i++) {
for (int j = 0; j <= M/2; j++) {
possible[(i*i+j*j)%M] = true;
}
}

long count = 0;
double sqrt = 0;
long r2max = 0;
List<int[]> previousPoints = null;
for (long r2 = 1L << 53; ; r2--) {
if (!possible[(int)(r2 % M)]) continue;

double r = Math.sqrt(r2);
if (r != sqrt) {
sqrt = r;
r2max = r2;
previousPoints = null;
}
else {
if (previousPoints == null) previousPoints = findLatticePointsBresenham(r2max, (int)r);

if (previousPoints.size() == 0) {
r2max = r2;
previousPoints = null;
}
else {
List<int[]> points = findLatticePointsBresenham(r2, (int)r);
for (int[] p1 : points) {
for (int[] p2 : previousPoints) {
if (args.length > 0) System.out.format("(%d, %d) (%d, %d)\n", p1[0], p1[1], p2[0], p2[1]);
count++;
}
}
System.out.println(count);
}
}
}
}

// Surprisingly, this seems to be faster than doing one scan for all two or three r2s.
private static List<int[]> findLatticePointsBresenham(long r2, long r) {
List<int[]> rv = new ArrayList<int[]>();
// Require 0 = y = x
long x = r, y = 0, surplus = r2 - r * r;
while (y <= x) {
if (surplus == 0) rv.add(new int[]{(int)x, (int)y});

// Invariant: surplus = r2 - x*x - y*y >= 0
y++;
surplus -= 2*y - 1;
if (surplus < 0) {
x--;
surplus += 2*x + 1;
}
}

return rv;
}
}


1

## Java - 27,817,255

Most of these are the same as what Ell shows, and the rest are based on (j,0) (k,l). For each j, I walk back some squares and check if the remainder gives a false positive. This takes up basically the entire time with only a 25k (about 0.1%) gain over just (j,0) (j,1), but a gain is a gain.

This will finish in under ten minutes on my machine, but I don't know what you have. Because reasons, if it doesn't finish before the time runs out, it will have a drastically worse score. In that case, you can tweak the divisor on line 8 so that it finishes in time (this simply determines how far back it walks for each j). For some various divisors, scores are:

11    27817255 (best on OPs machine)
10    27818200
8     27820719
7     27822419 (best on my machine)


To turn on output for each match (and, god, it is slow if you do), just uncomment lines 10 and 19.

public class FalsePositive {
public static void main(String[] args){
long j = 67108864;
long start = System.currentTimeMillis();
long matches=0;
while(j < 94906265 && System.currentTimeMillis()-start < 599900){
long jSq = j*j;
long limit = (long)Math.sqrt(j)/11; // <- tweak to fit inside 10 minutes for best results
matches++; // count an automatic one for (j,0)(j,1)
//System.out.println("("+j+",0) ("+j+",1)");
for(int i=1;i<limit;i++){
long k = j-i;
long kSq = k*k;
long l = (long)Math.sqrt(jSq-kSq);
long lSq = l*l;
if(kSq+lSq != jSq){
if(Math.sqrt(kSq+lSq)==Math.sqrt(jSq)){
matches++;
//System.out.println("("+j+",0) ("+k+","+l+")");
}
}
}
j++;
}
System.out.println("\n"+matches+" Total matches, got to j="+j);
}
}


For reference, the first 20 outputs it gives (for divisor=7, excluding (j,0)(j,1) types) are:

(67110083,0) (67109538,270462)
(67110675,0) (67109990,303218)
(67111251,0) (67110710,269470)
(67111569,0) (67110668,347756)
(67112019,0) (67111274,316222)
(67112787,0) (67111762,370918)
(67115571,0) (67115518,84346)
(67117699,0) (67117698,11586)
(67117971,0) (67117958,41774)
(67120545,0) (67120040,260368)
(67121043,0) (67120118,352382)
(67122345,0) (67122320,57932)
(67122449,0) (67122444,25908)
(67122633,0) (67122328,202348)
(67122729,0) (67121972,318784)
(67122849,0) (67122568,194224)
(67124195,0) (67123818,224970)
(67125201,0) (67125172,62396)
(67125705,0) (67124632,379540)
(67126195,0) (67125882,204990)


0

## Julia, 530 false positives

Here is a very naive brute force search, which you can view as a reference implementation.

num = 0
for i = 60000000:-1:0
for j = i:-1:ifloor(0.99*i)
s = i*i + j*j
for x = ifloor(sqrt(s/2)):ifloor(sqrt(s))
min_y = ifloor(sqrt(s - x*x))
max_y = min_y+1
for y = min_y:max_y
r = x*x + y*y
if r != s && sqrt(r) == sqrt(s)
num += 1
if num % 10 == 0
println("Found \$num pairs")
end
#@printf("(i,j) = (%d,%d); (x,y) = (%d,%d); s = %d, r = %d\n", i,j,x,y,s,r)
end
end
end
end
end


You can print out the pairs (and their exact squared magnitudes) by uncommenting the @printf line.

Basically this starts the search at x = y = 6e7 for the first coordinate pair and scans down about 1% of the way to the x-axis before decrementing x. Then for each such coordinate pair, it checks the entire arc of the same magnitude (rounding up and down) for a collision.

The code assumes that it's run on a 64-bit system, so that the default integer and floating-point types are 64-bit ones (if not, you can create them with int64() and float64() constructors).

That yields a meagre 530 results.