Four Squares Together

19

5

Lagrange's four square theorem tells us any natural number can be represented as the sum of four square numbers. Your task is to write a program that does this.

Input: A natural number (below 1 billion)

Output: Four numbers whose squares sum to that number (order doesn't matter)

Note: You don't have to do a brute force search! Details here and here. If there is a function that trivializes this problem (I will determine), it's not allowed. Automated prime functions and square root are allowed. If there is more than one representation any one is fine. If you chose to do brute force, it must run within reasonable time (3 minutes)

sample input

123456789

sample output (either is fine)

10601 3328 2 0
10601 3328 2

qwr

Posted 2014-05-17T23:58:15.577

Reputation: 8 929

If my language of choice only supports numbers up to 9999 but can theoretically work for numbers larger, does it count? – MilkyWay90 – 2019-03-16T17:34:54.333

I may do brute force though if it makes my code shorter? – Martin Ender – 2014-05-17T23:59:38.110

@m.buettner Yes, but it should handle large numbers – qwr – 2014-05-18T00:01:54.570

@m.buettner Read the post, any natural number below 1 billion – qwr – 2014-05-18T00:09:19.220

Ah sorry overlooked that. – Martin Ender – 2014-05-18T00:12:28.220

Does "natural numbers" include 0? – Dennis – 2014-05-21T05:00:04.333

@Dennis it certainly looks like there's no agreement on this in the maths world, so I gave myself the benefit of the doubt. http://en.wikipedia.org/wiki/Natural_number Many programs including mine will need a patch to work with n=0. The first issue is division by 4 (you can get into an infinite loop trying to divide zero by 4 until you reach a number not divisible by 4) and more problems occur later.

– Level River St – 2014-05-22T19:35:16.360

2@Dennis Natural numbers in this case do not include 0. – qwr – 2014-05-22T19:56:22.177

This theorem is impressive. – Nicolas Barbulesco – 2014-05-30T12:41:34.580

The natural numbers include 0. In maths. – Nicolas Barbulesco – 2014-05-30T12:42:31.420

@NicolasBarbulesco There is no agreement on whether 0 should be included, but in this case I have excluded it to make golfing easier – qwr – 2014-05-30T20:49:44.663

Answers

1

CJam, 50 bytes

li:NmF0=~2/#:J(_{;)__*N\-[{_mqJ/iJ*__*@\-}3*])}g+p

My third (and final, I promise) answer. This approach is based heavily on primo's answer.

Try it online in the CJam interpreter.

Usage

$ cjam 4squares.cjam <<< 999999999
[189 31617 567 90]

Background

  1. After seeing primo's updated algorithm, I had to see how a CJam implementation would score:

    li{W):W;:N4md!}g;Nmqi)_{;(__*N\-[{_mqi__*@\-}3*])}g+2W#f*p
    

    Only 58 bytes! This algorithm is performs in nearly constant time and doesn't exhibit much variation for different values of N. Let's change that...

  2. Instead of starting at floor(sqrt(N)) and decrementing, we can start at 1 and increment. This saves 4 bytes.

    li{W):W;:N4md!}g;0_{;)__*N\-[{_mqi__*@\-}3*])}g+2W#f*p
    
  3. Instead of expressing N as 4**a * b, we can express it as p**(2a) * b – where p is the smallest prime factor of N – to save 1 more byte.

    li_mF0=~2/#:J_*/:N!_{;)__*N\-[{_mqi__*@\-}3*])}g+Jf*p
    
  4. The previous modification allows us to slightly change the implementation (without touching the algorithm itself): Instead of dividing N by p**(2a) and multiply the solution by p**a, we can directly restrict the possible solutions to multiples of p**a. This saves 2 more bytes.

    li:NmF0=~2/#:J!_{;J+__*N\-[{_mqJ/iJ*__*@\-}3*])}g+`
    
  5. Not restricting the first integer to multiples of p**a saves an additional byte.

    li:NmF0=~2/#:J(_{;)__*N\-[{_mqJ/iJ*__*@\-}3*])}g+`
    

Final algorithm

  1. Find a and b such that N = p**(2a) * b, where b is not a multiple of p**2 and p is the smallest prime factor of N.

  2. Set j = p**a.

  3. Set k = floor(sqrt(N - j**2) / A) * A.

  4. Set l = floor(sqrt(N - j**2 - k**2) / A) * A.

  5. Set m = floor(sqrt(N - j**2 - k**2 - l**2) / A) * A.

  6. If N - j**2 - k**2 - l**2 - m**2 > 0, set j = j + 1 and go back to step 3.

This can be implemented as follows:

li:N          " Read an integer from STDIN and save it in “N”.                        ";
mF            " Push the factorization of “N”. Result: [ [ p1 a1 ] ... [ pn an ] ]    ";
0=~           " Push “p1” and “a1”. “p1” is the smallest prime divisor of “N”.        ";
2/#:J         " Compute p1**(a1/2) and save the result “J”.                           ";
(_            " Undo the first two instructions of the loop.                          ";
{             "                                                                       ";
  ;)_         " Pop and discard. Increment “J” and duplicate.                         ";
  _*N\-       " Compute N - J**2.                                                     ";
  [{          "                                                                       ";
    _mqJ/iJ*  " Compute K = floor(sqrt(N - J**2)/J)*J.                                ";
    __*@      " Duplicate, square and rotate. Result: K   K**2   N - J**2             ";
    \-        " Swap and subtract. Result: K   N - J**2 - K**2                        ";
  }3*]        " Do the above three times and collect in an array.                     ";
  )           " Pop the array. Result: N - J**2 - K**2 - L**2 - M**2                  ";
}g            " If the result is zero, break the loop.                                ";
+p            " Unshift “J” in [ K L M ] and print a string representation.           ";

Benchmarks

I've run all 5 versions over all positive integers up to 999,999,999 on my Intel Core i7-3770, measured execution time and counted the iterations required to find a solution.

The following table shows the average number of iterations and execution time for a single integer:

Version               |    1    |    2    |    3    |    4    |    5
----------------------+---------+---------+---------+---------+---------
Number of iterations  |  4.005  |  28.31  |  27.25  |  27.25  |  41.80
Execution time [µs]   |  6.586  |  39.69  |  55.10  |  63.99  |  88.81
  1. At only 4 iterations and 6.6 micro​seconds per integer, primo's algorithm is incredibly fast.

  2. Starting at floor(sqrt(N)) makes more sense, since this leaves us with smaller values for the sum of the remaining three squares. As expected, starting at 1 is a lot slower.

  3. This is a classical example of a badly implemented good idea. To actually reduce the code size, we rely on mF, which factorizes the integer N. Although version 3 requires less iterations than version 2, it is a lot slower in practice.

  4. Although the algorithm does not change, version 4 is a lot slower. This is because it performs an additional floating point division and an integer multiplication in each iteration.

  5. For the input N = p**(2a) ** b, algorithm 5 will require (k - 1) * p**a + 1 iterations, where k is the number of iterations algorithm 4 requires. If k = 1 or a = 0, this makes no difference.

    However, any input of the form 4**a * (4**c * (8 * d + 7) + 1) may perform quite badly. For the starting value j = p**a, N - 4**a = 4**(a + c) * (8 * d + 7), so it cannot be expressed as a sum of three squares. Thus, k > 1 and at least p**a iterations are required.

    Thankfully, primo's original algorithm is incredibly fast and N < 1,000,000,000. The worst case I could find by hand is 265,289,728 = 4**10 * (4**1 * (7 * 8 + 7) + 1), which requires 6,145 iterations. Execution time is less than 300 ms on my machine. On average, this version is 13.5 times slower than the implementation of primo's algorithm.

Dennis

Posted 2014-05-17T23:58:15.577

Reputation: 196 637

"Instead of expressing N as 4**a * b, we can express it as p**(2a) * b." This is actually an improvement. I would have liked to have included this, but it was a lot longer (the ideal is to find the largest perfect square factor). "Starting with 1 and incrementing saves 4 bytes." This is definitely slower. The runtime for any given range is 4-5 times as long. "All positive integers up to 999,999,999 took 24.67 hours, giving an average execution time of 0.0888 milliseconds per integer." Perl only took 2.5 hours to crunch the whole range, and the Python translation is 10x faster ;) – primo – 2014-05-25T08:43:51.510

@primo: Yes, you're right. Dividing by p**a is an improvement, but it's a small one. Dividing by the largest perfect square factor makes a big difference when starting from 1; it's still an improvement when starting from the integer part of the square root. Implementing it would cost only two more bytes. The abysmal execution time seems to be due to my unimprovements, not CJam. I'll rerun the tests for all algorithms (including the one you proposed), counting iterations rather than measuring wall time. Let's see how long that takes... – Dennis – 2014-05-25T16:33:38.807

Finding the largest square factor only costs 2 additional bytes?! What kind of sorcery is this? – primo – 2014-05-26T04:49:07.877

@primo: If the integer is on the stack, 1\ swaps it with 1 (accumulator), mF pushes its factorization and {~2/#*}/ raises every prime factor to its exponent divided by two, then multiplies it with the accumulator. For the direct implementation of your algorithm, that only adds 2 bytes. The small difference is mainly due to the awkward way I had to find the exponent of 4, since CJam doesn't (seem to) have a while loop... – Dennis – 2014-05-26T06:24:38.313

Anyway, the benchmark finished. The total number of iterations required to factorize all 1,000,000 integers without finding the largest square factor is 4,004,829,417, with an execution time of 1.83 hours. Dividing by the largest square factor reduces the iteration count to 3,996,724,799, but it increases the time to 6.7 hours. Looks like factorizing takes a lot more time than finding the squares... – Dennis – 2014-05-26T06:25:12.737

Complete integer factorization has always been a difficult problem. I suppose this means that since removing factors of four is sufficient - and fast - it's also the best in practice. I would have thought that the improvement would have been more than 0.2%, though... But still, 4 iterations on average, that's what I call amortized O(1) ;) – primo – 2014-05-26T06:36:01.047

7

Mathematica 61 66 51

Three methods are shown. Only the first approach meets the time requirement.


1-FindInstance (51 char)

This returns a single solution the equation.

FindInstance[a^2 + b^2 + c^2 + d^2 == #, {a, b, c, d}, Integers] &

Examples and timings

FindInstance[a^2 + b^2 + c^2 + d^2 == 123456789, {a, b, c, d}, Integers] // AbsoluteTiming

{0.003584, {{a -> 2600, b -> 378, c -> 10468, d -> 2641}}}

FindInstance[a^2 + b^2 + c^2 + d^2 == #, {a, b, c, d}, Integers] &[805306368]

{0.004437, {{a -> 16384, b -> 16384, c -> 16384, d -> 0}}}


2-IntegerPartitions

This works also, but is too slow to meet the speed requirement.

f@n_ := Sqrt@IntegerPartitions[n, {4}, Range[0, Floor@Sqrt@n]^2, 1][[1]]

Range[0, Floor@Sqrt@n]^2 is the set of all squares less than the square root of n (the largest possible square in the partition).

{4} requires the integer partitions of n consist of 4 elements from the above-mentioned set of squares.

1, within the function IntegerPartitions returns the first solution.

[[1]] removes the outer braces; the solution was returned as a set of one element.


f[123456]

{348, 44, 20, 4}


3-PowerRepresentations

PowerRepresentations returns all of the solutions to the 4 squares problem. It can also solve for sums of other powers.

PowersRepresentations returns, in under 5 seconds, the 181 ways to express 123456789 as the sum of 4 squares:

n= 123456;
PowersRepresentations[n, 4, 2] //AbsoluteTiming

sols

However, it is far too slow for other sums.

DavidC

Posted 2014-05-17T23:58:15.577

Reputation: 24 524

Wow, Mathematica does the brute force fast. Is IntegerPartitions doing something much more clever than trying every combination, like DFT convolution on the sets? The specs ask for the numbers, by the way, not their squares. – xnor – 2014-05-21T11:31:50.843

I think Mathematica uses brute force, but probably has optimized IntegerPartitions. As you can see from the timings, the speed varies greatly depending upon whether the first (largest) number is close to the square root of n. Thanks for catching the spec violation in the earlier version. – DavidC – 2014-05-21T11:44:44.050

Could you benchmark f[805306368]? Without dividing by powers of 4 first, my solution takes 0.05 s for 999999999; I've aborted the benchmark for 805306368 after 5 minutes... – Dennis – 2014-05-21T13:32:51.137

f[805306368] returns {16384, 16384, 16384} after 21 minutes. I used {3} in place of {4}. The attempt to solve it with a sum of 4 non-zero squares was unsuccessful after several hours of running. – DavidC – 2014-05-22T18:41:07.933

I don't have access to Mathematica, but from what I've read in the documentation center, IntegerPartitions[n,4,Range[Floor@Sqrt@n]^2 should work as well. However, I don't think you should use method 1 for your score, since it doesn't comply with the time limit specified in the question. – Dennis – 2014-05-22T22:25:42.743

Ok. I opted for the routine that easily satisfies the speed requirement. – DavidC – 2014-05-22T23:28:18.527

7

FRACTRAN: 156 98 fractions

Since this is a classic number theory problem, what better way to solve this than to use numbers!

37789/221 905293/11063 1961/533 2279/481 57293/16211 2279/611 53/559 1961/403 53/299 13/53 1/13 6557/262727 6059/284321 67/4307 67/4661 6059/3599 59/83 1/59 14279/871933 131/9701 102037079/8633 14017/673819 7729/10057 128886839/8989 13493/757301 7729/11303 89/131 1/89 31133/2603 542249/19043 2483/22879 561731/20413 2483/23701 581213/20687 2483/24523 587707/21509 2483/24797 137/191 1/137 6215941/579 6730777/965 7232447/1351 7947497/2123 193/227 31373/193 23533/37327 5401639/458 229/233 21449/229 55973/24823 55973/25787 6705901/52961 7145447/55973 251/269 24119/251 72217/27913 283/73903 281/283 293/281 293/28997 293/271 9320827/58307 9831643/75301 293/313 28213/293 103459/32651 347/104807 347/88631 337/347 349/337 349/33919 349/317 12566447/68753 13307053/107143 349/367 33197/349 135199/38419 389/137497 389/119113 389/100729 383/389 397/383 397/39911 397/373 1203/140141 2005/142523 2807/123467 4411/104411 802/94883 397/401 193/397 1227/47477 2045/47959 2863/50851 4499/53743 241/409 1/241 1/239

Takes in input of the form 2n × 193 and outputs 3a × 5b × 7c × 11d. Might run in 3 minutes if you have a really good interpreter. Maybe.

...okay, not really. This seemed to be such a fun problem to do in FRACTRAN that I had to try it. Obviously, this isn't a proper solution to the question as it doesn't make the time requirements (it brute forces) and it's barely even golfed, but I thought I'd post this here because it's not every day that a Codegolf question can be done in FRACTRAN ;)

Hint

The code is equivalent to the following pseudo-Python:

a, b, c, d = 0, 0, 0, 0

def square(n):
    # Returns n**2

def compare(a, b):
    # Returns (0, 0) if a==b, (1, 0) if a<b, (0, 1) if a>b

def foursquare(a, b, c, d):
    # Returns square(a) + square(b) + square(c) + square(d)

while compare(foursquare(a, b, c, d), n) != (0, 0):
    d += 1

    if compare(c, d) == (1, 0):
        c += 1
        d = 0

    if compare(b, c) == (1, 0):
        b += 1
        c = 0
        d = 0

    if compare(a, b) == (1, 0):
        a += 1
        b = 0
        c = 0
        d = 0

Sp3000

Posted 2014-05-17T23:58:15.577

Reputation: 58 729

7

Perl - 116 bytes 87 bytes (see update below)

#!perl -p
$.<<=1,$_>>=2until$_&3;
{$n=$_;@a=map{$n-=$a*($a-=$_%($b=1|($a=0|sqrt$n)>>1));$_/=$b;$a*$.}($j++)x4;$n&&redo}
$_="@a"

Counting the shebang as one byte, newlines added for horizontal sanity.

Something of a combination submission.

The average (worst?) case complexity seems to be O(log n) O(n0.07). Nothing I've found runs slower than 0.001s, and I've checked the entire range from 900000000 - 999999999. If you find anything that takes significantly longer than that, ~0.1s or more, please let me know.


Sample Usage

$ echo 123456789 | timeit perl four-squares.pl
11110 157 6 2

Elapsed Time:     0:00:00.000

$ echo 1879048192 | timeit perl four-squares.pl
32768 16384 16384 16384

Elapsed Time:     0:00:00.000

$ echo 999950883 | timeit perl four-squares.pl
31621 251 15 4

Elapsed Time:     0:00:00.000

The final two of these seem to be worst case scenerios for other submissions. In both instances, the solution shown is quite literally the very first thing checked. For 123456789, it's the second.

If you want to test a range of values, you can use the following script:

use Time::HiRes qw(time);

$t0 = time();

# enter a range, or comma separated list here
for (1..1000000) {
  $t1 = time();
  $initial = $_;
  $j = 0; $i = 1;
  $i<<=1,$_>>=2until$_&3;
  {$n=$_;@a=map{$n-=$a*($a-=$_%($b=1|($a=0|sqrt$n)>>1));$_/=$b;$a*$i}($j++)x4;$n&&redo}
  printf("%d: @a, %f\n", $initial, time()-$t1)
}
printf('total time: %f', time()-$t0);

Best when piped to a file. The range 1..1000000 takes about 14s on my computer (71000 values per second), and the range 999000000..1000000000 takes about 20s (50000 values per second), consistent with O(log n) average complexity.


Update

Edit: It turns out that this algorithm is very similar to one that has been used by mental calculators for at least a century.

Since originally posting, I have checked every value on the range from 1..1000000000. The 'worst case' behavior was exhibited by the value 699731569, which tested a grand total of 190 combinations before arriving at a solution. If you consider 190 to be a small constant - and I certainly do - the worst case behavior on the required range can be considered O(1). That is, as fast as looking up the solution from a giant table, and on average, quite possibly faster.

Another thing though. After 190 iterations, anything larger than 144400 hasn't even made it beyond the first pass. The logic for the breadth-first traversal is worthless - it's not even used. The above code can be shortened quite a bit:

#!perl -p
$.*=2,$_/=4until$_&3;
@a=map{$=-=$%*($%=$=**.5-$_);$%*$.}$j++,(0)x3while$=&&=$_;
$_="@a"

Which only performs the first pass of the search. We do need to confirm that there aren't any values below 144400 that needed the second pass, though:

for (1..144400) {
  $initial = $_;

  # reset defaults
  $.=1;$j=undef;$==60;

  $.*=2,$_/=4until$_&3;
  @a=map{$=-=$%*($%=$=**.5-$_);$%*$.}$j++,(0)x3while$=&&=$_;

  # make sure the answer is correct
  $t=0; $t+=$_*$_ for @a;
  $t == $initial or die("answer for $initial invalid: @a");
}

In short, for the range 1..1000000000, a near-constant time solution exists, and you're looking at it.


Updated Update

@Dennis and I have made several improvements this algorithm. You can follow the progress in the comments below, and subsequent discussion, if that interests you. The average number of iterations for the required range has dropped from just over 4 down to 1.229, and the time needed to test all values for 1..1000000000 has been improved from 18m 54s, down to 2m 41s. The worst case previously required 190 iterations; the worst case now, 854382778, needs only 21.

The final Python code is the following:

from math import sqrt

# the following two tables can, and should be pre-computed

qqr_144 = set([  0,   1,   2,   4,   5,   8,   9,  10,  13,
                16,  17,  18,  20,  25,  26,  29,  32,  34,
                36,  37,  40,  41,  45,  49,  50,  52,  53,
                56,  58,  61,  64,  65,  68,  72,  73,  74,
                77,  80,  81,  82,  85,  88,  89,  90,  97,
                98, 100, 101, 104, 106, 109, 112, 113, 116,
               117, 121, 122, 125, 128, 130, 133, 136, 137])

# 10kb, should fit entirely in L1 cache
Db = []
for r in range(72):
  S = bytearray(144)
  for n in range(144):
    c = r

    while True:
      v = n - c * c
      if v%144 in qqr_144: break
      if r - c >= 12: c = r; break
      c -= 1

    S[n] = r - c
  Db.append(S)

qr_720 = set([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100, 121,
              144, 145, 160, 169, 180, 196, 225, 241, 244, 256, 265, 289,
              304, 324, 340, 361, 369, 385, 400, 409, 436, 441, 481, 484,
              496, 505, 529, 544, 576, 580, 585, 601, 625, 640, 649, 676])

# 253kb, just barely fits in L2 of most modern processors
Dc = []
for r in range(360):
  S = bytearray(720)
  for n in range(720):
    c = r

    while True:
      v = n - c * c
      if v%720 in qr_720: break
      if r - c >= 48: c = r; break
      c -= 1

    S[n] = r - c
  Dc.append(S)

def four_squares(n):
  k = 1
  while not n&3:
    n >>= 2; k <<= 1

  odd = n&1
  n <<= odd

  a = int(sqrt(n))
  n -= a * a
  while True:
    b = int(sqrt(n))
    b -= Db[b%72][n%144]
    v = n - b * b
    c = int(sqrt(v))
    c -= Dc[c%360][v%720]
    if c >= 0:
      v -= c * c
      d = int(sqrt(v))

      if v == d * d: break

    n += (a<<1) - 1
    a -= 1

  if odd:
    if (a^b)&1:
      if (a^c)&1:
        b, c, d = d, b, c
      else:
        b, c = c, b

    a, b, c, d = (a+b)>>1, (a-b)>>1, (c+d)>>1, (c-d)>>1

  a *= k; b *= k; c *= k; d *= k

  return a, b, c, d

This uses two pre-computed correction tables, one 10kb in size, the other 253kb. The code above includes the generator functions for these tables, although these should probably be computed at compile time.

A version with more modestly sized correction tables can be found here: http://codepad.org/1ebJC2OV This version requires an average of 1.620 iterations per term, with a worst case of 38, and the entire range runs in about 3m 21s. A little bit of time is made up for, by using bitwise and for b correction, rather than modulo.


Improvements

Even values are more likely to produce a solution than odd values.
The mental calculation article linked to previously notes that if, after removing all factors of four, the value to be decomposed is even, this value can be divided by two, and the solution reconstructed:

Although this might make sense for mental calculation (smaller values tend to be easier to compute), it doesn't make much sense algorithmically. If you take 256 random 4-tuples, and examine the sum of the squares modulo 8, you will find that the values 1, 3, 5, and 7 are each reached on average 32 times. However, the values 2 and 6 are each reached 48 times. Multiplying odd values by 2 will find a solution, on average, in 33% less iterations. The reconstruction is the following:

Care needs to be taken that a and b have the same parity, as well as c and d, but if a solution was found at all, a proper ordering is guaranteed to exist.

Impossible paths don't need to be checked.
After selecting the second value, b, it may already be impossible for a solution to exist, given the possible quadratic residues for any given modulo. Instead of checking anyway, or moving onto the next iteration, the value of b can be 'corrected' by decrementing it by the smallest amount that could possibly lead to a solution. The two correction tables store these values, one for b, and the other for c. Using a higher modulo (more accurately, using a modulo with relatively fewer quadratic residues) will result in a better improvement. The value a doesn't need any correction; by modifying n to be even, all values of a are valid.

primo

Posted 2014-05-17T23:58:15.577

Reputation: 30 891

1This is incredible! The final algorithm is probably the simplest of all the answers, yet 190 iterations are all it takes... – Dennis – 2014-05-24T05:38:53.703

@Dennis I would be very suprised if it hasn't been mentioned elsewhere. It seems too simple to have been over-looked. – primo – 2014-05-24T15:12:00.753

>

  • I'm curious: Did any of the test values in your complexity analysis require the breadth-first traversal? 2. The Wikipedia article you linked to is a little confusing. It mentions the Rabin-Shallit algorithm, but provides an example for an entirely different one. 3. It would be interesting to see when exactly the Rabin-Shallit algorithm would outperform yours, I'd imagine the primality tests are rather expensive in practice.
  • < – Dennis – 2014-05-26T20:21:14.383

    >

  • Not one. 2. This is where I got my information (i.e. that this algorithm exists); I haven't seen the analysis, or even read the paper. 3. The curve becomes so steep at around 1e60, that it really wouldn't matter how 'slow' the O(log²n) is, it will still cross at about that point.
  • < – primo – 2014-05-27T00:17:05.170

    1

    The second link in the question explains how to implement Rabin-Shallit, but it doesn't talk about the complexity. This answer on MathOverflow gives a nice summary of the paper. By the way, you rediscovered an algorithm used by Gottfried Ruckle in 1911 (link).

    – Dennis – 2014-05-27T02:39:58.933

    if N/2 = a²+b²+c²+d² then N = (a+b)²+(a-b)²+(c+d)²+(c-d)². Interesting. I wonder how big of a difference this would make? Seems like it might be fairly significant, as one quarter of the range has an odd factor of 2. – primo – 2014-05-27T02:54:23.147

    I did a short sampling (10 million integers), and the algorithm seems to perform much better for even numbers than for odd ones (30 % less iterations). By the way, (log n)² is less ambiguous/confusing, but usually log²n means just that. – Dennis – 2014-05-27T03:30:05.183

    It actually requires more iterations on average, which is disappointing. Factoring out 13 (3a+2b)²+(2a-3b)²+... and 17 (4a+b)²+(a-4b)²+... does give a meager improvement, but not enough to warrant use. It's interesting that even numbers perform better than odd... perhaps multiplying odd values by two could produce an improvement ;) – primo – 2014-05-27T03:34:32.917

    Multiplying by 2 is an improvement, but only if the original number was divisible by four (otherwise, you need to ensure even parity of a+b and c+d, which seems to eliminate too many combinations): http://codepad.org/yy1oBCqW

    – primo – 2014-05-27T04:04:17.033

    Well, the general idea still works: If (a,b,c,d) is a solution for 2N and the four integers are ordered by parity, 1/2 * (a + b, a - b, c + d, c - d) is a solution for N. Now that I think about it, it makes perfect sense: If N ≡ 2 mod 4, since a² ≡ 0 mod 4 or a² ≡ 1 mod 4, N - a² ≡ 2 mod 4 or N - a² ≡ 1 mod 4. This avoid both possible cases in which N - a² is not a sum of two squares. Another way of achieving the same might be the following: Force a to be even if N ≡ 1 mod 4 and to be odd if N ≡ 3 mod 4. – Dennis – 2014-05-27T04:08:17.593

    Even better ;) http://codepad.org/njnixbTd

    – primo – 2014-05-27T04:40:33.527

    Multiplying all odd values by 2 requires even less iterations, but sorting the array seems to take too much time. http://codepad.org/0QovLcAo There's probably a better way. I've never used Python before...

    – Dennis – 2014-05-27T05:23:31.547

    Function calls are relatively expensive in Python. But in this case, we don't need a full sort (with a specialized comparator function, no less), we can determine a proper ordering in no more than two comparisons: http://codepad.org/HklDWPGe

    – primo – 2014-05-27T05:36:43.387

    Much better, but still a tad slower than the other approach. – Dennis – 2014-05-27T05:50:04.723

    For values less than 10^9, it is a bit slower (presumably because most values would have found a solution on the first or second iteration anyway). For 10^10 and above, it's noticably faster. I'm curious, though, would it be possible to further specialize between n ≡ 1 (mod 4) and n ≡ 3 (mod 4)? – primo – 2014-05-27T05:57:17.057

    Let us continue this discussion in chat.

    – Dennis – 2014-05-27T15:44:30.950

    6

    Python 3 (177)

    N=int(input())
    k=1
    while N%4<1:N//=4;k*=2
    n=int(N**.5)
    R=range(int(2*n**.5)+1)
    print([(a*k,b*k,c*k,d*k)for d in R for c in R for b in R for a in[n,n-1]if a*a+b*b+c*c+d*d==N][0])
    

    After we reduce the input N to be not divisible by 4, it must be expressible as a sum of four squares where one of them is either the largest possible value a=int(N**0.5) or one less than that, leaving only a small remainder for sum of the three other squares to take care of. This greatly reduces the search space.

    Here's a proof later this code always finds a solution. We wish to find an a so that n-a^2 is the sum of three squares. From Legendre's Three-Square Theorem, a number is the sum of three squares unless it is the form 4^j(8*k+7). In particular, such numbers are either 0 or 3 (modulo 4).

    We show that no two consecutive a can make the leftover amount N-a^2 have such a shape for both consecutive values.. We can do so by simply making a table of a and N modulo 4, noting that N%4!=0 because we've extracted all powers of 4 out of N.

      a%4= 0123
          +----
         1|1010
    N%4= 2|2121  <- (N-a*a)%4
         3|3232
    

    Because no two consecutive a give (N-a*a)%4 in [0,3], one of them is safe to use. So, we greedily use the largest possible n with n^2<=N, and n-1. Since N<(n+1)^2, the remainder N-a^2 to be represented as a sum of three squares is at most (n+1)^2 -(n-1)^2, which equals 4*n. So, it suffices to check only values up to 2*sqrt(n), which is exactly the range R.

    One could further optimize running time by stopping after a single solution, computing rather than iterating for the last value d, and searching only among values b<=c<=d. But, even without these optimizations, the worst instance I could find finished in 45 seconds on my machine.

    The chain of "for x in R" is unfortunate. It can probably be shortened by string substitution or replacement by iterating over a single index that encodes (a,b,c,d). Importing itertools turned out not worth it.

    Edit: Changed to int(2*n**.5)+1 from 2*int(n**.5)+2 to make argument cleaner, same character count.

    xnor

    Posted 2014-05-17T23:58:15.577

    Reputation: 115 687

    This doesn't work for me... 5 => (2, 1, 0, 0) – Harry Beadle – 2014-05-18T10:59:36.490

    Strange, it works for me: I get 5 => (2, 1, 0, 0) running on Ideone 3.2.3 or in Idle 3.2.2. What do you get? – xnor – 2014-05-18T19:58:41.113

    1@xnor BritishColour gets 5 => (2, 1, 0, 0). Did you even read the comment? (Now we have 3 comments in a row that have that code snippet. Can we keep the streak going?) – Justin – 2014-05-19T05:56:04.700

    @Quincunx If we are to decipher 5 => (2, 1, 0, 0), it means 2^2 + 1^2 + 0^2 + 0^2 = 5. So, yes, we can. – HostileFork says dont trust SE – 2014-05-19T06:03:25.137

    @Dr.Rebmu What are you saying about 5 => (2, 1, 0, 0)? I understood that's what that means. Did you mean that we can keep the streak going? – Justin – 2014-05-19T06:08:04.290

    @Quincunx As with many things, when you ask about my quotation of 5 => (2, 1, 0, 0) a lot of the discernment of intention is left up to the reader. – HostileFork says dont trust SE – 2014-05-19T06:13:44.647

    1Quincunx, I read @BritishColour's comment, and as far as I can see, 5 => (2, 1, 0, 0) is correct. The examples in the question consider 0^2=0 to be a valid square number. Therefore I interpreted (as I think xnor did) that British Colour got something else. British colour, as you hav not responded again, can we assume that you do in fact get 2,1,0,0? – Level River St – 2014-05-19T10:41:47.767

    Sorry, I ran it again and it worked, don't know what happened then, probably a Copy/Paste fail... – Harry Beadle – 2014-05-19T12:01:24.920

    @xnor You can save 2 characters at the cost of "only" a factor of 1.5^3 slowdown by changing range(2*int(n**.5)+2) to range(3*int(n**.5)). Also, you could save a bunch of characters by switching to Python 2.7. – isaacg – 2014-05-20T08:07:49.193

    @isaacg The 3* improvement to the range finished at just over 3 minutes on my computer on the worst-case input of 999950883, so I'm not sure if I can include it. – xnor – 2014-05-20T23:01:23.200

    5

    CJam, 91 90 74 71 bytes

    q~{W):W;:N4md!}gmqi257:B_**_{;)_[Bmd\Bmd]_N\{_*-}/mq_i@+\1%}g{2W#*}%`\;
    

    Compact, but slower than my other approach.

    Try it online! Paste the Code, type the desired integer in Input and click Run.

    Background

    This post started as a 99 byte GolfScript answer. While there was still room for improvement, GolfScript lacks built-in sqrt function. I kept the GolfScript version until revision 5, since it was very similar to the CJam version.

    However, the optimizations since revision 6 require operators that are not available in GolfScript, so instead of posting separate explanations for both languages, I decided to drop the less competitive (and much slower) version.

    The implemented algorithm computes the numbers by brute force:

    1. For input m, find N and W such that m = 4**W * N.

    2. Set i = 257**2 * floor(sqrt(N/4)).

    3. Set i = i + 1.

    4. Find integers j, k, l such that i = 257**2 * j + 257 * k + l, where k, l < 257.

    5. Check if d = N - j**2 - k**2 - l**2 is a perfect square.

    6. If it isn't, and go back to step 3.

    7. Print 2**W * j, 2**W * k, 2**W * l, 2**W * sqrt(m).

    Examples

    $ TIME='\n%e s' time cjam lagrange.cjam <<< 999999999
    [27385 103 15813 14]
    0.46 s
    $ TIME='\n%e s' time cjam lagrange.cjam <<< 805306368
    [16384 16384 0 16384]
    0.23 s
    

    Timings correspond to an Intel Core i7-4700MQ.

    How it works

    q~              " Read and interpret the input. ";
    {
      W):W;         " Increment “W” (initially -1). ";
      :N            " Save the integer on the stack in “N”. ';
      4md!          " Push N / 4 and !(N % 4). ";
    }g              " If N / 4 is an integer, repeat the loop.
    mqi             " Compute floor(sqrt(N/4)). ";
    257:B_**        " Compute i = floor(sqrt(N)) * 257**2. ";
    _               " Duplicate “i” (dummy value). ";
    {               " ";
      ;)_           " Pop and discard. Increment “i”. ";
      [Bmd\Bmd]     " Push [ j k l ], where i = 257**2 * j + 257 * k + l and k, l < 257. ";
      _N\           " Push “N” and swap it with a copy of [ j k l ]. ";
      {_*-}/        " Compute m = N - j**2 - k**2 - l**2. ";
      mq            " Compute sqrt(m). ";
      _i            " Duplicate sqrt(m) and compute floor(sqrt(m)). ";
      @+\           " Form [ j k l floor(sqrt(m)) ] and swap it with sqrt(m). ";
      1%            " Check if sqrt(m) is an integer. ";
    }g              " If it is, we have a solution; break the loop. ";
    {2W#*}%         " Push 2**W * [ j k l sqrt(m) ]. ";
    `\;             " Convert the array into a string and discard “i”. ";
    

    Dennis

    Posted 2014-05-17T23:58:15.577

    Reputation: 196 637

    2

    C, 228

    This is based on the algorithm on the Wikipedia page, which is an O(n) brute-force.

    n,*l,x,y,i;main(){scanf("%d",&n);l=calloc(n+1,8);
    for(x=0;2*x*x<=n;x++)for(y=x;(i=x*x+y*y)<=n;y++)l[2*i]=x,l[2*i+1]=y;
    for(x=0,y=n;;x++,y--)if(!x|l[2*x+1]&&l[2*y+1]){
    printf("%d %d %d %d\n",l[2*x],l[2*x+1],l[2*y],l[2*y+1]);break;}}
    

    ephemient

    Posted 2014-05-17T23:58:15.577

    Reputation: 1 601

    2

    GolfScript, 133 130 129 bytes

    ~{.[4*{4/..4%1$!|!}do])\}:r~,(2\?:f;{{..*}:^~4-1??n*,}:v~)..
    {;;(.^3$\-r;)8%!}do-1...{;;;)..252/@252%^@^@+4$\-v^@-}do 5$]{f*}%-4>`
    

    Fast, but lengthy. The newline can be removed.

    Try it online. Note that the online interpreter has a 5 second time limit, so it might not work for all numbers.

    Background

    The algorithm takes advantage of Legendre's three-square theorem, which states that every natural number n that is not of the form

                                                                       n = 4**a * (8b + 7)

    can be expressed as the sum of three squares.

    The algorithm does the following:

    1. Express the number as 4**i * j.

    2. Find the largest integer k such that k**2 <= j and j - k**2 satisfies the hypothesis of Legendre's three-square theorem.

    3. Set i = 0.

    4. Check if j - k**2 - (i / 252)**2 - (i % 252)**2 is a perfect square.

    5. If it isn't, increment i and go back to step 4.

    Examples

    $ TIME='%e s' time golfscript legendre.gs <<< 0
    [0 0 0 0]
    0.02 s
    $ TIME='%e s' time golfscript legendre.gs <<< 123456789
    [32 0 38 11111]
    0.02 s
    $ TIME='%e s' time golfscript legendre.gs <<< 999999999
    [45 1 217 31622]
    0.03 s
    $ TIME='%e s' time golfscript legendre.gs <<< 805306368
    [16384 0 16384 16384]
    0.02 s
    

    Timings correspond to an Intel Core i7-4700MQ.

    How it works

    ~              # Interpret the input string. Result: “n”
    {              #
      .            # Duplicate the topmost stack item.
      [            #
        4*         # Multiply it by four.
        {          #
          4/       # Divide by four.
          ..       # Duplicate twice.
          4%1$     # Compute the modulus and duplicate the number.
          !|!      # Push 1 if both are truthy.
        }do        # Repeat if the number is divisible by four and non-zero.
      ]            # Collect the pushed values (one per iteration) into an array.
      )\           # Pop the last element from the array and swap it with the array.
    }:r~           # Save this code block as “r” and execute it.
    ,(2\?          # Get the length of the array, decrement it and exponentiate.
    :f;            # Save the result in “f”.
                   # The topmost item on the stack is now “j”, which is not divisible by 
                   # four and satisfies that n = f**2 * j.
    {              #
      {..*}:^~     # Save a code block to square a number in “^” and execute it.
      4-1??        # Raise the previous number to the power of 1/4.
                   # The two previous lines compute (x**2)**(1/4), which is sqrt(abs(x)).
      n*,          # Repeat the string "\n" that many times and compute its length.
                   # This casts to integer. (GolfScript doesn't officially support Rationals.)
    }:v~           # Save the above code block in “v” and execute it.
    )..            # Undo the first three instructions of the loop.
    {              #
       ;;(         # Discard two items from the stack and decrement.
       .^3$\-      # Square and subtract from “n”.
       r;)8%!      # Check if the result satisfies the hypothesis of the three-square theorem.
    }do            # If it doesn't, repeat the loop.
    -1...          # Push 0 (“i”) and undo the first four instructions of the loop.
    {              #
      ;;;)         # Discard two items from the stack and increment “i”.
      ..252/@252%  # Push the digits of “i” in base 252.
      ^@^@+4$\-    # Square both, add and subtract the result 
      v^@-         # Take square root, square and compare.
    }do            # If the difference is a perfect square, break the loop.
    5$]            # Duplicate the difference an collect the entire stack into an array.
    {f*}%          # Multiply very element of the array by “f”.
    -4>            # Reduce the array to its four last elements (the four numbers).
    `              # Convert the result into a string.
    

    Dennis

    Posted 2014-05-17T23:58:15.577

    Reputation: 196 637

    1I didn't understand j-k-(i/252)-(i%252). From your comments (I cant actually read the code), it looks like you mean j-k-(i/252)^2-(i%252)^2. BTW, the equivalent of j-k-(i/r)^2-(i%r)^2 where r=sqrt(k) may save a few characters (and seems to work without problems even for k=0 in my C program.) – Level River St – 2014-05-22T19:05:49.510

    @steveverrill: Yes, I made a mistake. Thank you for noticing. It should be j-k^2-(i/252)^2-(i%252)^2. I'm still waiting for the OP to clarify if 0 is a valid input or not. Your program gives 1414 -nan 6 4.000000 for input 0. – Dennis – 2014-05-22T19:18:32.653

    I'm talking about my new program using Legendre's theorem, which I haven't posted yet. It looks like it never calls the code with the % or / when I have the equivalent of k=0, which is why it's not causing problems. You'll see when I post it. Glad you got my old program running. Did you have the memory to build the full 2GB table in rev 1, and how long did it take? – Level River St – 2014-05-22T19:28:37.297

    Yeah, the C compiler can behave quite unexpectedly when optimizing. In GolfScript, 0/ => crash! :P I've run your rev 1 on my laptop (i7-4700MQ, 8 GiB RAM). On average, execution time is 18.5 seconds. – Dennis – 2014-05-22T19:33:14.193

    Wow is that 18.5 seconds including building the table? It takes over 2 minutes on my machine. I can see the problem is the Windows memory management. Rather than give the program the 2GB it needs straight away, it gives it in small chunks, so it must be doing a lot of unnecessary swapping until the full 2GB is allocated. Actually searching for the answer per user input is much faster, because by then the program doesn't have to go begging for memory. – Level River St – 2014-05-22T19:46:20.623

    @steveverrill: Yes, but that was with the default optimization level. With -Ofast, the entire process takes about 13 seconds. Fun fact: With -Ofast, you don't need -lm. – Dennis – 2014-05-22T22:31:52.223

    1

    Rev 1: C,190

    a,z,m;short s[15<<26];p(){m=s[a=z-a];printf("%d %f ",m,sqrt(a-m*m));}
    main(){m=31727;for(a=m*m;--a;s[z<m*m?z:m*m]=a%m)z=a/m*(a/m)+a%m*(a%m);scanf("%d",&z);for(;a*!s[a]||!s[z-a];a++);p();p();}
    

    This is even more memory hungry than rev 0. Same principle: build a table with a positive value for all possible sums of 2 squares (and zero for those numbers which are not sums of two squares), then search it.

    In this rev use an array of short instead of char to store the hits, so I can store the root of one of the pair of squares in the table instead of just a flag. This simplifies function p (for decoding the sum of 2 squares) considerably as there is no need for a loop.

    Windows has a 2GB limit on arrays. I can get round that with short s[15<<26] which is an array of 1006632960 elements, enough to comply with the spec. Unfortunately, the total program runtime size is still over 2GB and (despite tweaking the OS settings) I have not been able to run over this size (though it is theoretically possible.) The best I can do is short s[14<<26] (939524096 elements.) m*m must be strictly less than this (30651^2=939483801.) Nevertheless, the program runs perfectly and should work on any OS that does not have this restriction.

    Ungolfed code

    a,z,m;
    short s[15<<26];     
    p(){m=s[a=z-a];printf("%d %f ",m,sqrt(a-m*m));}      
    main(){       
     m=31727;             
     for(a=m*m;--a;s[z<m*m?z:m*m]=a%m)   //assignment to s[] moved inside for() is executed after the following statement. In this rev excessively large values are thrown away to s[m*m].
       z=a/m*(a/m)+a%m*(a%m);            //split a into high and low half, calculate h^2+l^2.                                  
     scanf("%d",&z); 
     for(;a*!s[a]||!s[z-a];a++);         //loop until s[a] and s[z-a] both contain an entry. s[0] requires special handling as s[0]==0, therefore a* is included to break out of the loop when a=0 and s[z] contains the sum of 2 squares.
     p();                                //print the squares for the first sum of 2 squares 
     p();}                               //print the squares for the 2nd sum of 2 squares (every time p() is called it does a=z-a so the two sums are exchanged.) 
    

    Rev 0 C,219

    a,z,i,m;double t;char s[1<<30];p(){for(i=t=.1;(m=t)-t;i++)t=sqrt(a-i*i);printf("%d %f ",i-1,t);}
    main(){m=1<<15;for(a=m*m;--a;){z=a/m*(a/m)+a%m*(a%m);s[z<m*m?z:0]=1;}scanf("%d",&z);for(;1-s[a]*s[z-a];a++);p();a=z-a;p();}
    

    This is a memory hungry beast. It takes a 1GB array, calculates all possible sums of 2 squares and stores a flag for each in the array. Then for the user input z, it searches the array for two sums of 2 squares a and z-a.

    the function p then reconsitutes the original squares that were used to make the sums of 2 squares a and z-a and prints them, the first of each pair as an integer, the second as a double (if it has to be all integers two more characters are needed,t > m=t.)

    The program takes a couple of minutes to build the table of sums of squares (I think this is due to memory management issues, I see the memory allocation going up slowly instead of jumping up as one might expect.) However once that is done it produces answers very quickly (if several numbers are to be calculated, the program from scanf onward can be put in a loop.

    ungolfed code

    a,z,i,m;
    double t;
    char s[1<<30];                              //handle numbers 0 up to 1073741823
    p(){
     for(i=t=.1;(m=t)-t;i++)t=sqrt(a-i*i);      //where a contains the sum of 2 squares, search until the roots are found
     printf("%d %f ",i-1,t);}                   //and print them. m=t is used to evaluate the integer part of t. 
    
    main(){       
     m=1<<15;                                   //max root we need is sqrt(1<<30);
     for(a=m*m;--a;)                            //loop m*m-1 down to 1, leave 0 in a
       {z=a/m*(a/m)+a%m*(a%m);s[z<m*m?z:0]=1;}  //split a into high and low half, calculate h^2+l^2. If under m*m, store flag, otherwise throw away flag to s[0]
     scanf("%d",&z);
     for(;1-s[a]*s[z-a];a++);                   //starting at a=0 (see above) loop until flags are found for sum of 2 squares of both (a) and (z-a)
     p();                                       //reconsitute and print the squares composing (a)
     a=z-a;                                     //assign (z-a) to a in order to...
     p();}                                      //reconsitute and print the squares composing (z-a)  
    

    Example output

    The first is per the question. The second was picked as a difficult one to search for. In this case the program has to search as far as 8192^2+8192^2=134217728, but only takes a few seconds once the table is built.

    123456789
    0 2.000000 3328 10601.000000
    
    805306368
    8192 8192.000000 8192 24576.000000
    

    Level River St

    Posted 2014-05-17T23:58:15.577

    Reputation: 22 049

    Shouldn't you add a prototype for sqrt? – edc65 – 2014-05-21T16:14:00.293

    @edc65 I'm using GCC compiler (which is for Linux, but I have Cygwin Linux environment installed on my Windows machine.) This means I don't need to put #include <stdio.h> (for scanf/printf) or #include <math.h> (for sqrt.) The compiler links the necessary libraries automatically. I have to thank Dennis for that (he told me on this question http://codegolf.stackexchange.com/a/26330/15599) Best golfing tip I ever had.

    – Level River St – 2014-05-21T16:19:26.013

    I was already wondering why Hunt the Wumpus appeared in the linked questions. :) By the way, I don't know what GCC uses on Windows, but the GNU linker does not link the math library automatically, with or without the include. To compile on Linux, you need the flag -lm – Dennis – 2014-05-22T04:53:48.110

    @Dennis that's interesting, it does include stdio and several other libraries, but not math even with the include? By which I understand if you put the compiler flag, you don't need the include anyway? Well it's working for me, so I'm not complaining, thanks again for the tip. BTW I'm hoping to post a completely different answer taking advantage of Legendre's theorem (but it will still use a sqrt.) – Level River St – 2014-05-22T08:00:25.963

    -lm affects the linker, not the compiler. gcc opts to not require the prototypes for functions it "knows", so it works with or without the includes. However, the header filesprovide only function prototypes, not the functions themselves. On Linux (but not Windows, apparently), the math library libm is not part of the standard libraries, so you have to instruct ld to link to it. – Dennis – 2014-05-22T14:35:16.797

    1

    Mathematica, 138 chars

    So it turns out that this produces negative and imaginary results for certain inputs as pointed out by edc65 (e.g., 805306368), so this isn't a valid solution. I'll leave it up for now, and maybe, if I really hate my time, I'll go back and try to fix it.

    S[n_]:=Module[{a,b,c,d},G=Floor@Sqrt@#&;a=G@n;b:=G[n-a^2];c:=G[n-a^2-b^2];d:=G[n-a^2-b^2-c^2];While[Total[{a,b,c,d}^2]!=n,a-=1];{a,b,c,d}]
    

    Or, unsquished:

    S[n_] := Module[{a, b, c, d}, G = Floor@Sqrt@# &;
     a = G@n;
     b := G[n - a^2];
     c := G[n - a^2 - b^2];
     d := G[n - a^2 - b^2 - c^2];
     While[Total[{a, b, c, d}^2] != n, a -= 1];
     {a, b, c, d}
    ]
    

    I didn't look too hard at the algorithms, but I expect this is the same idea. I just came up with the obvious solution and tweaked it until it worked. I tested it for all numbers between 1 and one billion and... it works. The test only takes about 100 seconds on my machine.

    The nice bit about this is that, since b, c, and d are defined with delayed assignments, :=, they don't have to be redefined when a is decremented. This saved a few extra lines I had before. I might golf it further and nest the redundant parts, but here's the first draft.

    Oh, and you run it as S@123456789 and you can test it with {S@#, Total[(S@#)^2]} & @ 123456789 or # == Total[(S@#)^2]&[123456789]. The exhaustive test is

    n=0;
    AbsoluteTiming@ParallelDo[If[e != Total[(S@e)^2], n=e; Abort[]] &, {e, 1, 1000000000}]
    n
    

    I used a Print[] statement before but that slowed it down a lot, even though it never gets called. Go figure.

    krs013

    Posted 2014-05-17T23:58:15.577

    Reputation: 201

    This is really clean! I'm surprised that it suffices to simply take every value but the the first as large as possible. For golfing, it's probably shorter to save n - a^2 - b^2 - c^2 as a variable and check that d^2 equals it. – xnor – 2014-05-20T23:00:13.827

    2Does it really work? What solution does it find for input 805306368? – edc65 – 2014-05-21T07:08:22.167

    S[805306368]={-28383, 536 I, 32 I, I}. Huh. That does produces 805306368 when you sum it, but obviously there is a problem with this algorithm. I guess I'll have to retract this for now; thanks for pointing that out... – krs013 – 2014-05-21T07:14:13.357

    2The numbers that fail all seem to be divisible by large powers of 2. Specifically, they seem to be of the form a * 4^(2^k) for k>=2, having extracted out all powers of 4 so that a isn't a multiple of 4 (but could be even). Moreover, each a is either 3 mod 4, or twice such a number. The smallest one is 192. – xnor – 2014-05-21T08:09:17.147

    1

    Haskell 123+3=126

    main=getLine>>=print.f.read
    f n=head[map(floor.sqrt)[a,b,c,d]|a<-r,b<-r,c<-r,d<-r,a+b+c+d==n]where r=[x^2|x<-[0..n],n>=x^2]
    

    Simple brute force over pre-calculated squares.

    It needs the -O compilation option (I added 3 chars for this). It takes less than 1 minute for the worst case 999950883.

    Only tested on GHC.

    lortabac

    Posted 2014-05-17T23:58:15.577

    Reputation: 761

    1

    C: 198 characters

    I can probably squeeze it down to just over 100 characters. What I like about this solution is the minimal amount of junk, just a plain for-loop, doing what a for-loop should do (which is to be crazy).

    i,a,b,c,d;main(n){for(scanf("%d",&n);a*a+b*b-n?a|!b?a*a>n|a<b?(--a,b=1):b?++b:++a:(a=b=0,--n,++i):c*c+d*d-i?c|!d?c*c>i|c<d?(--c,d=1):d?++d:++c:(a=b=c=d=0,--n,++i):0;);printf("%d %d %d %d",a,b,c,d);}
    

    And heavily prettified:

    #include <stdio.h>
    
    int n, i, a, b, c, d;
    
    int main() {
        for (
            scanf("%d", &n);
            a*a + b*b - n
                ? a | !b
                    ? a*a > n | a < b
                        ? (--a, b = 1)
                        : b
                            ? ++b
                            : ++a
                    : (a = b = 0, --n, ++i)
                : c*c + d*d - i
                    ? c | !d
                        ? c*c > i | c < d
                            ? (--c, d = 1)
                            : d
                                ? ++d
                                : ++c
                        : (a = b = c = d = 0, --n, ++i)
                    : 0;
        );
        printf("%d %d %d %d\n", a, b, c, d);
        return 0;
    }
    

    Edit: It's not fast enough for all input, but I will be back with another solution. I'll let this ternary operation mess stay as of now.

    Fors

    Posted 2014-05-17T23:58:15.577

    Reputation: 3 020

    1

    Rev B: C, 179

    a,b,c,d,m=1,n,q,r;main(){for(scanf("%d",&n);n%4<1;n/=4)m*=2;
    for(a=sqrt(n),a-=(3+n-a*a)%4/2;r=n-a*a-b*b-c*c,d=sqrt(r),d*d-r;c=q%256)b=++q>>8;
    printf("%d %d %d %d",a*m,b*m,c*m,d*m);}
    

    Thanks to @Dennis for the improvements. The rest of the answer below is not updated from rev A.

    Rev A: C,195

    a,b,c,d,n,m,q;double r=.1;main(){scanf("%d",&n);for(m=1;!(n%4);n/=4)m*=2;a=sqrt(n);a-=(3+n-a*a)%4/2;
    for(;(d=r)-r;q++){b=q>>8;c=q%256;r=sqrt(n-a*a-b*b-c*c);}printf("%d %d %d %d ",a*m,b*m,c*m,d*m);}
    

    Much faster than my other answer and with much less memory!

    This uses http://en.wikipedia.org/wiki/Legendre%27s_three-square_theorem. Any number not of the following form can be expressed as the sum of 3 squares (I call this the prohibited form):

    4^a*(8b+7), or equivalently 4^a*(8b-1)

    Note that all odd square numbers are of the form (8b+1) and all even square numbers are superficially of the form 4b. However this hides the fact that all even square numbers are of the form 4^a*(odd square)==4^a*(8b+1). As a result 2^x-(any square number < 2^(x-1)) for odd xwill always be of the prohibited form. Hence these numbers and their multiples are difficult cases, which is why so many of the programs here divide out powers of 4 as a first step.

    As stated in @xnor's answer, N-a*a cannot be of the prohibited form for 2 consecutive values of a. Below I present a simplified form of his table. In addition to the fact that after division by 4 N%4 cannot equal 0, note that there are only 2 possible values for (a*a)%4.

    (a*a)%4= 01
            +--
           1|10
      N%4= 2|21  <- (N-a*a)%4
           3|32
    

    So, we want to avoid values of (N-a*a) that may be of the prohibited form, namely those where (N-a*a)%4 is 3 or 0. As can be seen this cannot occur for the same N with both odd and even (a*a).

    So, my algorithm works like this:

    1. Divide out powers of 4
    2. Set a=int(sqrt(N)), the largest possible square
    3. If (N-a*a)%4= 0 or 3, decrement a (only once)
    4. Search for b and c such that N-a*a-b*b-c*c is a perfect square
    

    I particularly like the way I do step 3. I add 3 to N, so that the decrement is required if (3+N-a*a)%4 = 3 or 2. (but not 1 or 0.) Divide this by 2 and the whole job can be done by a fairly simple expression.

    Ungolfed code

    Note the single for loop q and use of division/modulo to derive the values of b and c from it. I tried using a as a divisor instead of 256 to save bytes, but sometimes the value of a was not right and the program hung, possibly indefinitely. 256 was the best compromise as I can use >>8 instead of /256 for the division.

    a,b,c,d,n,m,q;double r=.1;
    main(){
      scanf("%d",&n);
      for(m=1;!(n%4);n/=4)m*=2;
      a=sqrt(n);
      a-=(3+n-a*a)%4/2;
      for(;(d=r)-r;q++){b=q>>8;c=q%256;r=sqrt(n-a*a-b*b-c*c);}
      printf("%d %d %d %d ",a*m,b*m,c*m,d*m);}
    

    Output

    An interesting quirk is that if you input a square number, N-(a*a)=0. But the program detects that 0%4=0 and decrements to the next square down. As a result square number inputs are always decomposed into a group of smaller squares unless they are of the form 4^x.

    999999999
    31621 1 161 294
    
    805306368
    16384 0 16384 16384
    
    999950883
    31621 1 120 221
    
    1
    0 0 0 1
    
    2
    1 0 0 1
    
    5
    2 0 0 1
    
    9
    2 0 1 2
    
    25
    4 0 0 3
    
    36
    4 0 2 4
    
    49
    6 0 2 3
    
    81
    8 0 1 4
    
    121
    10 1 2 4
    

    Level River St

    Posted 2014-05-17T23:58:15.577

    Reputation: 22 049

    Amazing! 0.003 s for every input! You can get those 5 chars back: 1. Declare m=1 before main. 2. Execute scanf in the for statement. 3. Use float instead of double. 4. n%4<1 is shorter than !(n%4). 5. There's an obsolete space in printf's format string. – Dennis – 2014-05-22T23:19:45.507

    A few more suggestions. – Dennis – 2014-05-23T03:26:26.340

    Thanks for the tips! n-=a*a doesn't work, because a can be modified afterwards (it gives some wrong answers and hangs on a small number of cases, like 100+7=107.) I included all the rest. It would be nice to something to shorten the printf but I think the only answer is to change the language. The key to speed is to settle on a good value for a quickly. Written in C and with a search space of less than 256^2, this is probably the fastest program here. – Level River St – 2014-05-23T19:31:57.643

    Right, sorry. Shortening the printf statement seems difficult without using a macro or an array, which would add bulk elsewhere. Changing languages seems the "easy" way. Your approach would weigh 82 bytes in CJam. – Dennis – 2014-05-23T21:26:46.267

    0

    JavaScript - 175 191 176 173 chars

    Brute force, but fast.

    Edit Fast but not enough for some nasty input. I had to add a first step of reduction by multiplies of 4.

    Edit 2 Get rid of function, output inside the loop then force exit contition

    Edit 3 0 not a valid input

    v=(p=prompt)();for(m=1;!(v%4);m+=m)v/=4;for(a=-~(q=Math.sqrt)(v);a--;)for(w=v-a*a,b=-~q(w);b--;)for(x=w-b*b,c=-~q(x);c--;)(d=q(x-c*c))==~~d&&p([m*a, m*b, m*c, m*d],a=b=c='')
    

    Ungolfed:

    v = prompt();
    
    for (m = 1; ! (v % 4); m += m) 
    {
      v /= 4;
    }
    for (a = - ~Math.sqrt(v); a--;) /* ~ force to negative integer, changing sign lead to original value + 1 */
    {
      for ( w = v - a*a, b = - ~Math.sqrt(w); b--;)
      {
        for ( x = w - b*b, c = - ~Math.sqrt(x); c--;)
        {
          (d = Math.sqrt(x-c*c)) == ~~d && prompt([m*a, m*b, m*c, m*d], a=b=c='') /* 0s a,b,c to exit loop */
        }
      }
    }
    

    Example output

    123456789
    11111,48,10,8
    
    805306368
    16384,16384,16384,0
    

    edc65

    Posted 2014-05-17T23:58:15.577

    Reputation: 31 086