Universal (rule-bending) Code Golf solver

Code golf always involves some answers that bend the rules more or less by breaking constraints that the challengers took for granted or just haven't thought about and didn't list in the rules. One of these interesting loopholes is the possibility to output more than the challenge asks for to get a better result.

Taking this to extremes, we can write an universal code golf solver that prints the desired output — if you don't care that it might take ages and outputs lots of other stuff before and after it.

All we need to output is a sequence that is guaranteed to contain every possible subsequence. For this code golf, this will be the Ehrenfeucht-Mycielski sequence:

The sequence starts with the three bits 010; each successive digit is formed by finding the longest suffix of the sequence that also appears earlier within the sequence, and complementing the bit following the most recent earlier appearance of that suffix.

Every finite subsequence of bits occurs contiguously, infinitely often within the sequence

The first few digits of the sequence are:

010011010111000100001111... (sequence A038219 in OEIS).

Combining 8 bits of the sequence to a byte, we'll get ASCII output that we can output to the screen or to a file and that contains every possible finite output. The program will output parts of pi, the lyrics of “Never gonna give you up”, some nice ASCII art, its own source code, and everything else you could want it to output.

For testing correctness, here are hashes for the first 256 bytes of the sequence:

MD5: 5dc589a06e5ca0cd9280a364a456d7a4
SHA-1: 657722ceef206ad22881ceba370d32c0960e267f

The first 8 bytes of the sequence in hexadecimal notation are:

4D 71 0F 65 27 46 0B 7C

Rules:

Your program must output the Ehrenfeucht-Mycielski sequence (nothing else), combining 8 bits to a byte/ASCII character.
Shortest program (character count) wins. Subtract 512 from your character count if you manage to generate the sequence in linear time per generated byte.

schnaader

Posted 2012-06-09T00:10:55.743

Reputation: 1 132

The longest suffix in 010 which appeared earlier in that sequence is 0, isn't it? And the most recent earlier appearance is just the second 0. And until now, nothing follows the second 0, so there is nothing we can build a complement on. I'm not a native english speaker - maybe I just got it wrong. The wikipedia article uses the same words, but has a longer sequence so I would name it "the most recent ... which has a follower". – user unknown – 2012-06-09T02:06:37.563

8Pedantic quibble: pi will never appear - only every finite string will be contained in the output. – Keith Randall – 2012-06-09T02:55:23.430

I have another question: May a repetition overlap? For example in 111, (1[1)1]? – user unknown – 2012-06-09T03:01:25.593

@KeithRandall: I would prefer a sequence which is guaranteed not to contain 'Never gonna give you up' and productions of similar kind. – user unknown – 2012-06-09T03:04:34.310

@userunknown I think they're just relying on the idea that the most recent suffix is not "earlier" in the string, being at the end, so it is automatically not considered. – breadbox – 2012-06-09T05:33:26.417

@breadbox: Maybe you can clarify, why the first triple-one: 111 is followed by a zero? Since it is the first 111, we have to search for a shorter sequence - maybe 11? There is the overlapping 11 which is followed by a 1, so the next character has to be a 0. The 11 to the left is followed by 0 so this would lead to another 1. So we have to accept overlaps? – user unknown – 2012-06-09T06:06:06.330

@userunknown Or look at it this way: if you don't defined "earlier" to exclude the end of the string, then it the longest match will necessarily always be at the end. Which is obviously uninteresting, and therefore excluded. Think like a mathematician instead of a programmer and it makes perfect sense! – breadbox – 2012-06-09T06:16:01.293

@KeithRandall: Of course. Corrected now. – schnaader – 2012-06-09T09:50:36.140

@userunknown: It says "complementing the bit following the most recent earlier appearance of that suffix". "Most recent" will lead you to the 11 "most right", but "earlier appearance" excludes overlappings with the suffix (010011010111), so the 11 you choose is the other one: 010011010111 – schnaader – 2012-06-09T10:00:15.430

@schnaader: But then a 0 follows, you have to complement it, which leads to 1, and you would end with a series of 4 times 1 - but there is only 111. – user unknown – 2012-06-09T11:26:45.620

@userunknown: Sorry, messed it up. Earlier appearance includes overlappings. Every position is fine as long as it's not the suffix itself which isn't followed by a bit. We choose the rightmost match position. So it is the other way round, we are choosing 010011010111 indeed, so a 0 is added. – schnaader – 2012-06-09T11:34:35.530

@schnaader: But you have to test out the example that far too, to find that out; don't you? It's not clear from the description of the rules; nor at wikipedia, and more so not here. – user unknown – 2012-06-09T12:45:43.447

It might be worth mentioning that the mere presence of an "answer" embedded at an unspecified location in an infinite string cannot be regarded as "outputting" that answer, of course. Also, this particular sequence is just one example of a disjunctive sequence -- there are infinitely many sequences like this.

– r.e.s. – 2012-06-09T14:16:44.640

Answers

C, –110 chars

This version of the program uses the linear-runtime algorithm to generate the sequence. Subtracting 512 from the 402 chars in the program gives a total of negative 110.

#define C v=calloc(7,8),v->p=p
#define G(F,K)u->F[d[K]]
#define S(F,T)G(f,T)=F,G(t,T)=T,G(n,T)=
struct{int p,f[2],t[2];void*n[2];}r,*u,*v,*w;char*d,c;p,b,h,i,j,k;
main(s){for(;d=++p-s?d:realloc(d,s*=2);){d[i=p]=b;c+=c+b;p%8||putchar(c);
for(u=&r;b=u->p,u->p=p,w=G(n,k=i);S(i,k)v=G(n,k),u=v)for(h=G(f,k),j=G(t,k);j>h;--i,--j)
if(d[i]-d[j]){S(i,k)C;u=v;S(h,j)w;S(0,i)C;b=w->p;goto x;}S(0,i)C;x:b=1-d[b+1];}}

As per the problem, the program runs in an infinite loop, which necessitates lots of memory allocation, and using realloc() to keep the sequence contiguous can contribute to heap fragmentation. You can improve the program's memory usage by replacing calloc(7,8) on the first line with calloc(1,sizeof*v). This will help especially on a 32-bit machine, where 56 is likely too large by a factor of two.

The code is kind of unreadable, and not in an interesting way; for that I apologize. Frankly, even the ungolfed version isn't terribly clear:

#include <stdio.h>
#include <stdlib.h>

typedef struct branch branch;
typedef struct node node;

struct branch {
    int from, to;
    node *next;
};

struct node {
    int pos;
    branch br[2];
};

static node root = { 0 };

static unsigned char *data = NULL;
static int endpos = 0;
static int size = 1;

static node *mknode(void)
{
    node *n;

    n = calloc(1, sizeof *n);
    n->pos = endpos;
    return n;
}

static branch *getbranch(node *n, int p)
{
    return &n->br[data[p]];
}

static void setbranch(node *n, int from, int to, node *next)
{
    n->br[data[to]].next = next;
    n->br[data[to]].from = from;
    n->br[data[to]].to = to;
}

int main(void)
{
    node *u, *v, *w;
    int follower, from, i, i0, j;
    int out, b;

    out = b = 0;
    for (;;) {
        ++endpos;
        if (endpos == size) {
            size *= 2;
            data = realloc(data, size);
        }
        data[endpos] = b;
        out = (out << 1) | b;
        if (endpos % 8 == 0) {
            putchar(out);
            out = 0;
        }

        i = endpos;
        u = &root;
        for (;;) {
            follower = u->pos + 1;
            u->pos = endpos;
            w = getbranch(u, i)->next;
            if (!w)
                break;
            i0 = i;
            from = getbranch(u, i0)->from;
            for (j = getbranch(u, i0)->to ; j > from ; --j) {
                if (data[i] != data[j]) {
                    /* divide branch */
                    v = mknode();
                    setbranch(u, i, i0, v);
                    u = v;
                    setbranch(u, from, j, w);
                    setbranch(u, 0, i, mknode());
                    follower = w->pos + 1;
                    goto bitfound;
                }
                --i;
            }
            v = getbranch(u, i0)->next;
            setbranch(u, i, i0, v);
            u = v;
        }
        /* extend branch */
        setbranch(u, 0, i, mknode());

      bitfound:
        b = 1 - data[follower];
    }
}

(The ungolfed code above is based on the code written by Grzegorz Herman and Michael Soltys, as referenced in the problem description, and from Soltys' home page.)

Thanks to @schnaader and @r.e.s. for reporting a bug in the initial version.

breadbox

Posted 2012-06-09T00:10:55.743

Reputation: 6 893

Nice! That's what I hoped for with the -512 bonus. – schnaader – 2012-06-13T20:59:51.537

Any idea why this causes crashes on by system? All of the golfed, ungolfed and malloc modified versions stop output after about 10000 bytes and keep on allocating memory, prog > out.dat gives an instant crash with only ~700 KB memory usage. If I insert printf("\n%i\n", size); after realloc, the biggest output is 4. System: Windows 7 Prof. 64-Bit, 4 GB RAM, GCC 4.6.1 – schnaader – 2012-06-13T21:02:41.237

(+1) I find that with Ubuntu12.04/gcc, both of your programs compile and produce the correct output ... With Win7/mingw/gcc, both programs compile but produce segmentation faults ... With Win7/lcc, the ungolfed version works, but the golfed version produces segmentation faults. – r.e.s. – 2012-06-13T23:09:25.180

1That sounds like use of uninitialized data to me. Sure enough -- I don't have access to a Windows machine, but valgrind shows the problem. Looks like I reproduced this bug from the original reference implementation, too. Fortunately it's an easy fix; thanks for reporting it! – breadbox – 2012-06-14T04:47:23.757

Great, works like a charm now. – schnaader – 2012-06-14T08:59:20.513

Ruby, 109 104 101 94 characters

s=?0
loop{s=(s[/(.*).*\1/][/.#{$1}/]<?1??1:?0)+s
s.size&7<1&&$><<[s.reverse.to_i(2)].pack(?C)}

Implementation in Ruby using regular expressions for suffix search. Since it takes quite a long time until out-of-memory the program has to be terminated by the user.

Edit: I just noticed that it is enough to start with the sequence 0.

Edit 2: The proposal of r.e.s. saves 2 characters, some others because we do not have to cut out a single byte before pack.

Howard

Posted 2012-06-09T00:10:55.743

Reputation: 23 109

Using s=(s[/(.*).*\1/][/.#{$1}/]<?1??1:?0)+s will save another two characters. – r.e.s. – 2012-06-10T23:51:47.117

@r.e.s. This works indeed. Thank you. – Howard – 2012-06-11T05:10:11.730

Can you get rid of the parentheses around ?C? – Fund Monica's Lawsuit – 2016-03-11T03:04:41.087

Perl, 95 chars

I actually had a halfway decent version of this at first. Then as I golfed, each version got slower. Increasingly slower.

$|=$_="010";
y///c%8||print pack"B*",/(.{8})$/while/(.+)$(?(?{m|.*$^N(.)|})(?{$_.=1-$^N})|(?!))/

The first three characters ($|=) aren't necessary, strictly speaking ... but without that there, you'd typically have to wait for the script to finish generating a full 4096 bytes before you'd see any of the output. And that would take hours. Maybe centuries; I'm not sure. Did I mention that this program's performance kind of deteriorates over time? So because of that I kind of felt compelled to include them in the count.

On the other hand, this script has one of the ugliest regexes I've ever created, so I think I'm proud of it.

breadbox

Posted 2012-06-09T00:10:55.743

Reputation: 6 893

1Don't worry about the performance, the algorithm is O(N^3) without optimizations. My simple Delphi program I wrote took about 30 seconds for 256 bytes,but about an hour for 1024 bytes, so I'd assume 4096 bytes to take one or several days. Of course, RegEx and space optimizations have the potential to make it worse :) – schnaader – 2012-06-09T13:47:47.643

My initial Perl script took 10 seconds for 256 bytes. This version takes 90 seconds. (Nor does it appear to be a linear slowdown.) – breadbox – 2012-06-09T18:26:30.420