Starry Metagolf

25

1

Starry is a funny esoteric programming language in which code consists only of +*.,`' where the actual command represented by each of those characters is determined by the number of spaces in front of it. That makes it tricky even to golf fixed-output challenges, because different commands can account for vastly different numbers of bytes. In particular, number literals have a unary representation which makes it necessary to build up larger numbers by operating on smaller ones.

Therefore, this challenge is about writing a program which can golf such Starry programs.

How does Starry work?

(A few details are left unspecified on esolangs, so I'm going with the behaviour of the Ruby interpreter.)

Starry is a stack-based language, with a single stack of arbitrary-precision integer values (which is initially empty).

The only meaningful characters are:

+*.,`'

and spaces. All other characters are ignored. Each sequence of spaces followed by one of those non-space characters represents a single instruction. The type of instruction depends on the non-space character and the number of spaces.

The instructions are:

Spaces  Symbol  Meaning
0         +     Invalid opcode.
1         +     Duplicate top of stack.
2         +     Swap top 2 stack elements.
3         +     Rotate top 3 stack elements. That is, send the top stack element
                two positions down. [... 1 2 3] becomes [... 3 1 2].
4         +     Pop and discard top of stack.
n ≥ 5     +     Push n − 5 to stack.
0 mod 5   *     Pop y, pop x, push x + y.
1 mod 5   *     Pop y, pop x, push x − y.
2 mod 5   *     Pop y, pop x, push x * y.
3 mod 5   *     Pop y, pop x, push x / y, rounded towards -∞.
4 mod 5   *     Pop y, pop x, push x % y. The sign of the result matches the sign of y.
0 mod 2   .     Pop a value and print it as a decimal number.
1 mod 2   .     Pop a value and print it as an ASCII character. This throws an error
                if the value is not in the range [0, 255].
n         `     Mark label n.
n         '     Pop a value; if non-zero, jump to label n. 

Note that the interpreter scans the source code for the labels before execution begins, so it's possible to jump forwards as well as backwards.

Of course, Starry also has input commands (using , analogously to .), but those are irrelevant for this challenge.

The Challenge

Given a string, generate a Starry program which takes no input and prints that string exactly to STDOUT.

You may write a program or function, taking input via STDIN (or closest alternative), command-line argument or function argument and outputting the result via STDOUT (or closest alternative), function return value or function (out) parameter.

You may assume that the string is not longer than 128 characters and that it will consist only of printable ASCII characters (code points 0x20 to 0x7E).

Your solution should process any such input in less than 5 minutes on a reasonable desktop machine (there's some leeway to this; if it takes a few minutes more on my laptop I don't mind, but if it takes 15, I'll disqualify it).

Your solution will be tested on a number of different strings listed below. Your score is the total byte count of the corresponding Starry programs. In the case of a tie, the shortest metagolfer wins. That is, don't bother golfing your own code unless there is a tie (which I think will only happen in the case that an optimal solution is possible).

You must not optimise your code towards the specific test cases listed below. Specifically, you shouldn't hardcode hand-crafted solutions for them. Optimising towards classes of strings whose structure is similar to that of the given strings is fine. If I suspect anyone of hardcoding solutions, I reserve the right to replace some or all of the test cases (with strings of comparable structures).

Test Cases

Each line is a separate test case:

Hello, World!
pneumonoultramicroscopicsilicovolcanoconiosis
.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.
Hickory, dickory, dock. The mouse ran up the clock. The clock struck 1. The mouse ran down. Hickory, dickory, dock.
36912059868043514648560046917066768694455682545071266675083273015450033938555319356951628735735013250100789433961153496780296165
bVZ48121347GLtpYnt76CZSxTpMDs6791EJE808077eySXldY162424ddTB90707UupwlWGb63618542VhA252989453TXrWgqGm85899uHOAY2oAKE198GOVUttvW63
7MYxoWBNt180CDHS5xBGvU70HHVB17bh8jYzIIiU6n6g98Rose1nOe8Svcg56nax20q30kT3Ttb2jHl5q2Iuf1vPbjPxm9cyKXwxc0OUK8pr13b2n7U9Y7RwQTc26A1I
n9}unwxVa}[rj+5em6K#-H@= p^X/:DS]b*Jv/_x4.a5vT/So2R`yKy=in7-15B=g _BD`Bw=Z`Br;UwwF[{q]cS|&i;Gn4)q=`!G]8"eFP`Mn:zt-#mfCV2AL2^fL"A

Credits for the second test case go to Dennis. Credits for the fourth test case go to Sp3000.

Reference Solution

Here is a really basic reference solution in CJam:

q{S5*\iS*'+S'.}%

You can run it against the entire test suite here. The scores are:

1233
5240
4223
11110
7735
10497
11524
11392
Total: 62954

It's the simplest possible approach: push each character's code point as a literal, and then print it. It makes no use of small differences between consecutive characters, integer printing, repetitive parts of the string etc. I'll leave those things to you.

I believe there is a lot of room for improvement. For reference, the shortest handcrafted "Hello, World!" is only 169 bytes long.

Martin Ender

Posted 2015-11-16T15:35:27.153

Reputation: 184 808

Answers

6

Ruby, 13461 10997

$s = {};
def shortest a,b=nil
    return $s[[a,b]] if $s[[a,b]]
    l = []
    if b
        if a == b
            return $s[[a,b]] = ""
        elsif a > b
            l.push shortest(a-b)+" *"
            l.push " +   *"+shortest(1,b) if a > 1
            l.push " + *"+shortest(0,b) if a > 0
            l.push "    +"+shortest(b)
        elsif a < b
            l.push " +  *"+shortest(a*a,b) if a*a>a && a*a<=b
            l.push " +*"+shortest(a+a,b) if a+a<=b && a+a>a
            l.push shortest(b-a)+"*"
            l.push " +"+shortest(a,b/a)+"  *" if a>2 && b%a == 0
            l.push " +"+shortest(a,b-a)+"*" if a>1 && b>a*2
        end
    else
        l.push ' '*(a+5)+'+' #if a < 6
        (1..a/2).each {|n|
            l.push shortest(n)+shortest(n,a)
        }
    end
    return $s[[a,b]] = l.min_by{|x|x.length}
end

def starry(str)
    arr = str.bytes.map{|b|
        if b>47 && b<58
            b-48# change digets to numbers
        else
            b
        end
    }

    startNum = (1..128).min_by{|x|arr.inject{|s,y|s + [shortest(x,y).length+2,shortest(y).length].min}+shortest(x).length}
    #one number to be put on the stack at the start.

    code = shortest(startNum)
    code += [
        shortest(arr[0]),
        " +"+shortest(startNum, arr[0])
    ].min_by{|x|x.length}

    arr.each_cons(2) do |a|
        pr = a[0]<10?'.':' .'
        code += [
            pr+shortest(a[1]),
            " +"+pr+shortest(a[0], a[1]),
            pr+" +"+shortest(startNum, a[1])
        ].min_by{|x|x.length}
    end
    code += arr[-1]<10?'.':' .'
end

a = ["Hello, World!",
"pneumonoultramicroscopicsilicovolcanoconiosis",
".oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.",
"Hickory, dickory, dock. The mouse ran up the clock. The clock struck 1. The mouse ran down. Hickory, dickory, dock.",
"36912059868043514648560046917066768694455682545071266675083273015450033938555319356951628735735013250100789433961153496780296165",
"bVZ48121347GLtpYnt76CZSxTpMDs6791EJE808077eySXldY162424ddTB90707UupwlWGb63618542VhA252989453TXrWgqGm85899uHOAY2oAKE198GOVUttvW63",
"7MYxoWBNt180CDHS5xBGvU70HHVB17bh8jYzIIiU6n6g98Rose1nOe8Svcg56nax20q30kT3Ttb2jHl5q2Iuf1vPbjPxm9cyKXwxc0OUK8pr13b2n7U9Y7RwQTc26A1I",
"n9}unwxVa}[rj+5em6K#-H@= p^X/:DS]b*Jv/_x4.a5vT/So2R`yKy=in7-15B=g _BD`Bw=Z`Br;UwwF[{q]cS|&i;Gn4)q=`!G]8\"eFP`Mn:zt-#mfCV2AL2^fL\"A"]
c = a.map{
    |s|
    starry(s).length
}
p c.inject(0){|a,b|a+b}

The method starry answers the given question.

Results:

230
639
682
1974
1024
1897
2115
2436
Total: 10997

How it works

shortest is the main algorithm. It takes one number and finds the shortest way to place it on the stack, or it takes two numbers, and returns code to put the second on the stack assuming the first one is already on. $s is a Hash to hold the results of these operations for further use.

starry takes a string and splits it into an array of character codes (or numbers for digests). It starts the code with one number on the bottom of the stack. Next it calculates the shortest way it can generate each successive number, possibly copying the last one or using the number put on the stack at the beginning.

MegaTom

Posted 2015-11-16T15:35:27.153

Reputation: 3 787

9

Python 3, 17071 11845

from functools import lru_cache
import heapq
import time

cases = r"""
Hello, World!
pneumonoultramicroscopicsilicovolcanoconiosis
.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.oOo.
Hickory, dickory, dock. The mouse ran up the clock. The clock struck 1. The mouse ran down. Hickory, dickory, dock.
36912059868043514648560046917066768694455682545071266675083273015450033938555319356951628735735013250100789433961153496780296165
bVZ48121347GLtpYnt76CZSxTpMDs6791EJE808077eySXldY162424ddTB90707UupwlWGb63618542VhA252989453TXrWgqGm85899uHOAY2oAKE198GOVUttvW63
7MYxoWBNt180CDHS5xBGvU70HHVB17bh8jYzIIiU6n6g98Rose1nOe8Svcg56nax20q30kT3Ttb2jHl5q2Iuf1vPbjPxm9cyKXwxc0OUK8pr13b2n7U9Y7RwQTc26A1I
n9}unwxVa}[rj+5em6K#-H@= p^X/:DS]b*Jv/_x4.a5vT/So2R`yKy=in7-15B=g _BD`Bw=Z`Br;UwwF[{q]cS|&i;Gn4)q=`!G]8"eFP`Mn:zt-#mfCV2AL2^fL"A
""".strip().splitlines()

@lru_cache(maxsize=128)
def shortest_m_to_n(m, n):
    if m is None:
        L = []
    else:
        L = [m]

    to_search = [[0, "", L]]
    seen = set()

    while True:
        length, code, stack = heapq.heappop(to_search)

        if len(stack) == 1 and stack[-1] == n:
            return code

        seen.add(tuple(stack))
        options = []

        for i in range(1, 11):
            new_stack = stack[:] + [i]
            new_code = code + ' '*(i+5) + '+'
            options.append([len(new_code), new_code, new_stack])

        if stack:
            new_stack = stack[:] + [stack[-1]]
            new_code = code + " +"
            options.append([len(new_code), new_code, new_stack])

        if len(stack) >= 2:
            x, y = stack[-2:]

            for i, op in enumerate(['+', '-', '*', '//', '%']):
                try:
                    new_elem = eval("{}{}{}".format(x, op, y))
                    new_stack = stack[:-2] + [new_elem]
                    new_code = code + ' '*i + '*'
                    options.append([len(new_code), new_code, new_stack])

                except ZeroDivisionError:
                    pass

        for op in options:
            if tuple(op[2]) in seen or len(op[2]) > 4 or op[2][-1] > 200:
                continue

            heapq.heappush(to_search, op)

def lcs(s1, s2):
    dp_row = [""]*(len(s2)+1)

    for i, c1 in enumerate(s1):
        new_dp_row = [""]

        for j, c2 in enumerate(s2):
            if c1 == c2 and not c1.isdigit():
                new_dp_row.append(dp_row[j] + c1)
            else:
                new_dp_row.append(max(dp_row[j+1], new_dp_row[-1], key=len))

        dp_row = new_dp_row

    return dp_row[-1]

def metagolf(s):
    keep = ""
    split_index = 0

    for i in range(1, len(s)):
        l = lcs(s[:i], s[i:][::-1])
        if len(l) > len(keep):
            keep = l
            split_index = i

    code = []
    stack = []
    keep_ptr = 0
    i = 0

    while i < len(s):
        c = s[i]
        n = ord(c)

        if c in "0123456789":
            code += [" "*(int(c)+5) + "+."]
            i += 1
            continue

        if stack:
            if stack[-1] == n:
                code += [" +", " ."]
            elif len(stack) >= 2 and stack[-2] == n:
                for j in range(len(code)):
                    if code[~j] == " +":
                        code[~j] = ""
                        break

                code += [" +", " ."]
                stack.pop()
            else:
                code += [shortest_m_to_n(stack[-1], n), " +", " ."]
                stack[-1] = n

        else:
            code += [shortest_m_to_n(None, n), " +", " ."]
            stack.append(n)

        while i < split_index and keep[keep_ptr:][:1] == c:
            code += [" +"]
            keep_ptr += 1
            stack.append(n)

        i += 1

    code = "".join(code)

    if code[-4:] == " + .":
        code = code[:-4] + " ."

    return code

total = 0

for case in cases:
    start_time = time.time()

    s = metagolf(case)
    print(len(s), time.time() - start_time)
    total += len(s)
    print(s)
    print('='*50)

print(total)

The relevant function is the aptly named metagolf.

The results are:

210
676
684
2007
1463
2071
2204
2530
Total: 11845

You can find the full output here.

Brief explanation

I'm going to keep the explanation brief since there's many things still to be improved.

The basic algorithm just looks at pairs of chars, and finds the optimal way to transition from one char to another via BFS. Digits are currently pushed and printed immediately, although this will be changed later.

There's also a little longest-common-subsequence going on, to leave a few elements on the stack for reuse later. It's not as good as repetition, but provides decent savings.

Sp3000

Posted 2015-11-16T15:35:27.153

Reputation: 58 729

Hooray, someone to battle :-) Of course, now I see that mine has a long way to go... – ETHproductions – 2015-11-17T01:26:21.833

5

JavaScript, 25158 23778

Now ES5-compatible!

String.prototype.repeat = String.prototype.repeat || function (n) { return Array(n+1).join(this); }

function starrify(x) {
  function c(x){return x.charCodeAt()}
  var char = x[0], result = ' '.repeat(c(char)+5)+'+ + .';
  x=x.slice(1);
  for(var i in x) {
    if (char < x[i]) {
      result += ' '.repeat(c(x[i])-c(char)+5)+'+* + .';
    } else if (char > x[i]) {
      if(c(char)-c(x[i]) < c(x[i])) {
        result += ' '.repeat(c(char)-c(x[i])+5)+'+ * + .';
      } else {
        result += ' '.repeat(c(x[i])+5)+'+ + .';
      }
    } else {
      result += ' + .';
    }
    char = x[i];
  }
  return result;
}

Results:

432
949
2465
3996
1805
3551
5205
5375
Total: 23778

A good start in my opinion, but obviously not finished. Instead of creating each char separately, it adds or subtracts from the previous char code. I'll add an full explanation when I'm done meta-golfing.

ETHproductions

Posted 2015-11-16T15:35:27.153

Reputation: 47 880

Yep, it works in Firefox now, although Chrome still complains about charCodeAt. – Martin Ender – 2015-11-17T07:27:48.383