You are sick of all of the codegolf challenges. Hence you decide to write a program that will automatically golf some Python code for you. There are 3 test cases:

print quickSort([0,7,3,-1,8,10,57,2])
def quickSort(arr):
    less = []
    pivotList = []
    more = []
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        for i in arr:
            if i < pivot:
                less.append(i)
            elif i > pivot:
                more.append(i)
            else:
                pivotList.append(i)
        less = quickSort(less)
        more = quickSort(more)
        return less + pivotList + more

for i in xrange(1, 101):
    if i % 15 == 0:
        print "FizzBuzz"
    elif i % 3 == 0:
        print "Fizz"
    elif i % 5 == 0:
        print "Buzz"
    else:
        print i

from sys import argv

def randomGenerator(seed=1):
    max_int32 = (1 << 31) - 1
    seed = seed & max_int32

    while True:
        seed = (seed * 214013 + 2531011) & max_int32
        yield seed >> 16

def deal(seed):
    nc = 52
    cards = range(nc - 1, -1, -1)
    rnd = randomGenerator(seed)
    for i, r in zip(range(nc), rnd):
        j = (nc - 1) - r % (nc - i)
        cards[i], cards[j] = cards[j], cards[i]
    return cards

def show(cards):
    l = ["A23456789TJQK"[c / 4] + "CDHS"[c % 4] for c in cards]
    for i in range(0, len(cards), 8):
        print " ", " ".join(l[i : i+8])

if __name__ == '__main__':
    seed = int(argv[1]) if len(argv) == 2 else 11982
    print "Hand", seed
    deck = deal(seed)
    show(deck)

Rules:

Your program must not target the code I posted specifically, and should work with any Python 2 code. I reserve the right to change the source code being codegolfed. You may assume that there are no multi-line strings (so you don't have build a full-blown parser), and that locals() is not called.
The output of your program should run in an identical manner as the original source code. (Namely, it must produce the same output. Variable names and language constructs can be changed, as long as the output remains the same)
You may use STDIO or a File to do your input/output of the source code.

Your score will be the sum of the bytes of your program's output.

(The code listed above has been taken from http://rosettacode.org/ under the GNU Free Documentation License 1.2)

Nathan Merrill

Posted 2015-01-07T16:17:10.077

Reputation: 13 591

Do we need to handle invalid code? IE if we make invalid code valid, is the entry valid? – Justin – 2015-01-07T16:32:25.560

This seems very similar to http://codegolf.stackexchange.com/questions/3652/write-a-code-golfer/26075

– KSab – 2015-01-07T16:34:33.060

@Quincunx No, it should not deal with invalid code. However, it should be able to handle any python code (including list comprehension and comments, even though it isn't in the programs above) – Nathan Merrill – 2015-01-07T16:44:51.117

@KSab Yes, it is similar, but fundamentally different, as I have a scoring system. – Nathan Merrill – 2015-01-07T16:45:28.583

@KSab I think that the two questions are different in both scoring and in task to be accomplished. This question: Shortest output wins. Old question: Most popular code that does five tasks plus "etc" wins. Who the heck knows what "etc" means. – Rainbolt – 2015-01-07T16:46:10.850

Here's a bonus test case for people to try, to be devious.

– Sp3000 – 2015-01-07T16:51:59.537

4What's your model for determining whether the output "[runs] in an identical manner as the original source code"? E.g. for the second example, I believe that removing if __name__ == '__main__': would affect the behaviour in some contexts but not others. For another example, if the ungolfed input assumes that it reads an int from stdin and throws one type of exception if given something else, could the golfed input throw a different type of exception if given a non-integer? – Peter Taylor – 2015-01-07T17:00:36.930

Removing if __name__ == '__main__': will change the behaviour. Changing exception type will change the behaviour. In the same context, the program should produce identical output (including STDERR). – Nathan Merrill – 2015-01-07T17:16:49.717

Python 2, correct? – TheNumberOne – 2015-01-07T19:42:03.130

1I'm a python noob, but for me it complains that the quickSort call is before the definition. Works fine if I move the quickSort call to the end – Digital Trauma – 2015-01-07T19:43:54.320

@TheBestOne *"[...] and should work with any Python 2 code."* – Rainbolt – 2015-01-07T19:43:55.617

This challenge does not seem good. I'll bet any answers that are posted (that attempt any sort of reduction) will fail on some code other than the single test case. – feersum – 2015-01-07T20:17:33.177

@feersum I updated the problem description. There are no multi-line comments, meaning that most reductions will now work. – Nathan Merrill – 2015-01-07T20:40:28.097

Even single-line strings are fairly tricky. – feersum – 2015-01-07T20:44:39.763

2What about a program like this: random_long_variable=0;print locals()? – Justin – 2015-01-07T20:49:33.367

I didn't know about locals(). I'll exclude that as well, thanks. – Nathan Merrill – 2015-01-08T01:09:14.083

2Multiline strings aren't the only way to make empty lines functional – Sp3000 – 2015-01-08T01:15:01.517

Answers

Python 2.7, 794

I have been meaning to build a minifier for Python for a while, so this is a good opportunity to investigate the problem.

The program uses a mix of regular expression analysis, and Python parser operations. White space is minimised. Variable that are defined by the user are replaced by a single letter variable (which is not in use!). Finally the while True statement is put on a diet.

The three test cases all verify as running correctly. I could imagine some pathological examples which could result in errors in the generated code but the algorithm should be robust under most circumstances.

Results

228 t1.py
128 t2.py
438 t3.py
794 total

Output

def c(a):
 b=[]
 d=[]
 f=[]
 if len(a)<=1:
  return a
 else:
  e=a[0]
  for i in a:
   if i<e:
    b.append(i)
   elif i>e:
    f.append(i)
   else:
    d.append(i)
  b=c(b)
  f=c(f)
  return b+d+f
print c([0,7,3,-1,8,10,57,2])


for i in xrange(1,101):
 if i%15==0:
  print"FizzBuzz"
 elif i%3==0:
  print"Fizz"
 elif i%5==0:
  print"Buzz"
 else:
  print i


from sys import argv
def a(k=1):
 b=(1<<31)-1
 k=k&b
 while 1:
  k=(k*214013+2531011)&b
  yield k>>16
def d(k):
 f=52
 h=range(f-1,-1,-1)
 g=a(k)
 for i,r in zip(range(f),g):
  j=(f-1)-r%(f-i)
  h[i],h[j]=h[j],h[i]
 return h
def m(h):
 l=["A23456789TJQK"[c/4]+"CDHS"[c%4]for c in h]
 for i in range(0,len(h),8):
  print" "," ".join(l[i:i+8])
if __name__=='__main__':
 k=int(argv[1])if len(argv)==2 else 11982
 print"Hand",k
 e=d(k)
 m(e)

Code

import sys
import re
from tokenize import generate_tokens
from token import tok_name
from keyword import iskeyword

wr = sys.stdout.write

def pyparse(text):
    'Return [TYPE,TOKEN] pair list'
    # Use KEYWORD,NAME,NUMBER,OP,STRING,NL,NEWLINE,COMMENT,INDENT,DEDENT
    rawtokens = generate_tokens(text.readline)
    tokens = [[tok_name[n], t] for n,t,p1,p2,dx in rawtokens]
    for tpair in tokens:
        if tpair[0] == 'NAME' and iskeyword(tpair[1]):
            tpair[0] = 'KEYWORD'
    return tokens

def finduservars(filename):
    'Return a set of user variables that we can replace with a-zA-Z'
    varset = set()
    for line in open(filename):
        line = line.strip()
        match = re.match(r'def\s+(\w+)\s*\((.*)\)\s*:', line)
        if match:
            func, args = match.groups()
            varset.add(func)
            arglist = re.findall(r'(\w+|=)', args)
            for a in arglist:
                if a == '=':
                    break  # keyword args follow - too hard to parse
                varset.add(a)
            continue
        match = re.match(r'(\w+)\s*=.+', line)
        if match:
            assigned = match.group(1)
            varset.add(assigned)
            continue
    return set(v for v in list(varset) if len(v) > 1)

filename = sys.argv[1]
tokenlist = pyparse(open(filename))

# Build map for var->char conversion:
varset = finduservars(filename)
singles = [text for tok,text in tokenlist if tok=='NAME' and len(text)==1]
allvar = [chr(n) for n in range(97,123)+range(65,91)]
charvar = [c for c in allvar if c not in singles]
varreplaced = list(varset)[:len(charvar)]
varmap = dict((v, charvar.pop(0)) for v in varreplaced)

prev = 'NONE'
indent = ['']
output = []
add = output.append
for tok, text in tokenlist:
    if tok == 'NL':
        continue
    elif tok == 'INDENT':
        indent.append( text.replace('    ', ' ') )
        output[-1] = indent[-1]
    elif tok == 'DEDENT':
        indent.pop(-1)
        output[-1] = indent[-1]
    elif tok == 'NEWLINE':
        add(text)
        add(indent[-1])
    elif tok in 'KEYWORD,NAME,NUMBER':
        if prev in 'KEYWORD,NAME,NUMBER':
            add(' ')
        if tok == 'NAME':
            if output[-2] == 'while' and text == 'True':
                add('1') # common verbose idiom
            else:
                add(varmap.get(text, text))
        else:
            add(text)
    else:
        add(text)
    prev = tok

wr(''.join(output))

Logic Knight

Posted 2015-01-07T16:17:10.077

Reputation: 6 622

sed, 1074 (down from 1390)

Very mild, low-hanging-fruit type of answer, to get the ball rolling:

/^$/d                  # Remove empty lines
/^[ <--TAB-->]*#/d     # Remove whole-line comments
s/    /<--TAB-->/g     # Replace 4 spaces with tabs
/^[^'"]*$/s/ *([|&:,<>=*/%+-]) */\1/g  # Remove spaces before/after operators

_{Replace <--TAB--> with real TAB characters}

Obvious shortcoming:

Indents assumed to be exactly 4 spaces in input code.

Since we can assume no multi-line strings, then we only strip leading/trailing spaces from operators if there are no ' or " on the given line. This could be improved, but _{<mumbles something about sed regex always being greedy>}.

Test as follows:

$ cat qs.py fizzbuzz.py cards.py | wc -c
1390
$ sed -rf pygolf.sed qs.py fizzbuzz.py cards.py | wc -c
1074
$ sed -rf pygolf.sed qs.py fizzbuzz.py cards.py | python
[-1, 0, 2, 3, 7, 8, 10, 57]
1
2
Fizz
...
98
Fizz
Buzz
Hand 11982
  AH AS 4H AC 2D 6S TS JS
  3D 3H QS QC 8S 7H AD KS
  KD 6H 5S 4D 9H JH 9S 3C
  JC 5D 5C 8C 9D TD KH 7C
  6C 2C TH QH 6D TC 4S 7S
  JD 7D 8H 9C 2H QD 4C 5H
  KC 8D 2S 3S
$

Digital Trauma

Posted 2015-01-07T16:17:10.077

Reputation: 64 644

You don't need to check for multi-line strings, but your last two definitely need to be updated. – Nathan Merrill – 2015-01-07T20:41:33.000

@NathanMerrill yup. The operator leading/trailing space one is a bit better now, but the indent one will be much harder to generalize - and is where I get about 2/3 of the gain. – Digital Trauma – 2015-01-07T21:01:14.633

Python 3.4, 1134

This program should work alright for most programs. Strangely, Sp3000 test case is much easier to optimize for my program than your programs. Input is accepted through the file specified in the first argument. The actual file is modified.

import subprocess
from sys import argv

progamtext = open(argv[1]).read()

if 'argv' in progamtext or 'input' in progamtext or 'open' in programtext:#Make sure the program always produces the same results.
    exit(0)

program = subprocess.Popen(['C:\Python27\python', argv[1]], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
program.wait()
erroroutput1 = str(program.stderr.read())
output1 = str(program.stdout.read())
program = subprocess.Popen(['C:\Python27\python', argv[1]], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
program.wait()
erroroutput2 = str(program.stderr.read())
output2 = str(program.stdout.read())
if erroroutput1 != erroroutput2 or output1 != output2:#Make sure the program always produces the same results.
    exit(0)

newprog = ''
if erroroutput1:
    newprog += "import sys\n" + "sys.stderr.write("+ erroroutput1 + ')'
    if output1:
        newprog += "\n"
if output1:
    newprog += 'print ' + output1

if len(newprog) > len(progamtext):
    exit(0)

open(argv[1],mode='w').write(newprog)

How it works:

First, this program checks to see if your program interacts with the user at all or uses random. If it does, the program is unmodified. Next, the program is ran. The program is then replaced with print "output". Finally, if the program is shorter than its output, it is unmodified.

Sp3000's program, optimized:

import sys
sys.stderr.write(b'')
print b'0.540377721372\r\n3\r\n1\r\n7\r\n99\r\nf\r\n[5, 5]\r\n53\r\n53\r\n53\r\n'

Sp3000's super bonus program, optimized:

The optimized version is only off .001% of the time.

import sys
sys.stderr.write(b'')
print b'B\r\n'

TheNumberOne

Posted 2015-01-07T16:17:10.077

Reputation: 10 855

1I'm sure there are other external effects than argv, input and random, which your code would break. ;) – Martin Ender – 2015-01-08T00:41:16.680

2Hah. Maybe I should have put in some non-determinism - print id(0) is a good one. – Sp3000 – 2015-01-08T00:50:45.530

@Martin Fixed (mostly). :) – TheNumberOne – 2015-01-08T00:59:12.270

3Here's a super bonus test case, just for you. – Sp3000 – 2015-01-08T01:02:54.610

Heh, very creative. – Nathan Merrill – 2015-01-08T01:06:41.190

Maybe if the program is shorter than its output, you should work on optimizing whitespace, replacing vars with single-char names, etc. – mbomb007 – 2015-01-08T20:45:09.690

Auto-meta-code-golf

Answers

Python 2.7, 794

Results

Output

Code

sed, 1074 (down from 1390)

Test as follows:

Python 3.4, 1134

How it works: