What is the minimum set of characters with which you can write any valid python program?

12

3

The proof of the set of characters is that there is a translator which takes any ascii python program as input and produces a limited character set version of the program. The limited character set program can be slower than the original program. It simply has to be possible to produce the same output as the valid python program.

For example, here is a translator that takes any ascii python program as input and writes out a version that doesn't use the character x:

import random
import string
with open('in.py') as infile:
    with open('out.py', 'w') as outfile:
        lines = infile.readlines()
        outfile.write("data = '")
        random_string = ''.join(random.choice(string.digits) for x in range(20))
        for line in lines:
            outfile.write(line.encode('string-escape').replace('x', random_string))
        outfile.write("'\n")
        outfile.write("exec data.replace('%s', chr(%s))" % (random_string, ord('x')))

note that this will fail if the program happens to have the random string in it already, but for the purposes of this question lets accept that 'any' can be slightly limited. We don't have to consider extremely low probabilities of failure due to random chance or a malicious input program.

vishvananda

Posted 2012-08-11T11:30:28.410

Reputation:

Question was closed 2017-02-20T16:19:10.210

2The question is? No idea about the sense of this "questions" – None – 2012-08-11T11:44:15.077

good luck writing a useful program without sorted(set('import for while = in : if def [] return class () break .')) On the other hand, you can write http://en.wikipedia.org/wiki/Whitespace_(programming_language) programs with only three characters.

– None – 2012-08-11T12:04:38.173

@msw: To be fair, that's not true. All for loops can be replaced with while loops (though admittedly that doesn't get you anywhere if you leave in def, import, and return), all list literals could turn from [...] to list((...)), you can shuffle a program to avoid any break, and there are probably similar transformations. – David Robinson – 2012-08-11T12:56:17.633

1

@vishvanda: While this question has problems, it did inspire me to write such a translator, which removes all characters except for "()+,.1[]cehijnorx. My solution is here.

– David Robinson – 2012-08-11T13:18:50.843

2

@msw: As you can see in my gist, it's actually possible to write any Python program with just the characters "()+,.1[]cehijnorx).

– David Robinson – 2012-08-11T13:20:23.120

Interesting, amusing, and likely a fun puzzle. Cheers. – None – 2012-08-11T14:20:54.787

I thought it was pretty interesting, too bad it was closed as. – None – 2012-08-11T16:16:10.207

1By the way, I've improved my solution to use only 9 characters: ()+1cehrx. – David Robinson – 2012-08-11T19:29:44.367

But does it support unicode ;) Good explanation regarding codegolf, too. – None – 2012-08-11T20:55:37.157

Answers

17

I've come up with a 9-character translator using exec (also posted at this GitHub gist):

def convert_txt(txt):
    """Convert the text of a Python program to minimal Python"""
    s = "+".join(["chr(%s)" % "+".join(["1"] * ord(c))
                            for c in txt])
    return 'exec(%s)' % s

The only characters it needs are ()+1cehrx.

David Robinson

Posted 2012-08-11T11:30:28.410

Reputation: 271

LOOOOOOOOOOOOOL +1 – Ev_genus – 2012-08-14T09:34:45.553

Does Python permit literal chr(0) characters in strings? – Peter Taylor – 2012-08-14T19:17:03.740

That's not how it's used here. This creates a program whose string is like exec(chr(1+1+1...)+chr(1+1...)). The new program thus uses only 9 characters, reconstructs the string, and uses exec – David Robinson – 2012-08-14T19:45:52.127

Precisely. So if I pass it a Python program which contains a literal chr(0) it will break. The most obvious place in which a literal chr(0) would be valid is inside a string. – Peter Taylor – 2012-08-15T09:00:13.013

Sorry, I misunderstood your original question. Anyway, I've been able to use it: for example, chr(100) + chr(0)+chr(65) yields 'd\x00A'. Am I missing something? – David Robinson – 2012-08-15T13:02:20.557

Oh! I misunderstood you yet again. Yes, it would lead to an error, as would a Unicode character. – David Robinson – 2012-08-15T16:01:43.167

This has been beaten! 8 chars – mbomb007 – 2016-09-26T16:33:53.803

Now in 7 chars – mbomb007 – 2017-02-21T22:14:04.567

You can support unicode and null characters if desired without extra characters by doing two translation steps:

the first to exec(unichr(0+1+1+1+...)+...) to remove all unicode and null characters and the second as above to remove 'uni0' – paradigmsort – 2014-10-16T18:48:16.283

@paradigmsort Nice! Of course, that would cause the length of the code to expand by another ~2 orders of magnitude. Now that would just make it impractical... – David Robinson – 2014-10-16T18:53:21.860