Let's write a Minifier

Background

Minifiers are commonly used when serving JavaScript to web browsers, to reduce the number of bytes that have to be sent. Saving bandwidth is useful for obvious reasons. Some people also use obfuscators (which intentionally make code harder to read); I am not talking about those.

We will be minifying Python 2

I was debating whether to use JavaScript or Python for the minifying experience, and I decided on Python for two reasons. First, whitespace matters, and I think that will add an interesting dynamic to the problem. Second, using Python 2.7 provides another dynamic, such as removing superfluous parentheses in a print statement (i.e. print("Hello world") vs. print"Hello world"). I would personally have preferred to open it up to any language, but for some languages this process doesn't make a lot of sense. Also, the language you chose to minify would directly impact your score (and whether the language can be minified at all).

Specs

Your goal is to modify the code only in ways that do not change its functionality. You may, of course, rename variables in the program being minified, as long as doing so doesn't affect the output (keep track of scope). Although I am giving you a specific program, please don't optimize for that test case; all standard loopholes are forbidden.
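
Renaming safely needs real scope analysis (locals, globals, closures, attribute names, keyword arguments at call sites), which is not attempted here. One small building block is a supply of short replacement names. A minimal sketch, assuming the caller passes every identifier already used anywhere in the program as `taken`; `short_names` is a hypothetical helper, and the code runs under both Python 2.7 and Python 3:

```python
import itertools
import keyword
import string

# print and exec are keywords in Python 2 but not in Python 3's keyword list.
PY2_ONLY_KEYWORDS = {"print", "exec"}

def short_names(taken=()):
    """Yield candidate identifiers, shortest first: a, b, ..., Z, _, aa, ab, ...

    Keywords and every name in `taken` are skipped, so a new name never
    collides with a keyword or with a name the program already uses.
    """
    first_chars = string.ascii_letters + "_"
    other_chars = first_chars + string.digits
    reserved = set(keyword.kwlist) | PY2_ONLY_KEYWORDS | set(taken)
    for length in itertools.count(1):
        for head in first_chars:
            for tail in itertools.product(other_chars, repeat=length - 1):
                name = head + "".join(tail)
                if name not in reserved:
                    yield name
```

For example, the first three names from `short_names(taken={"a", "b"})` are c, d and e. Since the names come out shortest first, a reasonable policy is to hand them to the most frequently used variables first.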

Score: the length of the example program after your minifier has processed it.

Input: Any Python 2.7 program (that contains no errors)

Output: A minified version.

Although your code should be able to accommodate all valid Python 2.7 input, it is necessary to test your script against something in order to prove its effectiveness.
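
One way to check that a minified program still behaves the same is to run both versions and compare what they print. A minimal sketch, assuming the program is deterministic, reads no input and only writes to `sys.stdout`; `OutputCapture`, `run_and_capture` and `same_behaviour` are hypothetical names. Run it under Python 2.7 to test Python 2 code (the `exec(source, globals)` form is accepted by both 2.7 and 3):

```python
import sys

class OutputCapture(object):
    """Minimal file-like object that records everything written to it."""
    def __init__(self):
        self.parts = []
    def write(self, text):
        self.parts.append(text)
    def flush(self):
        pass

def run_and_capture(source):
    """Execute source in a fresh namespace and return everything it printed."""
    capture, saved = OutputCapture(), sys.stdout
    sys.stdout = capture
    try:
        exec(source, {"__name__": "__main__"})
    finally:
        sys.stdout = saved          # restore stdout even if the code raises
    return "".join(capture.parts)

def same_behaviour(original, minified):
    """True if both programs print exactly the same text."""
    return run_and_capture(original) == run_and_capture(minified)
```

If either version raises, the exception propagates out of `same_behaviour`, which also counts as a failed check.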

Click here to view the example program.

Making the problem more approachable

Feel free to use or modify any code in my solution (listed below). I wrote it to get you started with basic quote handling; you can extend it to indentation, etc.

Example ways to minify Python

All whitespace can be reduced to the minimum amount necessary (I acknowledge that in Python you can do some tricky things with tabs, but I'll leave it up to you whether to implement that).

Example

The following:

def print_a_range(a):
    for i in range(a):
        print(i)

Could be:

def print_a_range(a):
 for i in range(a):
  print(i)
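
The indentation step can be done by tracking nesting depth rather than raw column counts. A minimal sketch, assuming the input is indented with spaces only and contains no multi-line strings or bracketed continuation lines (re-indenting inside a triple-quoted string would change its contents); `reindent` is a hypothetical helper:

```python
def reindent(lines):
    """Re-indent so that every nesting level costs exactly one space."""
    out = []
    widths = [0]                  # original indentation width of each open block
    for line in lines:
        if not line.strip():
            continue              # blank lines are not needed
        width = len(line) - len(line.lstrip())
        if width > widths[-1]:
            widths.append(width)  # entered a deeper block
        else:
            while width < widths[-1]:
                widths.pop()      # left one or more blocks
        out.append(" " * (len(widths) - 1) + line.lstrip())
    return out
```

Applied to the three lines of print_a_range above, this produces the one-space version.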

Technically, if the body of a loop is a single simple statement, you can compress it even more:

def print_a_range(a):
 for i in range(a):print(i)  #Note, you can also remove the `()` here.
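
Folding a one-statement body onto its header can be automated once indentation is already minimal. A minimal sketch, assuming one space per level, no comments, no trailing whitespace, and no header line that is really part of a bracketed continuation; `join_single_line_bodies` and `COMPOUND_KEYWORDS` are hypothetical names. The body must be a simple statement, because something like `def f():for x in y:pass` is a syntax error:

```python
import re

# Statements that open their own block and therefore cannot follow "header:"
# on the same line.
COMPOUND_KEYWORDS = {"if", "elif", "else", "for", "while", "try", "except",
                     "finally", "with", "def", "class"}

def indent_of(line):
    """Number of leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))

def join_single_line_bodies(lines):
    """Move a block body onto its header line when the block consists of
    exactly one simple statement."""
    out, i = [], 0
    while i < len(lines):
        line = lines[i]
        if line.endswith(":") and i + 1 < len(lines):
            body = lines[i + 1]
            # Indentation of the line after the body (0 at end of input).
            after = indent_of(lines[i + 2]) if i + 2 < len(lines) else 0
            first_word = re.match(r"\w*", body.strip()).group()
            if (indent_of(body) > indent_of(line)       # body really is nested
                    and after <= indent_of(line)        # and is the only statement
                    and first_word not in COMPOUND_KEYWORDS):
                out.append(line + body.strip())
                i += 2
                continue
        out.append(line)
        i += 1
    return out
```

On the one-space print_a_range, the def line is left alone (its body starts with for), while the for line absorbs print(i).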

There is also another way to minify whitespace in Python:

The following:

print ([a * 2 for a in range(20) if a % 2 == 0])

Could be:

print([a*2for a in range(20)if a%2==0])

Note that there is no need for a space between 2 and for: variable names, function names and keywords cannot start with a digit, so the Python interpreter is fine with <num><keyword> here, with no space in between. Likewise, there does not have to be a space between ) and if.
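
This rule can be written as a small function that decides, for two adjacent tokens, whether a space must stay between them. A minimal sketch, assuming the source has already been split into tokens (for example with the standard `tokenize` module) and that newlines, indentation and comments are handled elsewhere; `needs_space` and `join_tokens` are hypothetical names, and the set of letters that are never glued onto a number is a deliberately conservative choice:

```python
# Characters that could be read as part of a numeric literal if glued right
# after a run of digits: exponent (e), imaginary (j), Python 2 long suffix (l),
# radix prefixes after 0 (x, o, b) and the Python 3.6+ digit separator (_).
RISKY_AFTER_DIGITS = set("eEjJlLxXoObB_")

def is_word_char(c):
    return c.isalnum() or c == "_"

def needs_space(left, right):
    """Return True if tokens `left` and `right` must be separated by a space."""
    if left[-1] in "\"'" and right[0] == left[-1]:
        return True          # "" "x" would turn into """x, a triple quote
    if left.isdigit():       # plain decimal integer such as 2 or 20
        return right[0] in RISKY_AFTER_DIGITS or right[0] == "." or right[0].isdigit()
    if left[0].isdigit() or (left[0] == "." and left[1:2].isdigit()):
        # Other numbers (floats, hex, 1e5, 2L): only glue them to operators.
        return is_word_char(right[0]) or right[0] == "."
    # Two names/keywords next to each other would merge into one name.
    return is_word_char(left[-1]) and is_word_char(right[0])

def join_tokens(tokens):
    """Concatenate tokens, inserting a space only where one is required."""
    out = [tokens[0]]
    for left, right in zip(tokens, tokens[1:]):
        if needs_space(left, right):
            out.append(" ")
        out.append(right)
    return "".join(out)
```

With this rule `2for` is produced, while `1 else` and `0 or` keep their spaces because e and o could be read as part of the number. Unlike a purely regex-based approach, it also never produces `return5`, since two word characters from different tokens always stay apart.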

Note that you must not change the output of the program! So:

print"f(x)=x*2 is a great equation!"

The print statement above must stay as it is: the space between 2 and is lies inside a string literal, so removing it would change the output.
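
One way to guarantee this is to apply whitespace-removal rules only to the text between string literals. A minimal regex-based sketch (an alternative to the character-by-character QuoteHandler in the answer below), assuming valid input; `squeeze_outside_strings` is a hypothetical helper and `squeeze` stands for whatever removal rule you use. Comments are matched by the same pattern, so a quote inside a comment cannot start a bogus string, and a # inside a string is not mistaken for a comment:

```python
import re

# A string literal (triple-quoted forms tried first, backslash escapes
# honoured) or a "#" comment running to the end of the line.
STRING_OR_COMMENT = re.compile(
    r'("""|\'\'\'|"|\')(?:\\.|(?!\1).)*?\1|#[^\n]*', re.DOTALL)

def squeeze_outside_strings(code, squeeze):
    """Apply squeeze() only to the text between string literals; string
    literals are copied unchanged and comments are dropped."""
    out, pos = [], 0
    for m in STRING_OR_COMMENT.finditer(code):
        out.append(squeeze(code[pos:m.start()]))
        if not m.group().startswith("#"):
            out.append(m.group())   # keep the string literal verbatim
        pos = m.end()
    out.append(squeeze(code[pos:]))
    return "".join(out)
```

For example, with a rule that deletes spaces, `print "f(x)=x*2 is a great equation!"` becomes `print"f(x)=x*2 is a great equation!"` with the string untouched.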

Neil

Posted 2017-12-20T02:44:49.140

Reputation: 2 417

Sandbox: https://codegolf.meta.stackexchange.com/a/14424/67929

– Neil – 2017-12-20T02:45:03.613

Sidenote: there is no program that can output the shortest equivalent of any arbitrary input program, per this discussion

– Leaky Nun – 2017-12-20T02:50:32.420

There are already some Python minifier tools. I don't think this question will receive a better solution than the existing tools.

– tsh – 2017-12-20T04:14:44.103

Is changing '1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111' into '1'*100 allowed? Is it required, since the behavior is the same?

– l4m2 – 2018-03-20T12:33:50.273

Answers

Python 2.7, 2013 score

This program can be used as a reference; you are allowed to take the following code, modify it, and post it in your own solution.

In hindsight, maybe I should have used regexes for the quote handling too, but I think in its current state it may be enough to jump-start people into the problem.

Why I wrote this in Python 2.7: it makes it easy to check whether the minified program crashes, by running it with exec.

This code reads the input program from in.txt.

I figured I should at least get the ball rolling for whoever wants to participate by writing a quote parser (which also happens to handle comments) and a brief example of how regexes, combined with the quote parser, can really change the game in terms of the complexity of this problem.

Note: there is still plenty of room for improvement in this minifier. For example, you could play around with indentation, variable names, and removing the parentheses when they're used with keywords like print or yield.
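
Dropping the parentheses after print needs care, because in Python 2 `print(a, b)` prints the tuple `(a, b)` and `print()` prints `()`. For yield, parentheses around a tuple can always go, since `yield a, b` also yields a tuple, but `yield()` must stay because a bare `yield` yields None. A minimal line-based sketch, assuming one statement per line, comments already stripped and no multi-line strings; `drop_keyword_parens` and `find_closing_paren` are hypothetical helpers:

```python
import re

# "print(" or "yield(" at the start of a line, after any indentation.
KEYWORD_CALL = re.compile(r"(\s*)(print|yield)\s*\(")

def find_closing_paren(s, start):
    """Scan s from index start (just past an opening paren).

    Return (index of the matching ")", True if a comma appeared at the
    outermost level). Single-line string literals are skipped so brackets
    and commas inside them are ignored. The index is -1 if there is no match.
    """
    depth, top_comma, i = 0, False, start
    while i < len(s):
        c = s[i]
        if c in "\"'":
            i += 1
            while s[i] != c:                 # skip to the closing quote
                i += 2 if s[i] == "\\" else 1
        elif c in "([{":
            depth += 1
        elif c in ")]}":
            if depth == 0:
                return i, top_comma
            depth -= 1
        elif c == "," and depth == 0:
            top_comma = True
        i += 1
    return -1, top_comma

def drop_keyword_parens(line):
    """Rewrite print(x) as print x and yield(x) as yield x when that is safe."""
    m = KEYWORD_CALL.match(line)
    if not m:
        return line
    close, top_comma = find_closing_paren(line, m.end())
    if close < 0 or line[close + 1:].strip():
        return line          # the parens do not wrap the whole statement
    inner = line[m.end():close].strip()
    if not inner:
        return line          # print() prints "()"; yield() yields ()
    if m.group(2) == "print" and top_comma:
        return line          # print(a,b) prints the tuple (a, b) in Python 2
    sep = "" if inner[0] in "\"'([{" else " "
    return m.group(1) + m.group(2) + sep + inner
```

On the example output this would turn `yield(n)` into `yield n` and `print(a)` into `print a`, while leaving any print with a top-level comma alone.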

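import re

with open("in.txt","r") as fi:
    code = fi.read()

class QuoteHandler():
    def __init__(self):
        self.reAddStrings = []
    def loadCode(self,code):
        # Remove comments and the contents of string literals. Each string is
        # reduced to its closing quote character, which serves as a placeholder;
        # the quote character and the contents are saved in self.reAddStrings.
        quoteFlag = False          # currently inside a string literal?
        currentQuoteChar = ""      # quote character that opened the string
        ignoreNext = False         # previous character was a backslash
        inEndLineComment=False     # inside a # comment?
        startLocation = 0          # index of the opening quote

        self.reAddStrings = []

        outStr = ""

        for i, character in enumerate(code):
            if ignoreNext:
                ignoreNext = False
            elif inEndLineComment:
                if character in "\r\n":
                    inEndLineComment=False
            elif character == "#" and not quoteFlag:
                inEndLineComment = True
            elif character in "'\"" and (currentQuoteChar == character or not quoteFlag):
                if quoteFlag:
                    self.reAddStrings.append((currentQuoteChar, code[startLocation+1:i]))
                else:
                    currentQuoteChar = character
                    startLocation = i
                quoteFlag = not quoteFlag
            elif character == "\\":
                ignoreNext = True

            if not inEndLineComment and not quoteFlag:
                outStr+=character
        return outStr

    def find_all_locations(self,substr,code):
        return [m.start() for m in re.finditer(substr, code)]

    def unloadCode(self,code):
        # Put every string back in front of its placeholder quote. Both ' and "
        # placeholders are matched, and the work is done right to left so the
        # earlier offsets stay valid.
        locations = self.find_all_locations("[\"']", code)
        for (quote, content), location in zip(self.reAddStrings[::-1], locations[::-1]):
            code = code[:location] + quote + content + code[location:]
        return code

def applyRegexes(code):#\w here?
    # Digits, quote placeholders and most operator/bracket characters.
    operatorRegexCleaner = [r"([\d\/*\-\"=,'+{}:[\](\)])", r"[ \t]+", r"(\w)"]
    # Caveat: because \d and \w also match parts of names and keywords, these
    # patterns turn "return 5" into "return5" (see def blue in the output below)
    # and "x1 in" into "x1in", which changes the program's meaning.
    regexes = [
        [''.join(operatorRegexCleaner),r"\1\2"],#removes whitespace between an operator and a following word character
        [''.join(operatorRegexCleaner[::-1]),r"\1\2"],#removes whitespace between a word character and a following operator
        [r"\n\s*\n","\n"]#removes empty lines
    ]
    for regex in regexes:
        code = re.sub(regex[0],regex[1],code)
    return code

qh = QuoteHandler()
code = qh.loadCode(code)      # strip comments, set strings aside
code = applyRegexes(code)     # squeeze whitespace
code = qh.unloadCode(code)    # restore the strings
print(code)
exec(code)                    # run it; this only catches errors on code paths that actually execute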

Output of program:

def factor(factor_number):
    for n in range(2,factor_number):
        if factor_number % n==0:    
            yield(n)
def gcd(a,b):
    """Calculate the Greatest Common Divisor of a and b.

    Unless b==0, the result will have the same sign as b (so that when
    b is divided by it, the result comes out positive).
    """
    while b:
         a,b=b,a%b 
    return a
class Apricot:
    def __init__(self):
        self.mold=False
    def get(self):
        return self.mold
    def update(self):
        self.mold=not self.mold
    def blue(self):return5
def tell_me_about_these_numbers(*a):
    print("%d is the first number!" % a[0])
    print("{} / 3 is {}".format(a[0],a[0]/3.))
    myFavorate=Apricot()
    for number in a:
        print list(factor(number))
        myFavorate.update()
    print[gcd(a,b)for a,b in zip(a[:-1],a[1:])]
    print(myFavorate.get())
tell_me_about_these_numbers(5,6,9,45,200)
print"Let's play with scope!"
a,b=10,9
def randomFunction(a):
    print(a)
randomFunction(b)
print(a)
for a in range(100):
    b+=a
print(a)
print(b)
li=[]
for i in range(10):
 li.append(i*2)
print(li)
print([i*2for i in range(10)])
a=c=b=d=e=f=g=h=i=j=k=l=m=n=o=p=q=r=s=t=u=v=w=x=y=z=5
print(a)
a-=1
print(a)
g=10
print(str(10**g+5)[::-1])
def blue_fish(a):
    def blue_fish(a):
        def blue_fish(a):
            return a
        a+=1
        return blue_fish(a)
    a-=1
    return blue_fish(a)
print(blue_fish(10))
def blue_fish(a):
    if a==0:
        return"0"
    return"1" +blue_fish(a-1)
print(blue_fish(5))
blue_fish=lambda a,b,c:a*b*c
print(blue_fish(1,2,3))
blue_fish=lambda*a:reduce(lambda a,b:a*b,a)
print(blue_fish(1,2,3))
print(max([[6,1],[5,2],[4,3],[3,4],[2,5],[1,6]],key=lambda a:a[1]))
print(zip(*[[1],[2],[3],[4],[5]]))
print"Now let's test to see if you handle quotes correctly:"
print"test \'many diffent\' \"types of \" quotes, even with \' \" trailing quotes"
print"""

Multi line quotes are great too!

"""
a=""" ::
one more multi-line quote won't hurt
"""
print a

Neil

Posted 2017-12-20T02:44:49.140
