Python 781 731 605 579 Chars
There are a lot more and much better answers from when I first saw this, but I did waste a lot of time on my python script so I am going to post it any way, it would be awesome to see suggestions to further shorten it,
Edit : thanks to Ed H's suggestions 2 chars chopped, to go further I might have to restructure a lot of thing here which is going to take some time
s="e |nd|-We| a|-(Ooh|N| what|ive| go|ay it-|I|er|G|o |make5 |D| th| othH |A| tF|ing |nna |tell|'s been|'rS|-You|-N4| know|L5 up|PR | you|evHK>| how I'm feeling-|O, g7)|O)9gL, n4gL-(G7)|-I just wa>=53Gotta EuRHstaR-.|Q've8n eachBfor sFlong:r heart<Pch?but:;toFshy@sJInsidSwSboth8M<K?onQ8CSgame6we;go>plJ|9g79let5 down9runProuR6desHt59Ecry9sayKodbye9=P lie6hurt5-|\n|Q;nFstrangHs@love:8CSrules6sFdFI-A full commitment'sM I'mCink?of: wouldn't getCis fromPnyBguy0/AR if5Psk me3Don't = me5;toFbliR@see-..2211-/0..";i=83
exec"x,s=s.split('|',1);s=s.replace(chr(i),x);i-=1"*39
print s
After the first time that I manually produced the string(very tedious), I wrote a function to recursively find the pattern replacing which was most profitable(at that step), which gave me a solution but it turned out to increase the size by 10 chars.
So, I made my algorithm a little less greedy by instead of doing the final ranking only on 'characters reduced', ranking on a function of 'characters reduced', 'length of pattern' and 'counts of pattern'
pattern length = length
count = count
rank = [(length-1)*count - length - 2] + lengthWeight * length + countWeight * count
Then I asked my poor laptop to run infinitely, assigning random values to lengthWeight
and countWeight
and get different final compression sizes, and store the data for minimum compression sizes in a file
In half an hour or so it came up with the above string (I tried to tinker with it further to see if I could shorten the code), and it won't go any lower, I guess I'am missing something here.
here's my code for it, also max_pattern
is very slow
(Note: code spits out a string similar to form in my previous version of the solution, I manually worked through it to get the current form, by manually I mean, manually in python shell)
import itertools
global pretty
global split
split = False
pretty = False
# try to keep as much visibility as possible
def prefrange():
return range(32,127) + ([] if pretty else ([10, 9, 13] + [x for x in range(32) if x not in (10, 9, 13)] + [127]))
def asciichr():
return [chr(x) for x in prefrange()]
def max_pattern(s, o, lenw, numw):
l = len(s)
patts = []
for c in range(l/2+1,1,-1):
allsub = [s[i:i+c] for i in range(0, l, c)]
subcounts = [[a, s.count(a)] for a in allsub if len(a) == c]
repeats = [(x, y, ((c-o)*y - o*2 - c)) for x, y in subcounts if y > 1]
ranks = [(x, y, (z + lenw*c + numw*y)) for x,y,z in repeats if z > 0]
patts = patts + ranks
try:
return sorted(patts, key=lambda k: -k[2])[0]
except:
return None
def sep():
return '~~' if pretty else chr(127) + chr(127)
def newcharacter(s):
doable = [x for x in asciichr() if x not in s]
if len(doable) == 0:
doable = list(set(x+y for x in asciichr() for y in asciichr() if x+y not in s and x+y != sep()))
if len(doable) == 0:
return None
return doable[0]
def joined(s, l):
one = [x for x in l if len(x)==1]
two = [x for x in l if len(x)==2]
return ''.join(reversed(two)) + sep() + ''.join(reversed(one)) + sep() + s
def compress(s, l=[], lenw=0, numw=0):
newchr = newcharacter(s)
if newchr == None:
if not l:
return s
return joined(s,l)
else:
ptn = max_pattern(s, len(newchr), lenw, numw)
if ptn == None:
if not l:
return s
return joined(s, l)
s = s.replace(ptn[0], newchr)
s = ptn[0] + newchr + s
l.append(newchr)
return compress(s, l, lenw, numw)
def decompress(s):
lst2, lst, s = s.split(sep(),2)
li = [lst2[i:i+2] for i in xrange(0, len(lst2), 2)]+list(lst)
for c in li:
x, s = s.split(c, 1)
s = s.replace(c, x)
return s
def test(times):
import random
rnd = random.random
tested = {(1001, 1001): (10000, 10, False),}
org = open('text').read()
minfound = 1000
for i in xrange(times):
l,n = 1001,1001
while (l,n) in tested:
# i guess this would be random enough
xr = lambda: random.choice((rnd(), rnd()+rnd(), rnd()-rnd(), rnd()*random.choice((10,100,1000)), -1*rnd()*random.choice((10,100,1000)),))
n = xr()
l = xr()
sm = compress(org, l=[], lenw=l, numw=n)
try:
dc = decompress(sm)
except:
tested[l,n] = (len(sm), len(sm)/float(len(org)), 'err')
continue
tested[l,n] = (len(sm), len(sm)/float(len(org)), dc==org)
if len(sm) < minfound:
minfound = len(sm)
open('min.txt','a').write(repr(tested[l,n])+'\n')
print '~~~~~~~!!!!!!! New Minimum !!!!!!!~~~~'
return tested
if __name__ == '__main__':
import sys
split = False
try:
if sys.argv[2] == 'p':
pretty = True
except:
pretty = False
org = open(sys.argv[1]).read()
try:
l=float(sys.argv[3])
n=float(sys.argv[4])
except:
l,n=0,0
sm = compress(org,lenw=l,numw=n)
print 'COMPRESSED -->'
print sm, len(sm)
#open('new.py','w').write(sm)
print len(sm)/float(len(org))
print 'TRYING TO REVERT -->'
dc = decompress(sm)
#print dc
print dc==org
I think there may be a typo in the pastebin text. In the second to last paragraph before the final chorus (first line): "We've know each other for so long" I think should be "We've known each other for so long" – Cristian Lupascu – 2012-05-29T06:55:53.963
4@w0lf You're correct. I only scanned the lyrics to verify their accuracy, I must've missed that one. I'll accept either "know" or "known". – Polynomial – 2012-05-29T07:30:35.077
1I would recommend adding a list of languages to be used (as a link to a list of popular or sth) because I have seen people w defining a language that prints "hello world" with 0 bytes of code :) – naugtur – 2012-06-01T11:45:22.400
Do you count newlines? – Josh Smeaton – 2012-06-01T14:36:06.253
Maybe next time it would be more fun to define "shortest" as "minimum lines of code", "minimum number of characters" or something like this. When people start hacking away at character encoding level, then the result is just not readable so easily, thus destroying some degree of fun. – erikbwork – 2012-06-01T15:11:40.597
Lines of code is pointless, because in most languages you can just stick everything into one line. If your solution is ASCII, count the characters excluding newlines. If your solution is non-ASCII (i.e. it uses character encoding to shorten the result) you count the bytes. – Polynomial – 2012-06-01T15:47:45.967
1026k+ views in just 3 days. WOW... – Gaffi – 2012-06-01T21:30:26.437
4It seems a bit counter-intuitive to have the source code of a non-Unicode-based language be measured as if it were UTF-8-encoded. But it's your problem, so you call the shots! – breadbox – 2012-06-02T02:45:04.950
9How in the world did this generate so many views and votes? Whatever he did, I'm going to reverse engineer it. – PhiNotPi – 2012-06-02T21:27:32.143
41@PhiNotPi Good luck reverse engineering "Jeff Atwood tweeting you". – breadbox – 2012-06-03T08:04:52.863
1
@PhiNotPi It looks like it was posted on Hacker News.
– Gareth – 2012-06-03T08:22:19.3803It got tweeted via the CodeGolf SE twitter account, then Jeff Atwood and Wrox Press tweeted it too. From there it got onto THN. I got a gold and two silver badges in the space of 5 minutes :) – Polynomial – 2012-06-03T11:24:14.487
1
I think Ed H.'s 555-byte solution is still the front-runner, actually.
– breadbox – 2012-06-07T07:47:25.767@breadbox So it is! Missed that one. – Polynomial – 2012-06-07T08:23:04.693
There seems to be some confusion about scoring wrt newlines ... You say "If your solution is ASCII, count the characters excluding newlines", so wouldn't that make Ed.H's solution score 554? (His solution has 556 bytes including a newline at the end of each of its two lines of code.)
– r.e.s. – 2012-06-07T12:23:37.673@r.e.s. I just copied the value from his post, I didn't manually check the length. You're right, it's 554, not 556. I've updated the question to reflect this. – Polynomial – 2012-06-07T12:28:23.517
Waaait a second. Newlines are not counted? If so, I can shave about 50 bytes off my solution with ease, so you might want to rethink that. – a sad dude – 2012-06-07T12:32:37.417
I'm talking about newlines that are part of code structure. You certainly can't ignore instances where you've used
\n
, or had newlines as part of the song's text. Also, I only see 14 newlines in your solution. – Polynomial – 2012-06-07T12:36:27.937Oh, ok. Still, if, say, newlines in ruby/python are not counted, how about not counting semicolons in php? ) wouldn't it be simpler to just always count the bytes? – a sad dude – 2012-06-07T12:45:20.440
ockquote>
Also, I only see 14 newlines in your solution.
...and 30 "q"s, plus about 20 more saved by str_split. Which I could easily change to newlines (and newlines to "q"). Because newline is just a character like every other, why treat it differently? – a sad dude – 2012-06-07T12:49:51.227
If the newlines later become part of the output, they count towards the size of your code. – Polynomial – 2012-06-07T12:50:29.637
+1 for actually accepting the shortest solution on a [code-golf] question. :-) – Gareth – 2012-06-08T16:13:32.973
I'd like to see a seed answer. – MCMastery – 2016-10-02T21:58:10.107
1Never gonna give PPCG up, Never gonna let PPCG down, Never gonna turn around and desert PPCG – Matthew Roh – 2017-03-25T02:20:25.353
1https://www.youtube.com/watch?v=63qtYi1nwcs – Steadybox – 2017-03-27T18:59:58.007
You can technically just make an infinite loop that prints a letter each time around. Eventually, the song will be I. There somewhere – jeremy – 2013-12-27T23:09:27.543
9The restriction with UTF-8 makes no sense. What if my code is shorter when encoded as UTF-16 or as Latin-1? You should either count the number of bytes or the number of characters, but leave the encoding up to the author. – Timwi – 2014-01-25T00:22:01.333
1Update, 1st June 2012: You suck. :P But seriously, everyone disagrees with that restriction. – MD XF – 2018-03-09T02:10:54.117
I love that there are 42 gonna-s in the song – seadoggie01 – 2018-09-05T14:08:24.683
_A function or program is what I'm thinking of. You wouldn't get this on any other site. I just want to tell you how I'm golfing. Gotta make you understand. Never gonna beat Dennis. _ – Lyxal – 2020-01-22T09:43:58.897