Note: The below makes sense only if the program is scored in characters, not in bytes.
I haven't seen this, though somebody may have posted it somewhere.
I needed some long literal ASCII strings in my code, so shortening them somehow (in characters, not bytes) would be beneficial. After some experiments I came up with what I call the "Chinese reencoding". I call it that because pairs of ASCII characters mostly get squashed into Unicode code points that represent Chinese characters. You take an ASCII string S, encode it to bytes as ASCII, and then decode those bytes as UTF-16-BE, like this:
E=S.encode().decode('utf-16be')
The resulting string is half the length. It has to be big endian, otherwise the reverse reencoding may not work: the shorter 'utf16' is little endian on most systems, and it also prepends a byte-order mark when encoding. If the original string has odd length you also need to pad it with a character such as a space, but many times this is OK. Also, for non-ASCII characters this does not save length, because they result in Unicode code points that are too high and are represented in the long form ("\uXXXX").
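A minimal sketch of the round trip described above, including the padding step for odd lengths. The helper names `squash` and `unsquash` are made up for illustration and would not appear in a golfed submission.

```python
def squash(s):
    """Pack an ASCII string two characters per code point (hypothetical helper)."""
    if len(s) % 2:
        s += ' '  # pad odd-length input so the bytes pair up evenly
    # .encode('ascii') raises on non-ASCII input, where the trick does not pay off anyway
    return s.encode('ascii').decode('utf-16be')


def unsquash(e):
    """Reverse of squash(); this is the expression the golfed code has to carry."""
    return e.encode('utf-16be').decode()


S = 'Twelve Drummers Drumming'   # 24 characters
E = squash(S)
print(len(S), len(E))            # 24 12
print(unsquash(E) == S)          # True
```

An odd-length input comes back with the padding space appended, so this only fits where a trailing space is harmless.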
In your code, use the following:
[E].encode('utf-16be').decode()
in order to get the original longer string, where [E] is the literal shortened string. This costs 29 additional characters, so the original string has to be longer than 58, obviously.
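To check whether a given literal is worth reencoding, one can build the replacement expression and compare source lengths. The sketch below assumes `repr()` is an acceptable way to write the packed literal (it emits "\uXXXX" escapes for characters Python considers non-printable); `golf_literal` and `saving` are hypothetical names.

```python
SUFFIX = ".encode('utf-16be').decode()"  # the decoding tail appended to the packed literal


def golf_literal(s):
    """Return source text for an expression that evaluates to s (space-padded if odd)."""
    if len(s) % 2:
        s += ' '
    packed = s.encode('ascii').decode('utf-16be')
    return repr(packed) + SUFFIX


def saving(s):
    """Characters saved by replacing the plain literal; negative means a loss."""
    return len(repr(s)) - len(golf_literal(s))


days = ('First Second Third Fourth Fifth Sixth Seventh '
        'Eighth Ninth Tenth Eleventh Twelfth')
print(golf_literal(days))
print(saving(days), saving('spam'))
```

Every escaped code point costs six characters instead of one, so the real break-even length depends on the string's content.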
One example - below is my 12 days of Christmas (it can be shortened additionally, but let's use that as an example):
for i in range(12):print('On the %s day of Christmas\nMy true love sent to me\n%s'%('First Second Third Fourth Fifth Sixth Seventh Eighth Ninth Tenth Eleventh Twelfth'.split()[i],'\n'.join('Twelve Drummers Drumming,+Eleven Pipers Piping,+Ten Lords-a-Leaping,+Nine Ladies Dancing,+Eight Maids-a-Milking,+Seven Swans-a-Swimming,+Six Geese-a-Laying,+Five Gold Rings,+Four Calling Birds,+Three French Hens,+Two Turtle Doves, and+A Partridge in a Pear Tree.\n'.split('+')[11-i:])))
It's 477 characters long. Let's apply the "Chinese" trick to the two longer strings:
r=lambda s:s.encode('utf-16be').decode()
for i in range(12):print('On the %s day of Christmas\nMy true love sent to me\n%s'%(r('䙩牳琠卥捯湤⁔桩牤⁆潵牴栠䙩晴栠卩硴栠卥癥湴栠䕩杨瑨⁎楮瑨⁔敮瑨⁅汥癥湴栠呷敬晴栠').split()[i],'\n'.join(r('呷敬癥⁄牵浭敲猠䑲畭浩湧Ⱛ䕬敶敮⁐楰敲猠偩灩湧Ⱛ呥渠䱯牤猭愭䱥慰楮本⭎楮攠䱡摩敳⁄慮捩湧Ⱛ䕩杨琠䵡楤猭愭䵩汫楮本⭓敶敮⁓睡湳ⵡⵓ睩浭楮本⭓楸⁇敥獥ⵡⵌ慹楮本⭆楶攠䝯汤⁒楮杳Ⱛ䙯畲⁃慬汩湧⁂楲摳Ⱛ周牥攠䙲敮捨⁈敮猬⭔睯⁔畲瑬攠䑯癥猬\u2061湤⭁⁐慲瑲楤来\u2069渠愠健慲⁔牥攮ਠ').split('+')[11-i:])))
That's 362, including the lambda (it happens to be worth it, as it is used twice).
Now, code is mostly ASCII characters as well, so you may have already guessed that you can use this with exec. The overhead is higher: 43 chars for "exec(''.encode('utf-16be').decode())" (in addition to the whole compressed program), and you may need to double-escape some escaped characters in your literal strings (the '\n' in mine has to become '\\n'). As a bonus, you can always easily add that one padding space. The compressed program looks like:
exec("景爠椠楮\u2072慮来⠱㈩㩰物湴⠧佮⁴桥‥猠摡礠潦⁃桲楳瑭慳屮䵹⁴牵攠汯癥\u2073敮琠瑯\u206d敜渥猧┨❆楲獴⁓散潮搠周楲搠䙯畲瑨⁆楦瑨⁓楸瑨⁓敶敮瑨⁅楧桴栠乩湴栠呥湴栠䕬敶敮瑨⁔睥汦瑨✮獰汩琨⥛楝Ⱗ屮✮橯楮⠧呷敬癥⁄牵浭敲猠䑲畭浩湧Ⱛ䕬敶敮⁐楰敲猠偩灩湧Ⱛ呥渠䱯牤猭愭䱥慰楮本⭎楮攠䱡摩敳⁄慮捩湧Ⱛ䕩杨琠䵡楤猭愭䵩汫楮本⭓敶敮⁓睡湳ⵡⵓ睩浭楮本⭓楸⁇敥獥ⵡⵌ慹楮本⭆楶攠䝯汤⁒楮杳Ⱛ䙯畲⁃慬汩湧⁂楲摳Ⱛ周牥攠䙲敮捨⁈敮猬⭔睯⁔畲瑬攠䑯癥猬\u2061湤⭁⁐慲瑲楤来\u2069渠愠健慲⁔牥攮屮✮獰汩琨✫✩嬱ㄭ椺崩⤩".encode('utf-16be').decode())
and it's 299 characters long. As you can see, a few code points still have to be written in the long "\uXXXX" form. I have not found a way to eliminate them, as the added handling code is not worth the benefit.
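Building the exec line by hand is error-prone, so here is a rough sketch of a packer. It assumes the program is available as raw source text (for example read from a file), in which case a '\n' escape inside a literal is already two characters, backslash and n, and needs no further doubling. `pack_program` is a hypothetical name.

```python
def pack_program(source):
    """Return a one-line exec(...) wrapper around an ASCII-only program (sketch)."""
    if len(source) % 2:
        source += ' '  # assumes a trailing space does not change the program
    packed = source.encode('ascii').decode('utf-16be')
    # repr() writes non-printable code points as \uXXXX escapes
    return "exec(%s.encode('utf-16be').decode())" % repr(packed)


program = r"print('On the %s day of Christmas\nMy true love sent to me' % 'First')"
line = pack_program(program)
print(len(program), len(line))
exec(line)  # runs the original program
```

Comparing the two printed lengths shows whether the wrapper pays for itself on a given program.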
This is a cheap trick, in fact, but it can always be applied on top of your solution when the program is longish and has few or no non-ASCII characters. Often you can devise a custom encoding that stuffs more than two ASCII chars into a single Unicode character, but that is specific to the task.
27 Oh, I can see a whole set of questions like this one coming for each language... – R. Martinho Fernandes – 2011-01-28T04:26:45.060
4 @Marthinho I agree. Just started a C++ equivalent. I don't think its a bad thing though, as long as we don't see the same answers re-posted across many of these question types. – marcog – 2011-01-28T12:28:11.243
2 Shouldn't this question be a community wiki post? – user8397947 – 2016-05-29T15:35:09.277
3 @dorukayhan Nope; it's a valid [tag:code-golf] [tag:tips] question, asking for tips on shortening [tag:python] code for CG'ing purposes. Such questions are perfectly valid for the site, and none of these tags explicitly says that the question should be CW'd, unlike SO, which required CG challenges to be CW'd. Also, writing a good answer, and finding such tips always deserves something, that is taken away if the question is community wiki (rep). – Erik the Outgolfer – 2016-09-09T14:48:14.733
2 Use Python 2 for golfing not 3 – Chris_Rands – 2017-08-09T15:04:33.713
@Chris_Rands That simply does not universally hold, as there are cases in which Python 3 allows for shorter submissions. – Jonathan Frech – 2018-07-15T14:36:53.527
@JonathanFrech Especially the new := operator in 3.8 – MilkyWay90 – 2019-03-17T17:31:08.077
51 Love the question but I have to keep telling myself "this is ONLY for fun NOT for production code" – Greg Guida – 2011-12-21T00:08:53.377