Golfing Python string literals

Background

Python 3 has many types of string literals. For example, the string this 'is' an exa\\m/ple can be represented as:

'this \'is\' an exa\\\\m/ple'
"this 'is' an exa\\\\m/ple"
r"this 'is' an exa\\m/ple"
'''this 'is' an exa\\\\m/ple'''
"""this 'is' an exa\\\\m/ple"""
r'''this 'is' an exa\\m/ple'''
r"""this 'is' an exa\\m/ple"""

As you can see, using different delimiters for strings can lengthen or shorten strings by changing the escaping needed for certain characters. Some delimiters can't be used for all strings: r' is missing above (see later for explanation). Knowing your strings is very useful in code golf.
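As a quick sanity check, the seven literals listed above can be compared directly. This is only a sketch; the names `literals` and `all_equal` are made up for illustration.

```python
# The seven literals from the list above, collected so they can be compared.
literals = [
    'this \'is\' an exa\\\\m/ple',
    "this 'is' an exa\\\\m/ple",
    r"this 'is' an exa\\m/ple",
    '''this 'is' an exa\\\\m/ple''',
    """this 'is' an exa\\\\m/ple""",
    r'''this 'is' an exa\\m/ple''',
    r"""this 'is' an exa\\m/ple""",
]

# A set keeps one copy of each distinct value, so one element means all are equal.
all_equal = len(set(literals)) == 1
print(all_equal)    # True
print(literals[0])  # this 'is' an exa\\m/ple
```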

One can also combine multiple string literals into one:

'this \'is\' an ''''exa\\\\m/ple'''
"this 'is' an "r'exa\\m/ple'

Challenge

The challenge is, given a printable ASCII string, to output its shortest literal representation in Python.

Details on string mechanics

Strings can be delimited using ', ", ''' and """. A string ends when the starting delimiter is hit again unescaped.

If a string literal starts with ''' or """, that triple quote is consumed as the delimiter. Otherwise the single ' or " is used.

Characters can be escaped by placing a \ before them. This inserts the character in the string and eliminates any special meaning it may have. For example, in 'a \' b' the middle ' is escaped and thus doesn't end the literal, and the resulting string is a ' b.
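A minimal sketch of this escaping rule, assuming a single-character delimiter (' or ") and a non-raw literal. The helper name `quote_plain` is hypothetical and not part of the challenge; `ast.literal_eval` is used only to check the round trip.

```python
import ast

def quote_plain(s, quote):
    """Wrap s in a non-raw literal delimited by quote, which must be ' or "."""
    # Double every backslash first, then escape the delimiter itself.
    body = s.replace("\\", "\\\\").replace(quote, "\\" + quote)
    return quote + body + quote

literal = quote_plain("a ' b", "'")
print(literal)                    # 'a \' b'
print(ast.literal_eval(literal))  # a ' b
```

This always produces a valid literal, but not necessarily the shortest one: choosing the other delimiter, a raw prefix, or a split into several literals can save characters.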

Optionally, one of r or R may be inserted before the starting delimiter. If this is done, the escaping \ will appear in the result. For example, r'a \' b' evaluates to a \' b. This is why a ' b cannot be delimited by r'.

To escape ''' or """, one only needs to escape one of the characters.

These literals can be concatenated together, which concatenates their contents.
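The last three mechanics (raw prefixes, escaping inside triple quotes, and concatenation) can be seen side by side in a short sketch. The variable names are illustrative only.

```python
# Raw literal: the backslash keeps the quote from ending the literal,
# but it also stays in the resulting string.
raw = r'a \' b'

# Triple-quoted literal: escaping one quote is enough to break up a run of three.
triple = '''a ''\' b'''

# Adjacent literals of different kinds are joined into one string.
joined = "this 'is' an "r'exa\\m/ple'

print(raw)     # a \' b
print(triple)  # a ''' b
print(joined)  # this 'is' an exa\\m/ple
```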

Rules

  • The input is the string to golf. Printable ASCII only, so no newlines or other special characters.
  • The output is the golfed string literal. If there are multiple solutions, output one.
  • To simplify the challenge, in non-r strings any escapes except for \\, \' and \" are considered invalid. They must not be used in the output, even though '\m' is equal to '\\m' in Python. This removes the need to process special escape codes such as \n.
  • Builtins for golfing Python strings are disallowed. Python's repr is allowed, since it's crappy anyway.
  • Standard rules apply.

Example inputs/outputs

I tried my best to verify these, but let me know if there are mistakes. If there are multiple valid outputs to the cases, they are all listed below the input.

test
 -> 'test'
 -> "test"
te\st
 -> 'te\\st'
 -> "te\\st"
 -> r'te\st'
 -> r"te\st"
te'st
 -> "te'st"
te"st
 -> 'te"st'
t"e"s't
 -> 't"e"s\'t'
te\'st
 -> "te\\'st"
 -> r'te\'st'
 -> r"te\'st"
te\'\"st
 -> r'te\'\"st'
 -> r"te\'\"st"
t"'e"'s"'t"'s"'t"'r"'i"'n"'g
 -> """t"'e"'s"'t"'s"'t"'r"'i"'n"'g"""
 -> '''t"'e"'s"'t"'s"'t"'r"'i"'n"'g'''
t"\e"\s"\t"\s'\t"\r"\i"\n"\g
 -> r"""t"\e"\s"\t"\s'\t"\r"\i"\n"\g"""
 -> r'''t"\e"\s"\t"\s'\t"\r"\i"\n"\g'''
t"""e"""s"""'''t'''s'''"""t"""r"""'''i'''n'''g
 -> 't"""e"""s"""'"'''t'''s'''"'"""t"""r"""'"'''i'''n'''g"
t\"""e\"""s\"""'''t'''s'''\"""t\"""r\"""'''i'''n'''g
 -> r"""t\"""e\"""s\"""'''t'''s'''\"""t\"""r\"""'''i'''n'''g"""
t"e"s"t"s"t"r"i"n"g"\'\'\'\'\'\'\'\
 -> r't"e"s"t"s"t"r"i"n"g"\'\'\'\'\'\'\'''\\'
 -> r't"e"s"t"s"t"r"i"n"g"\'\'\'\'\'\'\''"\\"
"""t"'e"'s"'t"'s"'t"'r"'i"'n"'g'''
 -> """\"""t"'e"'s"'t"'s"'t"'r"'i"'n"'g'''"""
 -> '''"""t"'e"'s"'t"'s"'t"'r"'i"'n"'g''\''''

Thanks to Anders Kaseorg for these additional cases:

\\'"\\'\
 -> "\\\\'\"\\\\'\\"
''"""''"""''
 -> '''''"""''"""'\''''
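Cases like these can be checked mechanically. The sketch below assumes that evaluating a candidate with `ast.literal_eval` is an acceptable way to test it; the helper `evaluates_to` and the `cases` table are hypothetical, and only four of the simpler cases are included. It confirms that every listed output evaluates to its input and that all outputs for one input have the same length. It does not prove that no shorter literal exists, and it does not enforce the rule about disallowed escapes.

```python
import ast

def evaluates_to(literal_text, expected):
    """True if literal_text is a str literal (or adjacent literals) equal to expected."""
    try:
        value = ast.literal_eval(literal_text)
    except (SyntaxError, ValueError):
        return False
    return isinstance(value, str) and value == expected

# input string -> outputs listed for it (raw literals keep the table readable)
cases = {
    "test": ["'test'", '"test"'],
    r"te\st": [r"'te\\st'", r'"te\\st"', r"r'te\st'", r'r"te\st"'],
    "te'st": ['"te\'st"'],
    r"""te\'st""": [r'''"te\\'st"''', r"""r'te\'st'""", r'''r"te\'st"'''],
}

results = {
    s: all(evaluates_to(c, s) for c in candidates)
    and len({len(c) for c in candidates}) == 1
    for s, candidates in cases.items()
}
for s, ok in results.items():
    print(repr(s), ok)
```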

PurkkaKoodari

Posted 2017-07-12T19:08:32.290

Reputation: 16 699

What about strings that start or end with " or '? For example: """t"'e"'s"'t"'s"'t"'r"'i"'n"'g''' – Rod – 2017-07-12T19:13:52.850

@Rod I'll add that as a test case. – PurkkaKoodari – 2017-07-12T19:16:26.670

Nice example of a good challenge with a language tag. – Adám – 2017-07-12T19:24:37.487

What about u' and b'? – caird coinheringaahing – 2017-07-12T19:41:24.790

@cairdcoinheringaahing They don't provide any useful features for golfing, and b can't even be combined with regular strings, so I just left them out. – PurkkaKoodari – 2017-07-12T19:43:02.530

If a string has multiple shortest representations, should we choose one of them or output them all? – Mr. Xcoder – 2017-07-12T20:16:25.973

@Mr.Xcoder You should choose one. – PurkkaKoodari – 2017-07-12T20:16:48.877

Python 2 or Python 3? – CalculatorFeline – 2017-07-12T21:44:51.937

@CalculatorFeline Python 3. – PurkkaKoodari – 2017-07-12T21:45:13.147

Also, this might be better if scored by length on a set of cases and then code size, and allowing suboptimality. – CalculatorFeline – 2017-07-12T21:46:44.093

@CalculatorFeline I wanted to require optimality so that the results could even be useful in golfing. Also, since the optimal answers are somewhat easy to find, it would ultimately just reduce to this challenge. – PurkkaKoodari – 2017-07-12T21:48:20.393

+1 for "Python's repr is allowed, since it's crappy anyway." lol :P but nice challenge – HyperNeutrino – 2017-07-12T21:48:44.807

r'''t"\e"\s"\t"\s"\t"\r"\i"\n"\g''' can be r't"\e"\s"\t"\s"\t"\r"\i"\n"\g'. – CalculatorFeline – 2017-07-12T21:51:49.417

@CalculatorFeline Thanks, fixed the test case. – PurkkaKoodari – 2017-07-12T21:55:12.463

Answers

Python 3, 264 262 bytes

f=lambda s,b='\\',r=str.replace:min(sum([['r'+d+s+d,d+r(r(s[:-1],b,b+b),d,d[1:]+b+d[0])+b*(s[-1:]in[b,d[0]])+s[-1:]+d][d in r(r(s+d[1:],b+b,'x'),b+d[0],b)or r(s,b+b,'')[-1:]==b:]for d in["'",'"',"'''",'"""']],[f(s[:k])+f(s[k:])for k in range(1,len(s))]),key=len)

This works but is very slow without memoization, which you can add with

import functools
f=functools.lru_cache(None)(f)
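To see why the cache matters, here is a toy sketch with the same recursion shape (try the whole string, then every split point). The function `toy_cost` and its cost model are hypothetical stand-ins, not the answer's f. The point is that the function calls itself through its global name, so rebinding that name to the cached wrapper memoizes the recursive calls as well.

```python
import functools

calls = 0  # counts how many times the function body actually runs

def toy_cost(s):
    """Hypothetical stand-in for f: cost of s as one literal, or of any split."""
    global calls
    calls += 1
    best = len(s) + 2  # toy model: two delimiter characters, no escapes
    for k in range(1, len(s)):
        best = min(best, toy_cost(s[:k]) + toy_cost(s[k:]))
    return best

# Rebinding the global name makes the recursive calls go through the cache too.
toy_cost = functools.lru_cache(maxsize=None)(toy_cost)

result = toy_cost("abcdefgh")
print(result, calls)  # 10 36
```

With the cache, the body runs once per distinct substring (36 for these 8 distinct characters). Without it, the same call would run the body 3^7 = 2187 times, and the count triples with each extra character.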

The function found an improved solution for one of the test cases; the first output below is 43 characters long, the second 41:

t"e"s"t"s"t"r"i"n"g"\'\'\'\'\'\'\'\
 -> 't"e"s"t"s"t"r"i"n"g"'r"\'\'\'\'\'\'\'"'\\'
 -> r't"e"s"t"s"t"r"i"n"g"\'\'\'\'\'\'\'''\\'

Previous versions of this answer returned incorrect results on the following, which could be added as test cases:

\\'"\\'\
 -> "\\\\'\"\\\\'\\"
''"""''"""''
 -> '''''"""''"""'\''''

Anders Kaseorg

Posted 2017-07-12T19:08:32.290

Reputation: 29 242

Nice work! Thanks for the test case, I've corrected it in the challenge. – PurkkaKoodari – 2017-07-12T23:02:14.543