Python's f-strings are actually safer. Use them!
String formatting may be dangerous when a format string depends on untrusted data. So, when using str.format() or %-formatting, it's important to use static format strings, or to sanitize untrusted parts before applying the formatter function. In contrast, f-strings aren't actually plain strings, but more like syntactic sugar for concatenating strings and expressions. As such, an f-string's format is predetermined and doesn't allow dynamic (potentially untrusted) parts in the first place.
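As a minimal sketch of the two safe options just mentioned, the snippet below compares an unsafe dynamic format string with a static one and with a sanitized one. The `untrusted` value and the `escape_braces` helper are hypothetical names introduced only for illustration.

```python
# Hypothetical untrusted input that happens to contain format syntax.
untrusted = "{0.__class__}"

# Unsafe: the untrusted value becomes part of the format string,
# so its replacement field is evaluated against the arguments.
unsafe = ("hello " + untrusted).format("bob")

# Safe option 1: a static format string; the untrusted value is only an argument.
safe_static = "hello {}".format(untrusted)


# Safe option 2: sanitize the untrusted part by doubling its braces,
# which str.format() treats as literal "{" and "}".
def escape_braces(text):
    return text.replace("{", "{{").replace("}", "}}")


safe_escaped = ("hello " + escape_braces(untrusted) + " from {}").format("bob")

print(unsafe)        # the field was expanded
print(safe_static)   # the braces survive as plain text
print(safe_escaped)  # the braces survive as plain text
```

For %-formatting, the analogous sanitizing step would be doubling every `%` in the untrusted part.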
Old-style formatting with str.format()
>>> data_str = 'bob'
>>> format_str = 'hello {name}!'
>>> format_str.format(name=data_str)
'hello bob!'
Here, your Python interpreter doesn't know the difference between a data string and a format string. It just calls a function, str.format(), which runs a replacement algorithm on whatever value the format string has at the moment of execution. As expected, the format string compiles to a plain string constant with curly braces in it:
>>> import dis
>>> dis.dis("'hello {name}!'")
  1           0 LOAD_CONST               0 ('hello {name}!')
              2 RETURN_VALUE
New-style formatting with f-strings
>>> data_str = 'bob'
>>> f'hello {data_str}!'
'hello bob!'
Here, f'hello {data_str}!' may look like a string constant, but it's not. The interpreter doesn't parse the part between {...} as part of the string that may be expanded later, but as a separate expression:
>>> dis.dis("f'hello {name}!'")
  1           0 LOAD_CONST               0 ('hello ')
              2 LOAD_NAME                0 (name)
              4 FORMAT_VALUE             0
              6 LOAD_CONST               1 ('!')
              8 BUILD_STRING             3
             10 RETURN_VALUE
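Opcode names and offsets in dis output differ between Python versions, so the listings above may not match your interpreter exactly. As a rough, version-tolerant check (a sketch, assuming CPython), you can compare the constants and names stored in the two compiled expressions instead:

```python
# Compile both expressions without evaluating them.
plain = compile("'hello {name}!'", "<plain>", "eval")
fstr = compile("f'hello {name}!'", "<fstring>", "eval")

# The plain string keeps the whole template, braces included, as one constant.
plain_consts = plain.co_consts

# The f-string is split at compile time: literal pieces become constants,
# and the expression between the braces becomes an ordinary name lookup.
fstr_consts = fstr.co_consts
fstr_names = fstr.co_names

print(plain_consts)
print(fstr_consts, fstr_names)
```

The braces of the f-string never reach the compiled code as text, so there is no template left to tamper with at run time.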
So, think of f"hi {sys.argv[1]}" as (approximately) syntactic sugar for "hi " + sys.argv[1]. At run time, the interpreter doesn't even really know or care that you used an f-string. It just sees instructions to build a string from the constant "hi " and the formatted value of sys.argv[1].
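A short sketch of what this means in practice: a value interpolated into an f-string is inserted as data and is never scanned for replacement fields. The variable names here are illustrative only.

```python
secret = "abc123"

# Untrusted data that looks like a replacement field.
data_str = "{secret}"

# The f-string formats data_str once; its contents are not expanded again.
greeting = f"hi {data_str}"

# Roughly the same thing, written out by hand.
concatenated = "hi " + format(data_str)

print(greeting)
```

The braces inside data_str come out unchanged, and the value of secret never appears in the result.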
Vulnerable example
Here is a sample web app which uses str.format() in a vulnerable way:
from http.server import HTTPServer, BaseHTTPRequestHandler

secret = 'abc123'

class Handler(BaseHTTPRequestHandler):
    name = 'funtimes'
    msg = 'welcome to {site.name}'

    def do_GET(self):
        res = ('<title>' + self.path + '</title>\n' + self.msg).format(site=self)
        self.send_response(200)
        self.send_header('content-type', 'text/html')
        self.end_headers()
        self.wfile.write(res.encode())

HTTPServer(('localhost', 8888), Handler).serve_forever()
$ python3 example.py
$ curl 'http://localhost:8888/test'
<title>/test</title>
welcome to funtimes
Attack
When the res string is built, it uses self.path as part of the format string. Since self.path is user-controlled, we can use it to alter the format string and, for example, exfiltrate the global variable secret:
$ curl -g 'http://localhost:8888/XXX{site.do_GET.__globals__[secret]}'
<title>/XXXabc123</title>
welcome to funtimes
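The same flaw can be reproduced without a web server. The sketch below is a simplified stand-in for the handler, not the app itself: the Site class, its api_key attribute, and both render methods are hypothetical names, and the payload reads an attribute of the object rather than a module global. One possible fix is shown alongside: keep the format string static and pass the path in as a value.

```python
class Site:
    name = 'funtimes'
    api_key = 'abc123'  # hypothetical secret stored on the object
    msg = 'welcome to {site.name}'

    def render_vulnerable(self, path):
        # path is merged into the format string before .format() runs,
        # so any replacement fields it contains are evaluated.
        return ('<title>' + path + '</title>\n' + self.msg).format(site=self)

    def render_fixed(self, path):
        # The outer format string is static; path is only ever a value.
        body = self.msg.format(site=self)
        return '<title>{path}</title>\n{body}'.format(path=path, body=body)


site = Site()
payload = '/XXX{site.api_key}'

leaked = site.render_vulnerable(payload)
safe = site.render_fixed(payload)

print(leaked)
print(safe)
```

With the vulnerable method the payload's field is expanded and the secret appears in the title; with the fixed method the payload comes back verbatim.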