
Background

A while ago I started using f-strings in Python, but I remembered seeing some security concerns about using them with user input, so I have made a point of not using them in those situations.

Question

Are there security concerns with using Python f-strings with user input? For example, is it possible for a user to gain access to information that they shouldn't have access to?

Example

$ ./hello.py mike
hi mike

#!/usr/bin/python3

import sys

secret = 'my secret'

print(F"hi {sys.argv[1]}")

This program is essentially a hello world that takes user input. Is it possible for an attacker to supply input that would exfiltrate the secret variable or any other valuable data?

MikeSchem
  • It depends entirely on what you do with it. Your new string contains user input so if, for example, you pass it off to `os.system` you now have RCE. – Conor Mancone Sep 15 '20 at 18:31
  • OK, I'm aware of that, but the same would be true of any user input method. Since the f-string is computed at runtime, I'm wondering whether you can force it to eval a statement, for example if I passed in `1+1`. I did try that, and it just interprets it as a string, so no problem there. – MikeSchem Sep 15 '20 at 18:57

3 Answers


Python's f-strings are actually safer. Use them!


String formatting may be dangerous when a format string depends on untrusted data. So, when using str.format() or %-formatting, it's important to use static format strings, or to sanitize untrusted parts before applying the formatter function. In contrast, f-strings aren't actually plain strings, but more like syntactic sugar for concatenating strings and expressions. As such, an f-string's format is predetermined and doesn't allow dynamic (potentially untrusted) parts in the first place.

Old-style formatting with str.format()

>>> data_str = 'bob'
>>> format_str = 'hello {name}!'
>>> format_str.format(name=data_str)
'hello bob!'

Here, the Python interpreter doesn't know the difference between a data string and a format string. It just calls a method, str.format(), which runs a replacement algorithm on the format string's value at the moment of execution. So, as expected, the format is just a plain string with curly braces in it:

>>> import dis
>>> dis.dis("'hello {name}!'")
  1           0 LOAD_CONST               0 ('hello {name}!')
              2 RETURN_VALUE

New-style formatting with f-strings

>>> data_str = 'bob'
>>> f'hello {data_str}!'
'hello bob!'

Here, f'hello {data_str}!' may look like a string constant, but it's not. The interpreter doesn't parse the part between {...} as part of the string that may be expanded later, but as a separate expression:

>>> dis.dis("f'hello {name}!'")
  1           0 LOAD_CONST               0 ('hello ')
              2 LOAD_NAME                0 (name)
              4 FORMAT_VALUE             0
              6 LOAD_CONST               1 ('!')
              8 BUILD_STRING             3
             10 RETURN_VALUE

So, think of f"hi {sys.argv[1]}" as (approximately) syntactic sugar for "hi " + sys.argv[1]. At run time, the interpreter doesn't really know or care that you used an f-string. It just sees instructions to build a string from the constant "hi " and the formatted value of sys.argv[1].
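As a rough check of that equivalence, here is a minimal sketch. The helper names greet_fstring and greet_concat are made up for illustration, and format(value, "") stands in for what the FORMAT_VALUE step does when no conversion or format spec is given.

```python
def greet_fstring(value):
    # The replacement field is compiled as an ordinary expression.
    return f"hi {value}"


def greet_concat(value):
    # Hand-written approximation: format the value, then join it to the constant.
    return "hi " + format(value, "")


# Both spellings should agree for strings and for non-string values.
for sample in ["mike", "1+1", 42, 3.5, None]:
    assert greet_fstring(sample) == greet_concat(sample)

print(greet_fstring("mike"))  # hi mike
```

Either way, the value is only ever formatted and concatenated; nothing in its contents is parsed.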

Vulnerable example

Here is a sample web app which uses str.format() in a vulnerable way:

from http.server import HTTPServer, BaseHTTPRequestHandler

secret = 'abc123'

class Handler(BaseHTTPRequestHandler):
    name = 'funtimes'
    msg = 'welcome to {site.name}'
    def do_GET(self):
        res = ('<title>' + self.path + '</title>\n' + self.msg).format(site=self)
        self.send_response(200)
        self.send_header('content-type', 'text/html')
        self.end_headers()
        self.wfile.write(res.encode())

HTTPServer(('localhost', 8888), Handler).serve_forever()

$ python3 example.py

$ curl 'http://localhost:8888/test'
<title>/test</title>
welcome to funtimes

Attack

When the res string is built, it uses self.path as part of the format string. Since self.path is user-controlled, we can use it to alter the format string and, for example, exfiltrate the global variable secret:

$ curl -g 'http://localhost:8888/XXX{site.do_GET.__globals__[secret]}'
<title>/XXXabc123</title>
welcome to funtimes
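
For contrast, here is a minimal sketch of the same rendering with a constant format string, without the HTTP server. The names Site, TEMPLATE and render are hypothetical stand-ins for the handler above. The untrusted path is passed as an argument rather than concatenated into the format string, so the braces it contains are never interpreted.

```python
secret = "abc123"


class Site:
    # Hypothetical stand-in for the request handler in the example above.
    name = "funtimes"

    def do_GET(self):
        # Present only so the attack payload has an attribute to traverse.
        pass


# The format string is a constant that never contains user input.
TEMPLATE = "<title>{path}</title>\nwelcome to {site.name}"


def render(path, site):
    # The untrusted path is substituted as data; str.format() does not
    # re-scan substituted values for further replacement fields.
    return TEMPLATE.format(path=path, site=site)


payload = "/XXX{site.do_GET.__globals__[secret]}"
print(render(payload, Site()))
```

The payload comes back verbatim in the title instead of being expanded. Writing the template as an f-string, f"<title>{path}</title>\nwelcome to {site.name}", should behave the same way. HTML escaping of the path is a separate concern that this sketch does not handle.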
Arminius

If this basic language feature were this flawed, it probably wouldn't be a feature at all. As long as the contents of the format string are controlled by the programmer at development time, there is nothing that a user can do to abuse them.

The contents of the curly braces are evaluated, but the result of that evaluation is not evaluated again (i.e. sys.argv[1] evaluates to "1+1", and that result is not evaluated a second time, as you have seen).
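
A small sketch of that single evaluation, using a hypothetical greet function in place of the command-line script. The inputs mimic what an attacker might try; each one should come back unchanged.

```python
secret = "my secret"


def greet(user_input):
    # user_input is looked up once; its contents are never parsed
    # as Python code or as a format string.
    return f"hi {user_input}"


print(greet("1+1"))                          # hi 1+1
print(greet("{secret}"))                     # hi {secret}
print(greet("{greet.__globals__[secret]}"))  # hi {greet.__globals__[secret]}
```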

The problem arises when a user is able to inject data into a string before it is formatted; see this challenge example. While it does not involve an f-string, it is a good demonstration of the attacks that are possible if the user is allowed to control the format string.

multithr3at3d

Take a look at the following simple example:

import sys

secret = "My secret"

print(f"From argv: {sys.argv[1]}\n")
print(f"From code: {print(secret)}")

If you run it with python test.py print\(secret\) or python test.py "print(secret)" the result is:

From argv: print(secret)

My secret
From code: None

The argument is simply treated as a string; it is not executed. However, I am not 100% sure that there is no way to force Python to execute it. I am also not sure what would happen if the data came from another input source, such as a socket.

MikeSchem
Al Bundy