Django does the sensible things to reduce exposure to XSS.
Django uses Unicode and UTF-8 everywhere by default, and forces every template variable to Unicode before escaping and substituting it (autoescaping is on by default) to prevent users from inserting arbitrary HTML elements. Developers can change the encoding with the DEFAULT_CHARSET setting, but Django then applies that encoding throughout the application and sends a Content-Type: text/html; charset=utf-8 HTTP response header by default (with 'text/html' and 'utf-8' changing if you return a different content_type or have changed the charset). Furthermore, Django's own base templates and admin pages set <meta http-equiv="content-type" content="text/html; charset=utf-8">, but developers are free not to use those base templates (and their custom-written templates may omit the charset from the meta tag or, worse, declare the wrong one). So while bobince's great answer lists some shortcomings of simply substituting &lt; for < in user input, which arise from encoding issues, Django handles these properly by default.
Is it 100% foolproof? No. Django still gives the developer enough configurability to do unsafe things, such as inserting user input into an onclick handler, bypassing the automatic escaping (through the mark_safe() function or {{ user_input|safe }} in the template), or allowing user input into an unsafe location, e.g. a link or eval'd JavaScript. Granted, it would be nearly impossible to do much more without intensive compile-time/semantic analysis of each template.
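To illustrate why an unsafe location defeats escaping, here is a minimal sketch in plain Python. It uses the standard library's html.escape as a stand-in for Django's autoescaping, and render_link is a hypothetical helper invented for this example, not part of Django.

```python
from html import escape


def render_link(url):
    """Hypothetical helper: build an anchor tag with an HTML-escaped href."""
    return '<a href="{}">profile</a>'.format(escape(url, quote=True))


# Escaping neutralises markup that tries to break out of the attribute...
print(render_link('"><script>alert(1)</script>'))

# ...but a javascript: URL contains no characters that need escaping,
# so it passes through unchanged and still runs when the link is clicked.
print(render_link("javascript:alert(document.cookie)"))
```

The point is that HTML escaping only protects HTML contexts; a value placed in an href, an event handler, or a script block needs validation or escaping appropriate to that context.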
For people interested, the escaping code is quite readable in django/utils/html.py. (My link goes to the current dev version, but my copy-paste is from Django 1.2. The main difference between the two is that force_unicode was renamed to force_text (in Python 3 all text is Unicode) and the code was made compatible with Python 3 (hence all the references to six).)
Basically, the escape function is run on every variable rendered in the template: it first checks that the value can be converted to Unicode properly and then replaces the characters &<>'" with their HTML-escaped equivalents. There are also functions for escaping JS, though I believe they have to be called manually in the template, like {{ variable|escapejs }}.
def escape(html):
    """
    Returns the given HTML with ampersands, quotes and angle brackets encoded.
    """
    return mark_safe(force_unicode(html).replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;').replace('"', '&quot;').replace("'", '&#39;'))
escape = allow_lazy(escape, unicode)

_base_js_escapes = (
    ('\\', r'\u005C'),
    ('\'', r'\u0027'),
    ('"', r'\u0022'),
    ('>', r'\u003E'),
    ('<', r'\u003C'),
    ('&', r'\u0026'),
    ('=', r'\u003D'),
    ('-', r'\u002D'),
    (';', r'\u003B'),
    (u'\u2028', r'\u2028'),
    (u'\u2029', r'\u2029')
)

# Escape every ASCII character with a value less than 32.
_js_escapes = (_base_js_escapes +
               tuple([('%c' % z, '\\u%04X' % z) for z in range(32)]))

def escapejs(value):
    """Hex encodes characters for use in JavaScript strings."""
    for bad, good in _js_escapes:
        value = mark_safe(force_unicode(value).replace(bad, good))
    return value
escapejs = allow_lazy(escapejs, unicode)

def conditional_escape(html):
    """
    Similar to escape(), except that it doesn't operate on pre-escaped strings.
    """
    if isinstance(html, SafeData):
        return html
    else:
        return escape(html)
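For readers without a Django 1.2 checkout, the behaviour of these three functions can be approximated in plain Python 3. This is only a rough sketch under my own assumptions: SafeText, escape_sketch, conditional_escape_sketch and escapejs_sketch are names made up here, SafeText stands in for Django's SafeData marker, and lazy strings are ignored.

```python
class SafeText(str):
    """Stand-in for Django's SafeData: marks a string as already escaped."""


def escape_sketch(text):
    """Encode ampersands, quotes and angle brackets as HTML entities."""
    text = str(text)
    # '&' must be replaced first, otherwise the entities added by the
    # later replacements would themselves be escaped again.
    escaped = (text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace('"', "&quot;")
                   .replace("'", "&#39;"))
    return SafeText(escaped)


def conditional_escape_sketch(text):
    """Escape only values that have not been marked safe already."""
    if isinstance(text, SafeText):
        return text
    return escape_sketch(text)


# Map each risky character to a literal \uXXXX sequence, including every
# ASCII control character below 32.
_JS_ESCAPES = {ord(c): "\\u%04X" % ord(c) for c in "\\'\"><&=-;\u2028\u2029"}
_JS_ESCAPES.update({z: "\\u%04X" % z for z in range(32)})


def escapejs_sketch(value):
    """Hex-encode characters for use inside a JavaScript string literal."""
    # str.translate works in a single pass, so the backslashes it inserts
    # are never escaped a second time.
    return str(value).translate(_JS_ESCAPES)


print(escape_sketch("<b>Tom & 'Jerry'</b>"))
print(conditional_escape_sketch(escape_sketch("<")))
print(escapejs_sketch("</script>"))
```

The single-pass translate call is a design choice of this sketch; the Django 1.2 code instead loops over the replacement pairs, which is why it lists the backslash first.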
The force_unicode helper that escape() relies on lives in django/utils/encoding.py:
def force_unicode(s, encoding='utf-8', strings_only=False, errors='strict'):
    """
    Similar to smart_unicode, except that lazy instances are resolved to
    strings, rather than kept as lazy objects.

    If strings_only is True, don't convert (some) non-string-like objects.
    """
    if strings_only and is_protected_type(s):
        return s
    try:
        if not isinstance(s, basestring,):
            if hasattr(s, '__unicode__'):
                s = unicode(s)
            else:
                try:
                    s = unicode(str(s), encoding, errors)
                except UnicodeEncodeError:
                    if not isinstance(s, Exception):
                        raise
                    # If we get to here, the caller has passed in an Exception
                    # subclass populated with non-ASCII data without special
                    # handling to display as a string. We need to handle this
                    # without raising a further exception. We do an
                    # approximation to what the Exception's standard str()
                    # output should be.
                    s = ' '.join([force_unicode(arg, encoding, strings_only,
                            errors) for arg in s])
        elif not isinstance(s, unicode):
            # Note: We use .decode() here, instead of unicode(s, encoding,
            # errors), so that if s is a SafeString, it ends up being a
            # SafeUnicode at the end.
            s = s.decode(encoding, errors)
    except UnicodeDecodeError, e:
        if not isinstance(s, Exception):
            raise DjangoUnicodeDecodeError(s, *e.args)
        else:
            # If we get to here, the caller has passed in an Exception
            # subclass populated with non-ASCII bytestring data without a
            # working unicode method. Try to handle this without raising a
            # further exception by individually forcing the exception args
            # to unicode.
            s = ' '.join([force_unicode(arg, encoding, strings_only,
                    errors) for arg in s])
    return s
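The core idea of force_unicode (decode bytes with the configured charset, fail loudly on invalid input, stringify everything else) can be sketched in Python 3 roughly as follows. force_text_sketch is a hypothetical name of mine, and the sketch deliberately leaves out strings_only, lazy objects and the special handling of Exception instances.

```python
def force_text_sketch(value, encoding="utf-8", errors="strict"):
    """Return value as text, decoding byte strings with the given encoding."""
    if isinstance(value, str):
        # Already text: nothing to do.
        return value
    if isinstance(value, (bytes, bytearray)):
        # With errors="strict", invalid byte sequences raise
        # UnicodeDecodeError instead of being passed on to the escaper.
        return bytes(value).decode(encoding, errors)
    # Any other object is converted through its string representation.
    return str(value)


print(force_text_sketch(b"caf\xc3\xa9"))
print(force_text_sketch(42))
```

Rejecting input that cannot be decoded is what stops malformed byte sequences from slipping past the character replacements in escape().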