I don't think it would be wise to give how-tos on XSS attacks, though many resources exist, e.g. Ch. 12 of the WAHH (The Web Application Hacker's Handbook), which covers XSS.
The best way to prevent XSS attacks is to replace every special character used in these attacks (e.g., <, >, &, ", ') with its HTML character entity (&lt;, &gt;, &amp;, &quot;, &#39;) in all user-provided text, and to only allow users to mark up their comments in a limited non-HTML markup language (like Markdown) that is converted into a safe subset of HTML at the last step of processing. (Also be careful that their comments never get executed in any JavaScript processing you wrote.)
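As a minimal sketch of the escape-first approach in Python: the standard library's html.escape covers the five characters mentioned above, and a limited markup can then be converted into known-safe tags as the last step. The render_comment helper and the single **bold** rule are assumptions made up for illustration; they stand in for a real Markdown processor.

```python
import html
import re

# Hypothetical mini-markup: only **bold** is supported, for illustration.
BOLD = re.compile(r"\*\*(.+?)\*\*")

def render_comment(user_text):
    # Step 1: escape everything the user typed, so no raw HTML survives.
    escaped = html.escape(user_text, quote=True)
    # Step 2 (last): turn the limited markup into known-safe tags.
    return BOLD.sub(r"<b>\1</b>", escaped)

print(render_comment("**hi** <script>alert('x')</script>"))
# <b>hi</b> &lt;script&gt;alert(&#x27;x&#x27;)&lt;/script&gt;
```

html.escape writes the single quote as &#x27;, a hexadecimal reference to the same character. Because escaping happens before the markup step, the only tags in the output are the ones the renderer itself emitted.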
If you choose to ignore this and only sanitize specific tags and keywords (like <script>, <object>, javascript), check that the sanitization is done recursively (so that <scr<script>ipt> doesn't become <script> after a single-pass sanitization), is case- and whitespace-insensitive, and only stops when the last complete sanitization pass did not change the input. In Python, something like:
    import re

    def sanitation_single_pass(user_input):
        pattern = re.compile(r'<[^>]*(script|object|meta|style)[^>]*>', re.IGNORECASE)
        # this is a very quick sample regex that could appear in a sanitation routine
        # not meant to be inclusive of most XSS threats;
        # e.g. this doesn't prevent having javascript in links or img src, etc.
        return pattern.sub('', user_input)

    def sanitation(user_input):
        # repeat until a complete pass leaves the input unchanged
        processed_user_input = sanitation_single_pass(user_input)
        while processed_user_input != user_input:
            user_input = processed_user_input
            processed_user_input = sanitation_single_pass(user_input)
        return processed_user_input
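The reason for looping until nothing changes is easiest to see with a deliberately naive filter. This is only a sketch: NAIVE_TAG, strip_once and strip_until_stable are hypothetical names, and the pattern removes nothing but a bare <script> tag.

```python
import re

# Hypothetical, deliberately naive filter: matches only a bare <script> tag,
# ignoring case and allowing whitespace around the tag name.
NAIVE_TAG = re.compile(r"<\s*script\s*>", re.IGNORECASE)

def strip_once(text):
    # One pass: remove every match found in the current text.
    return NAIVE_TAG.sub("", text)

def strip_until_stable(text):
    # Keep filtering until a complete pass changes nothing.
    previous = None
    while previous != text:
        previous = text
        text = strip_once(text)
    return text

nested = "<scr<script>ipt>alert(1)</script>"
print(strip_once(nested))          # <script>alert(1)</script>
print(strip_until_stable(nested))  # alert(1)</script>
```

A single pass reassembles the tag it was supposed to remove; the repeated version only stops once the output is a fixed point of the filter.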
Also pay attention to encoding issues: make sure you define a charset (<meta charset="utf-8"> at the top of your HTML templates) and that you force user input into this encoding before sanitization. Also be aware that some browsers will interpret things like java&#10;script as javascript (&#10; is a line break), etc.
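One way to account for both points, sketched in Python under some assumptions: decode_input and looks_like_javascript_url are hypothetical helper names, and treating every character from 0x00 to 0x20 as ignorable is a conservative choice for illustration rather than a statement of what any particular browser does.

```python
import html
import re

# ASCII control characters and space (0x00-0x20); assumed ignorable here.
IGNORABLE = re.compile(r"[\x00-\x20]+")

def decode_input(raw_bytes):
    # Strict decoding raises UnicodeDecodeError on bytes that are not valid UTF-8.
    return raw_bytes.decode("utf-8", errors="strict")

def looks_like_javascript_url(value):
    # Resolve character references such as &#10; to the characters they stand for.
    decoded = html.unescape(value)
    # Drop whitespace and control characters, then compare case-insensitively.
    collapsed = IGNORABLE.sub("", decoded).lower()
    return collapsed.startswith("javascript:")

print(looks_like_javascript_url("java&#10;script:alert(1)"))  # True
print(looks_like_javascript_url("https://example.com/"))      # False
```

The check runs on the decoded, collapsed form of the value, so an embedded line break no longer hides the scheme from the filter.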