These are things I do when users submit data:
substr if extra characters found.
htmlspecialchars() + ENT_QUOTES + UTF-8
str_replace '<' '>' in user input
What more things need to be done?
“Sanitisation” is an unhelpful and misleading term. There are two different animals here:
Output escaping. This is an output-stage concern. When you take variable strings and inject them into a larger string that has a surrounding syntax, you must process the injected string to make it conform to the requirements of that syntax. What exactly that processing is depends on the context: if you are putting text in HTML, you must HTML-escape that text at the point of making the HTML. If you are putting text in SQL queries, you must SQL-escape the text at the point of creating the query.(*)
Input validation. This is an input-stage concern, making sure that user input is within the accepted possible values for a data item. This is primarily a matter of business rules, to be considered on a field-by-field basis, although there are some kinds of validation that it makes sense to do to almost all input fields (primarily checking for control characters).
Input validation does have security impact in that it can mitigate the damage when you've made a mistake with your output escaping. But it is not enough to rely on input validation as your only text-handling measure, because you're always going to need to allow the user to use some characters that are special in some syntax or other. You're going to want to be able to have a web page about fish & chips and a customer in your database called O'Reilly.
“Sanitisation” confuses these two concepts and encourages you to address them at the same stage, which can never work consistently. A common anti-pattern is to HTML-escape all your input. But you don't know if each input element is going to be output to HTML (and only output to HTML) at that input processing phase. If you do this:
you end up with HTML-encoded material in the database, that can't be cut up and processed without the entity references getting in the way;
if you need to create content from that data that isn't HTML, like send an e-mail or write some CSV, you've got ugly mangled text in it;
if you get content in your database from any other source it might not be HTML-escaped and so outputting it straight to the page still gives you XSS vulnerabilities.
“Sanitisation” as a concept should be destroyed by fire, then drowned, cut into little pieces and destroyed by some more fire again.
(*: in both cases it is wiser to choose a method that does the processing for you implicitly so you don't get it wrong: use an HTML templating language that escapes output by default, and a data access layer that uses parameterised queries or object-relational mapping. Similarly for other kinds of escaping: prefer a standards-compliant XML serialiser to manual XML escaping, use a standard JSON serialiser to pass data to JavaScript, and so on.)
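To make the "escape for the context, at output time" point concrete, here is a minimal sketch. The helper names htmlText() and jsLiteral() are made up for illustration; only htmlspecialchars() and json_encode() are PHP built-ins. The raw value is kept as-is and each output context applies its own encoding.

```php
<?php
// Hypothetical helpers: one per output context. The raw value is never modified.

function htmlText(string $s): string
{
    // HTML context: escape &, <, > and both quote characters.
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

function jsLiteral(string $s): string
{
    // JavaScript context: let the JSON serialiser build the string literal.
    // The JSON_HEX_* flags keep <, >, & and quotes out of the output so the
    // literal can sit inside a <script> element.
    return json_encode(
        $s,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

$name = "O'Reilly"; // stored and passed around unescaped

$html   = '<p>Hello, ' . htmlText($name) . '!</p>';
$script = '<script>var name = ' . jsLiteral($name) . ';</script>';
```

The same $name can still go into an e-mail or a CSV file untouched, because nothing was encoded at input time.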
substr if over limited values found.
Do you mean truncating too-long input strings? That's OK as a form of input validation where your business rules have valid reason to limit the length of an input. But you might prefer returning an error to the user if you have a too-long input string, as depending on what field it is it might not be appropriate to quietly discard data.
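As a rough sketch of "return an error instead of quietly truncating", something like the following could work. validateText() is a hypothetical helper and the limits and messages are placeholders; it also does the control-character check mentioned earlier.

```php
<?php
/**
 * Hypothetical validator: returns an error message, or null if the value is acceptable.
 */
function validateText(string $value, int $maxChars): ?string
{
    // Count UTF-8 characters. With the /u flag preg_match_all() returns false
    // when the subject is not valid UTF-8, so this doubles as an encoding check.
    $length = preg_match_all('/./su', $value);
    if ($length === false) {
        return 'Input is not valid UTF-8.';
    }

    // Reject control characters other than tab, line feed and carriage return.
    if (preg_match('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', $value) === 1) {
        return 'Input contains control characters.';
    }

    // Report over-long input to the user rather than silently cutting it.
    if ($length > $maxChars) {
        return "Input is too long ($length characters, maximum $maxChars).";
    }

    return null;
}
```

Note that characters such as ' and & pass this check; whether they are allowed is a per-field business rule, not a blanket one.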
htmlspecialchars() + ent_quotes + UTF-8
This is output escaping. Do it on the values at the point you drop them into HTML, not on input. If you are using native PHP templating you may like to define yourself a shortcut to make it quicker to type, for example:
function h($s) {
echo htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}
...
<p>Hello, <?php h($user['name']); ?>!</p>
str_replace < > users input
What for? If you are HTML-escaping correctly, these characters are perfectly fine, and unless your business rules say otherwise they may be quite valid to include in a field, just as both characters are valid for me to type in this comment box for SO.
Of course you may want to disallow them in input validation for specific fields—you wouldn't want them in a phone number or email address.
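For field-specific validation of that kind, a sketch might look like this. isValidEmail() and isValidPhone() are hypothetical names, and the phone pattern is only an assumed business rule, not a standard; FILTER_VALIDATE_EMAIL is PHP's built-in validator.

```php
<?php
function isValidEmail(string $s): bool
{
    // filter_var() returns the address on success and false on failure.
    return filter_var($s, FILTER_VALIDATE_EMAIL) !== false;
}

function isValidPhone(string $s): bool
{
    // Assumed rule: optional leading +, then a digit, then 5 to 19 more
    // digits, spaces, parentheses or dashes. Adjust to your own requirements.
    return preg_match('/^\+?[0-9][0-9 ()\-]{5,19}\z/', $s) === 1;
}
```

These reject < and > as a side effect of the field's format, not because the characters are dangerous in general.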
I use the OWASP PHP Filters. They're really simple to use and effective.
The source code is highly readable. There are a lot of painful lessons in there.
This question is a number of years old. Things change, and external links tend to rot as sites stop maintaining pages that other sites link to.
So, moving on: PHP has moved on a bit and many people ask about sanitizing inputs, but as yet the use of filter_var is thin on the ground. Whilst not perfect, it is, from my reading, binary safe.
So say you take an email address. If you use an HTML5 input in conjunction with PHP filter_var, as you should, your site will be more secure than that of someone who writes their own routine to sanitize the input and doesn't use HTML5 inputs. Writing code for backwards compatibility with non-HTML5-compliant browsers is completely pointless and a waste of your resources and time.
The other security issue is that the values of $_GET and $_POST are volatile and can change, or be changed externally, from good data to bad data, so any sanitize routine that uses them and passes cleaned inputs back in to them is just ripe for trouble. I read from the $_REQUEST array and copy in to a safe array of my own; once a value is set in your safe array it can't be changed from outside, so populate your safe array by taking the inputs and filter_var-ing them in to it.
How I sanitize inputs is something like what follows...
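// White-list: field name => filter to apply (each value is overwritten with the result).
$someSafeArray = array(
"thefield"=>FILTER_SANITIZE_STRING, // deprecated as of PHP 8.1
"theNumberfield"=>FILTER_SANITIZE_NUMBER_INT, // FILTER_SANITIZE_NUMBER does not exist
"theEmailfield"=>FILTER_SANITIZE_EMAIL
);
foreach( $someSafeArray as $fld=>&$val) {
// A missing field falls back to an empty string instead of raising a warning.
$val = filter_var( trim( $_REQUEST[$fld] ?? '' ), $val );
}
unset($val); // break the reference left over from the loop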
So this will return all the fields (from the keys) and the sanitized inputs are then put in the values of those keys in the safe array.
This means that I use the keys of a white-list (array) to ONLY take the inputs I designate as being valid fields. Too many people I have seen offering up "Dynamic" form processors that accept ANY input, NO!!! You should only accept data streams that your code / form is designed to handle.
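If you also want to notice when a request carries fields you never asked for, a small check along these lines could sit in front of the processing. unexpectedFields() is a hypothetical helper, not part of the routine above.

```php
<?php
/**
 * Hypothetical helper: list the submitted field names that are not on the white-list.
 */
function unexpectedFields(array $input, array $allowedFields): array
{
    // array_flip() turns the list of names into keys so array_diff_key() can compare keys.
    return array_keys(array_diff_key($input, array_flip($allowedFields)));
}

$allowed = ['thefield', 'theNumberfield', 'theEmailfield'];
$extra = unexpectedFields(['thefield' => 'abc', 'admin' => '1'], $allowed);
// $extra now holds the names that were sent but never designated as valid.
```

What you do with a non-empty result (ignore, log, reject) is up to you.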
SALT your page with a value from which your receiving script can recalculate the correct hash, to check that your form was issued by the server. EMPTY fields: I include at least one blank field that is readonly and hidden, like the hashing fields, but the intention is to determine whether the form is being pushed or not, since a bot will fill all fields with data to try and crack the page open.
SO Baiting your page with a couple of dummy fields like...
<input name="userlogin" type="hidden" value="" readonly />
<input name="empty" type="hidden" value="" readonly />
If the form arrives on your server with something in the value of either input, you may as well cease any form processing, log the user's IP and block them, as they are either a bot or a hacker.
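On the receiving side, the bait check could be as small as this sketch. honeypotTriggered() is a hypothetical name; the default field names are the two dummy inputs shown above.

```php
<?php
/**
 * Hypothetical check: true if any bait field came back with content.
 */
function honeypotTriggered(array $input, array $baitFields = ['userlogin', 'empty']): bool
{
    foreach ($baitFields as $field) {
        // A real browser submits these hidden fields as empty strings.
        // Anything else (text, or an array) means something filled them in.
        if (isset($input[$field]) && $input[$field] !== '') {
            return true;
        }
    }
    return false;
}
```

Called as honeypotTriggered($_POST) before any other processing; logging and blocking are left to the caller.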
Injection is not only a SQL issue, it is a PHP page issue, so be careful about what fields you accept and what you salt and bait your form with, and operate a white-list.
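One way the salting idea could be done without a database is an HMAC over the form name and issue time, recalculated on receipt. This is only a sketch under assumptions: the function names, the token layout and the one-hour lifetime are invented for illustration, and it is not necessarily how the scheme described here is implemented. hash_hmac() and hash_equals() are PHP built-ins.

```php
<?php
// Hypothetical token format: "<issue time>:<hex HMAC of 'formId|issue time'>".

function issueFormToken(string $secret, string $formId, int $now): string
{
    return $now . ':' . hash_hmac('sha256', $formId . '|' . $now, $secret);
}

function verifyFormToken(string $secret, string $formId, string $token, int $now, int $maxAge = 3600): bool
{
    $parts = explode(':', $token, 2);
    if (count($parts) !== 2 || preg_match('/^[0-9]+\z/', $parts[0]) !== 1) {
        return false; // malformed token
    }

    $issued = (int) $parts[0];
    if ($issued > $now || $now - $issued > $maxAge) {
        return false; // issued in the future, or expired
    }

    // Recalculate the hash from the server-side secret and compare in constant time.
    $expected = hash_hmac('sha256', $formId . '|' . $issued, $secret);
    return hash_equals($expected, $parts[1]);
}
```

The token would go into a hidden field when the form is rendered (HTML-escaped, with time() as $now) and be verified before any other processing; the secret stays on the server.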
STOP USING GETs to pass control parameters. USE a session cookie, as this reduces the inputs in to the script. If I use a GET type URL then it's only as a subversive tactic that allows monitoring of users poking variables in to the URL and other stuff to try and hack.
I have been using a process like this since before the filter_var function was introduced. I was salting pages without the need for a database to validate incoming pages, something that I was repeatedly told by so-called professionals wasn't possible. The only thing I have to say to that is that it is, if you are able to think outside of the boilerplate (box), and it is simple enough to thwart hacking attempts and secure your form pages.
I would personally never str_replace on < and >, just strip tags, html special chars, html entities encoding, mysql_real_escape_string etc on user input.
What you need to take into account is how the data is going to be represented.
If it's going into the front end, then you need to htmlentities it and strip_tags imo, that way you can be sure that they aren't trying to execute any unwanted code.
Also, stripping slashes is quite a big consideration. I recently caught an XSS in the WP Platinum SEO plugin, in which you could execute JavaScript code through the $_GET['s'] parameter by encoding everything into escaped-hex code (\\x41 = A).
If you are entering data into the database, have a look at PDO prepared queries as well as mysql_real_escape_string. This should secure your database inputs fairly well.
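A minimal sketch of a PDO prepared query follows. It assumes the pdo_sqlite driver is available and uses an in-memory SQLite database purely so the example is self-contained; the table and function names are made up. Worth knowing: the old mysql_* functions, including mysql_real_escape_string, were removed in PHP 7.0, so on current PHP prepared statements are the route to take.

```php
<?php
// Assumption: pdo_sqlite is installed. Swap the DSN for your real database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)');

function addCustomer(PDO $pdo, string $name): void
{
    // The value travels separately from the SQL text, so no manual escaping is needed.
    $stmt = $pdo->prepare('INSERT INTO customers (name) VALUES (:name)');
    $stmt->execute([':name' => $name]);
}

function findCustomerNames(PDO $pdo, string $name): array
{
    $stmt = $pdo->prepare('SELECT name FROM customers WHERE name = :name');
    $stmt->execute([':name' => $name]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

addCustomer($pdo, "O'Reilly");
addCustomer($pdo, "x'; DROP TABLE customers; --"); // stored as plain text, not executed
```

The apostrophe in O'Reilly needs no special handling, and the second value is just an odd-looking name.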
If you are using user input to request files, make sure that it's not susceptible to Poison Null Byte attacks and in my opinion, always strip all slashes on file includes, to ensure they can't access the location desired. I would also recommend turning off allow_url_include / allow_url_fopen in your php.ini file.
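For file requests, a stricter alternative to stripping slashes is to accept only names from a fixed white-list, which is the technique sketched here (a swapped-in approach, not the slash stripping described above). resolvePage() and the directory are hypothetical.

```php
<?php
/**
 * Hypothetical resolver: map a requested page name to a file path,
 * or return null if the name is not on the white-list.
 */
function resolvePage(string $requested, string $baseDir, array $allowedPages): ?string
{
    // Strict comparison against known names: anything containing slashes,
    // dots or a null byte simply fails to match.
    if (!in_array($requested, $allowedPages, true)) {
        return null;
    }
    return rtrim($baseDir, '/') . '/' . $requested . '.php';
}
```

Only a non-null result would ever be passed to include; the user-supplied string is never used to build a path unless it matched exactly.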
I hope this helps!