If an XML document is not validated as "Well Formed" or checked against a schema, what are the risks?

Question

When processing an XML document in my application, what are the risks? E.g. if it is not "Well Formed" or is not checked against a schema.

score 8 · Accepted Answer · answered Nov 17 '10 at 15:37


Even a "Well Formed" XML document does not protect you from attack: an injection does not always break the XML document. The following measures should help prevent XML injection attacks:

validate the document against an XML Schema Definition (XSD);
validate/sanitize input;
check and enforce encoding;

For completeness, I want to add several useful links:


Hi @Ams, correct but not yet complete: also need to forbid DTD (depending on platform this might be enabled by default), and deactivate processing of external references. – AviD Nov 17 '10 at 20:21
Many thanks guys, but I didn't totally understand why I need to forbid DTD? – Phoenician-Eagle Nov 17 '10 at 20:54
@Paul, this link can help answer: http://msdn.microsoft.com/en-us/library/cc838174(v=VS.95).aspx – Nov 17 '10 at 21:22

score 6 · Answer 2 · answered Jun 01 '11 at 01:29

Secure XML processing using DTDs and XSDs is tricky.

You should ensure that the correct DTDs and XSDs are referenced for your use case before processing the XML file with a parser (and that mixed XML content has not been added, such as alternate xmlns declarations, local DTD definitions in the XML, entity expansions, etc.).

As I heard on an OWASP podcast, and this is particularly relevant here: whitelist your accepted data (XML content); never rely on blacklisting the known problems with that content.
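To make the whitelist idea concrete, here is a minimal Python sketch using only the standard library. The `ALLOWED` table, the element names in it, and the function name `check_whitelist` are all hypothetical, chosen just for illustration. Anything not explicitly listed is reported, including namespaced elements, since ElementTree exposes those as `{namespace}name`.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist: element name -> set of attribute names allowed on it.
ALLOWED = {
    "order": {"id"},
    "item": {"sku", "qty"},
}

def check_whitelist(xml_text):
    """Return a list of problems; an empty list means only whitelisted
    elements and attributes were found."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well formed
    problems = []
    for elem in root.iter():
        if elem.tag not in ALLOWED:
            problems.append("unexpected element: " + elem.tag)
            continue
        for attr in elem.attrib:
            if attr not in ALLOWED[elem.tag]:
                problems.append("unexpected attribute: %s@%s" % (elem.tag, attr))
    return problems
```

This only checks names, not values or structure, so it complements schema validation rather than replacing it.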

Turning off external references is great (think about someone reading your /etc/passwd or /etc/shadow file by referencing it with the file:// protocol in place of the DTD).
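One blunt way to rule this out is to refuse any document that carries a DOCTYPE at all, which removes both external references and entity expansion tricks. Below is a rough Python sketch using the standard library's expat bindings. The names `DTDForbidden` and `parse_without_dtd` are invented for this example, and on other platforms the equivalent is usually a parser setting rather than a handler.

```python
import xml.parsers.expat

class DTDForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""

def parse_without_dtd(data):
    """Parse XML bytes and return the element names in document order.

    Raises DTDForbidden as soon as a DOCTYPE is seen, and
    xml.parsers.expat.ExpatError if the document is not well formed.
    """
    names = []
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Abort before any entity declared in the DTD can be used.
        raise DTDForbidden("DOCTYPE declarations are not accepted")

    def on_start(name, attrs):
        names.append(name)

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(data, True)  # True marks this as the final chunk
    return names
```

Rejecting every DOCTYPE is stricter than many applications need. If you must accept DTDs, the resolver and catalog approach is the more selective option.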

You can use a resolver and a catalog file to control or replace external references with known good local copies that cannot be subverted: http://xml.apache.org/commons/components/resolver/resolver-article.html

You can use an extrinsic validation program or library such as the Multi-Schema Validator by Sun/Oracle (http://msv.java.net/). This can provide validation even when the document itself references nothing to validate against, and it can use a different or complementary technology such as RELAX NG to validate your XML.

Be careful about all sorts of injection (SQL, JavaScript, xmlns, image, SVG, URL, XSLT, XPath, etc.) because they can all potentially be injected and transmitted to a context within which they activate and become a danger to your DB server, app server or client environments. Consider a base64-encoded image carrying an IE exploit that is transmitted into a web page inside your infrastructure (game over).

Denial of service against your XML processing infrastructure can also be a worry, but may not be relevant to your system.

Note: @anonymous has provided some great URLs for relevant resources.

An extra noteworthy problem can be xml includes of various types. — Andrew Russell, Jun 01 '11 at 03:15

score 5 · Answer 3 · answered Nov 18 '10 at 21:43

The primary risk in not syntax checking your XML is invalid parsing.

If the software reading the XML can't handle invalid input, it might crash, do something unexpected, spontaneously explode (probably not), etc. Those situations can lead to security flaws - but if the software is brittle enough not to be able to handle invalid XML, it will very likely have other security flaws, potentially even in the "valid" data.

As an analogy, most web app security holes (e.g. SQL injection) aren't exploited with invalid HTML, but with syntactically valid input that causes problems when it is processed. In your case, the XML is the input. Schema checking is rarely sufficient to validate the input, especially if the XSD/DTD/whatever was auto-generated. Whatever processes the input in the application itself needs to check it too.
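A small Python sketch of that point, assuming a hypothetical `users` table and a made-up `<user><name>` document shape: the payload is perfectly well-formed XML, so the defence has to live in the code that consumes the value (here a length check plus a parameterized query), not in the parser.

```python
import sqlite3
import xml.etree.ElementTree as ET

def make_demo_db():
    """Build an in-memory database with one hypothetical user row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
    return conn

def find_user(conn, xml_text):
    """Look up the user named in <user><name>...</name></user>."""
    name = ET.fromstring(xml_text).findtext("name", default="")
    # Application-level check: the parser has no opinion on these limits.
    if not name or len(name) > 64:
        raise ValueError("name missing or too long")
    # The value is bound as a parameter, never concatenated into the SQL text.
    rows = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return rows.fetchall()
```

With `conn = make_demo_db()`, a document naming `alice` finds her row, while a name such as `x' OR '1'='1` is treated as a literal string and matches nothing.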
