A web application I'm developing will be a Single Page Application (SPA) that will interact with a REST API backend, through jQuery.ajax() calls.

The SPA and API will both be served over an https/TLS connection. The API will be served from a subdomain of the SPA domain:

SPA: example.org
API: api.example.org

... and will respond with the appropriate CORS headers:

Access-Control-Allow-Origin: https://example.org
Access-Control-Allow-Methods: GET, POST, etc. // whatever applicable to the requested resource
Access-Control-Allow-Headers: Accept, Authorization, Content-Type

Upon logging into the SPA, the user (an organisation) will be served its unique SHA-1 API-key (either in a cookie or as a global JavaScript variable), which the SPA will use to interact with the API for the duration of the user's login session. The SPA will send this API-key with each request to the API in an Authorization header:

Authorization: MyAppsApi apikey=<API-key>
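
As a rough sketch of how the API side might parse that header (written in Python for brevity rather than PHP; the name `parse_api_key` is hypothetical, and I'm assuming the key is always a 40-character lowercase hex SHA-1 digest), a strict allow-list pattern would reject anything else before it gets near a query:

```python
import re
from typing import Optional

# Assumption: the key is a lowercase hex SHA-1 digest (40 characters).
AUTH_PATTERN = re.compile(r"MyAppsApi apikey=([0-9a-f]{40})")


def parse_api_key(header_value: str) -> Optional[str]:
    """Return the API key from an Authorization header value, or None."""
    # fullmatch() means no leading or trailing characters are tolerated.
    match = AUTH_PATTERN.fullmatch(header_value)
    return match.group(1) if match else None
```

Anything that is not exactly the scheme name followed by a well-formed digest yields `None`, so quotes, brackets and other XPath metacharacters never reach the lookup.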

The REST API's persistence storage will be XML based. I haven't decided on an actual storage mechanism vendor yet (considering using eXistdb, at the moment). In this early stage of development, however, I'm simply using PHP's DOMDocument and DOMXPath, with no concurrent read/write capabilities.

The REST API will furthermore dynamically generate XPath queries, based on the received request-URI's path. Communication between the SPA and API will probably be done in JSON, though.

Consider this example XML document:

<?xml version="1.0" encoding="utf-8"?>
<organisations>
  <organisation id="1">
    <apiKey>some hex sha1 digest</apiKey>
    <products>
      <product id="1">
        <parts>
          <part id="1">
            <subParts>
              <subPart id="1">
                ...
              </subPart>
            </subParts>
          </part>
        </parts>
      </product>
    </products>
  </organisation>
  <organisation id="2">
    <apiKey>another hex sha1 digest</apiKey>
    <products>
      ...
    </products>
  </organisation>
</organisations>

Currently, this document is validated against a custom XSD schema.

The REST API will first determine if an <organisation> node with the issued <apiKey> exists before interacting further with the XML. If the <organisation> node is found, it will be used as the context node for any further XPath queries.
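
To make that lookup concrete, here is a minimal sketch (Python's `xml.etree.ElementTree` standing in for PHP's DOMDocument/DOMXPath; the demo keys, the product ids and the name `find_organisation` are all made up for illustration). It compares the key in code rather than pasting it into an XPath string, then runs a query relative to the matched node:

```python
import hashlib
import hmac
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Made-up demo keys; real keys would be issued at login.
KEY_ORG_1 = hashlib.sha1(b"demo-organisation-1").hexdigest()
KEY_ORG_2 = hashlib.sha1(b"demo-organisation-2").hexdigest()

ROOT = ET.fromstring(f"""
<organisations>
  <organisation id="1">
    <apiKey>{KEY_ORG_1}</apiKey>
    <products><product id="1"><parts><part id="1"/></parts></product></products>
  </organisation>
  <organisation id="2">
    <apiKey>{KEY_ORG_2}</apiKey>
    <products><product id="7"/></products>
  </organisation>
</organisations>
""")

SHA1_HEX = re.compile(r"[0-9a-f]{40}")


def find_organisation(root: ET.Element, api_key: str) -> Optional[ET.Element]:
    """Return the <organisation> whose <apiKey> equals api_key, else None."""
    if SHA1_HEX.fullmatch(api_key) is None:
        return None  # a malformed key never reaches the lookup
    for org in root.findall("organisation"):
        stored = (org.findtext("apiKey") or "").strip()
        # Constant-time comparison; no XPath string is built from the key.
        if hmac.compare_digest(stored.encode("utf-8"), api_key.encode("ascii")):
            return org
    return None


org = find_organisation(ROOT, KEY_ORG_1)
own = org.find('.//products/*[@id="1"]')      # product of organisation 1
foreign = org.find('.//products/*[@id="7"]')  # product of organisation 2
```

Here `own` is the `<product id="1">` element and `foreign` is `None`, because a query built only from child and descendant steps cannot leave the context node. That guarantee would no longer hold if `..` or an `ancestor::` axis could be smuggled into the path, which is what the path restriction below is meant to prevent.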

The request-URI paths will be restricted by the following regex pattern:

~\G(/(?<collection>[a-z]+)(?:/(?<resourceId>\d+))?)(?:(?=(?1))|/?$)~

allowing only /<loweralpha>+(/<digit>+)? segments

Consider these example request-URI paths and their dynamically generated XPath:

/products/1          => .//products/*[@id="1"]
/parts/1             => .//parts/*[@id="1"]
/products/1/parts/1  => .//products/*[@id="1"]/parts/*[@id="1"]

As you can see, they will be relative to the <organisation> context node.
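
A minimal sketch of that mapping, in Python rather than PHP and under some stated assumptions: Python's `re` module has neither `\G` nor `(?1)`, so the two patterns below only approximate the PCRE pattern above; the name `path_to_xpath` is hypothetical; and mapping a collection without an id to `/*` is my own guess, since the examples above only cover segments with an id.

```python
import re

# Approximation of the PCRE pattern: one segment is /<loweralpha>+(/<digit>+)?
SEGMENT = re.compile(r"/(?P<collection>[a-z]+)(?:/(?P<resource_id>[0-9]+))?")
# The whole path: one or more segments, optional trailing slash, nothing else.
FULL_PATH = re.compile(r"(?:/[a-z]+(?:/[0-9]+)?)+/?")


def path_to_xpath(path: str) -> str:
    """Translate a validated request path into a context-relative XPath."""
    if FULL_PATH.fullmatch(path) is None:
        raise ValueError(f"rejected request path: {path!r}")
    steps = []
    for match in SEGMENT.finditer(path):
        step = "/" + match["collection"]
        if match["resource_id"] is not None:
            step += '/*[@id="' + match["resource_id"] + '"]'
        else:
            step += "/*"  # assumption: no id means every item in the collection
        steps.append(step)
    # The leading "./" plus the first step's "/" yields the ".//" prefix.
    return "./" + "".join(steps)


print(path_to_xpath("/products/1/parts/1"))
```

Because the allow-list admits only lowercase letters, digits and slashes, inputs such as `/products/1/../apiKey` or `/products/ancestor::organisation` raise `ValueError` instead of producing a query.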

Considering that I haven't fully investigated the typical workings of XML backends yet, it may very well turn out that my above XML setup is flawed to begin with, in that I should create an XML document per organisation, mitigating the risk of accessing nodes that do not belong to the <organisation> context node.

However, do you see any inherent flaws in this current set-up?

In my current set-up I am mostly concerned about the dynamic XPath querying that could turn out to be too risky. Perhaps an adversary is able to sneak in XPath axes, somehow? But I'm interested to hear about any other possible flaws as well.

Thank you.

P.S.: perhaps I should have been clearer about the risks I am most concerned with:

  1. Can an adversary somehow obtain the API-key of an(other) organisation?
  2. Can an adversary somehow manipulate the content of an(other) organisation?
  • To whoever close-flagged this as "too broad", a long question does not make a question too broad. There are some clear and answerable questions here: what vulnerabilities are potentially present in this use of XPath and XML. The one close reason I could understand is the one relating to breaking a specific question, but I think that's arguable either way. I consider this a well-enough written and fleshed out question to remain here. – Polynomial May 02 '16 at 18:53

1 Answer

Your primary problem when handling client-provided XML is going to be XML External Entity (XXE) attacks. Systems with such vulnerabilities can often be exploited to read files or enumerate the internal network which the server is on. In PHP you can help fix this by calling libxml_disable_entity_loader(true); in order to disable external entities. (That function is deprecated as of PHP 8.0, where libxml 2.9.0 or later is required and external entity loading is disabled by default.)

Another problem is, potentially, the Billion Laughs attack. This is a CPU and memory exhaustion DoS attack which uses nested entity declarations. This SO question should give you a good idea on how to pre-validate XML before loading it to avoid this kind of attack, but the short answer is that libxml allows you to set a custom DTD validator / loader callback.
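
Both of the attacks above rely on entities declared in a DOCTYPE, so one blunt pre-validation approach is to refuse any incoming document that carries a DOCTYPE at all. The following is only a sketch of that idea, using Python's `xml.parsers.expat` rather than PHP's libxml; the function name `contains_doctype` is hypothetical, and it assumes your legitimate documents never need a DTD.

```python
import xml.parsers.expat


class _DoctypeFound(Exception):
    """Raised from the expat callback to stop parsing early."""


def contains_doctype(data: bytes) -> bool:
    """Return True if the XML document declares a DOCTYPE.

    Malformed XML raises xml.parsers.expat.ExpatError, which callers
    should treat as a rejection as well.
    """
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Abort as soon as the DOCTYPE is announced.
        raise _DoctypeFound()

    parser.StartDoctypeDeclHandler = on_doctype
    try:
        parser.Parse(data, True)
    except _DoctypeFound:
        return True
    return False


xxe = (b'<?xml version="1.0"?>'
       b'<!DOCTYPE r [<!ENTITY x SYSTEM "file:///etc/passwd">]>'
       b'<r>&x;</r>')
print(contains_doctype(xxe))
print(contains_doctype(b"<organisations/>"))
```

A request body for which this returns `True` would simply be rejected before it is handed to the real parser.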

You may also want to consider XPath Injection, though I'm not sure how critical this would be in your use-case. It's hard to tell what kind of impact it might have on your system's business logic without having a wider understanding of the application.
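
To illustrate what XPath injection looks like in general, here is a small sketch in Python (only string building, no XML library involved). It assumes, hypothetically, that the API-key from the Authorization header were concatenated straight into a query; the function names are made up and I don't know whether your code does anything like this.

```python
import re


def naive_org_query(api_key: str) -> str:
    # Vulnerable: the client-supplied value is pasted into the XPath as-is.
    return "//organisation[apiKey='" + api_key + "']"


HEX_SHA1 = re.compile(r"[0-9a-f]{40}")


def checked_org_query(api_key: str) -> str:
    # Allow-list check first: only a 40-character hex digest gets through.
    if HEX_SHA1.fullmatch(api_key) is None:
        raise ValueError("API key is not a 40-character hex digest")
    return "//organisation[apiKey='" + api_key + "']"


payload = "' or '1'='1"
print(naive_org_query(payload))
# prints: //organisation[apiKey='' or '1'='1']
```

The predicate in the printed query is true for every `<organisation>`, so the naive version would match all of them, whereas `checked_org_query(payload)` raises `ValueError`.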

Polynomial
  • Great stuff! These are exactly the type of pointers I was hoping to receive in response to this question. Thanks! By the way, maybe I should have mentioned that I'm not entirely sure yet whether the SPA will receive/send XML as well. It could very well be that I will translate the XML to JSON and vice versa. But the persistence storage will remain XML. Another thing that is relevant to mention is that I'm currently validating the XML document against a custom XSD schema. I'll add this to my question. I'll investigate your suggestions further to see how they apply to my set-up. Thanks again! – Decent Dabbler May 02 '16 at 19:40
  • @DecentDabbler With regard to the two additional questions, it's not something anyone here can answer. It's something that would be investigated interactively during a penetration test against the application (in fact, I quite regularly see questions such as this raised in statements of work in my day job). Regarding the JSON part, it's a much safer format in terms of its use for data storage and transfer - XML is an immensely complex markup language and its use for data storage has been the cause of many a vulnerability. – Polynomial May 02 '16 at 19:42
  • @DecentDabbler That said, look into [object injection](https://www.owasp.org/index.php/PHP_Object_Injection) attacks and the [security issues around unserialising untrusted data](https://www.owasp.org/index.php/Deserialization_of_untrusted_data). Both of these are of critical importance if you're deserialising this data into objects, from either XML or JSON, without appropriate checks. Again, avoid this kind of behaviour where possible, due to its complexity and tendency for security bugs; instead, manually pull out fields from the input data and fill the objects. – Polynomial May 02 '16 at 19:45
  • About my additional questions: fair enough. However, I should mention that, as far as I'm concerned, they can just be seen in the context of the information that I have currently provided, leaving out of consideration other possible security risks such as system administration flaws, network configuration flaws, etc. Concerning your last remark about XML storage, other than the information you have already provided, do you know of any good sources (articles, etc.) that discuss why XML storage is so vulnerable? – Decent Dabbler May 02 '16 at 19:52
  • Thanks. I was aware of object injection and unserialising indeed. – Decent Dabbler May 02 '16 at 20:00
  • @DecentDabbler Just a quick Google search for "XML vulnerabilities" should give you some examples, but the point is that there's so much to XML above and beyond its ability to store data in a nested structure. The more features and complexity you add, the more potentially vulnerable code there is. The fact that Wikipedia has [an entire category for XML](https://en.wikipedia.org/wiki/Category:XML) should give you an idea of how bloated the format and standard have become. – Polynomial May 02 '16 at 20:00
  • Will do! The critique about it being considered bloated, I was already aware of. And even though I kind of agree, for my intended application it appears to be more fitting than, for instance, a relational database. Anyway, I'll have to investigate eXistDB, or other XML backends as well, to see what they have to say/to offer in terms of mitigating security risks. In any case, thanks again for your valuable insights! – Decent Dabbler May 02 '16 at 20:10