There's a web application on a server which I have full access to which accepts POST requests on a REST endpoint. The request payload is expected to be an XML document. For request routing and load balancing the server makes use of HAProxy, which also takes care of the TLS settings and has the certificate chain and private key in its setup.
Recently some posts have been failing. The response has HTTP status code 200 but its payload is an HTML document rather than the expected XML that the service ought to return. It takes this form:
<html>
<head>
<title>Request Rejected</title>
</head>
<body>The requested URL was rejected. Please consult with your
administrator.<br><br>Your support ID is: [redacted]<br><br><a href='javascript:history.back();'>[Go
Back]</a></body>
</html>
What is curious is that this only occurs if the XML has certain character entity references in its contents. I've tested and established these conditions:
- The call fails with the above response if entity reference

(carriage return) or

(newline) is present in the XML payload, regardless of which or in what combination. - It does not fail if we replace these with decimal instead of hexadecimal references (
and
). - I see no logging for the call on either the web service or even in the HAproxy logs, while the logging is set up so that if the POST made it to the server we'd expect to see at least something in HAproxy.
- If the POST is replicated with curl on the server itself, bypassing HAproxy and external networks, it succeeds.
- Trying the call with Postman and the "problematic" XML, I can see that the certificate for the TLS handshake is the one set up in HAproxy, so no replacement from a MITM node.
The first thing I'm wondering, of course, is why on earth a POST would get blocked because of some perfectly fine entity references in it. It's not an XML bomb, the XML is well-formed and I don't see an issue with the HTTPS POST itself. Looking online for that particular HTML response it seems like a firewall is the most likely culprit.
But that raises a second question. How could some node along the network block the POST based on its contents if it's performed over HTTPS? The payload ought to be encrypted. And the TLS connection is established between the client and HAproxy on the server.
EDIT: We contacted the hosting provider who confirmed there was an issue with RPaaS. They've made a change and the requests get through now. Which still leave the question of how a connection over HTTPS could have requests blocked based on content.