HTTPS from web server in China is blocked by RST TCP packets (Great Firewall?)

Question

I'm hoping someone can provide some insight into a weird issue we're having as of 3 February 2019.

TL;DR

HTTPS sites on an IIS server in China are returning TCP RST packets after the initial TLS handshake.
The sites are showing "connection reset" errors to clients outside of China.
The same sites are accessible from within China over HTTPS.
Proxying the connection with CloudFlare, for DNS and to terminate SSL, reverses this issue (accessible only from outside China, connection reset from within).
A .CN domain on the same server can serve HTTPS outside of China, using the wildcard cert for the .COM domain (after accepting the invalid cert).

Background

We have a Windows web server (2008 R2, IIS 7.5) in AliYun Cloud (like a Chinese AWS, so this box is a VM like an EC2 instance). The IIS server hosts several sites on subdomains for which we have a wildcard certificate (e.g. https://app.example.com/ and https://api.example.com/, with a wildcard for *.example.com). We recently updated that certificate by installing a fully chained PFX file, as we normally do. Testing the sites immediately afterward, everything was normal, and the HTTPS sites were served with the new certificate.

Shortly after this (like, a day later), HTTPS stopped working as expected. Clients connecting from outside of China would receive an error after the TLS handshake, indicating that the server had reset the connection. The same sites would load perfectly normally from the server itself, or any other location within China from which we tested. Any location outside of China from which we tested received the same connection-reset errors.

Troubleshooting and Testing

Rolling back the wildcard certificate to the previous one (even though it was soon to expire) did not affect the issue at all. Additionally, the renewed certificate was recognized as valid by the clients in China, our TLS version and ciphers showed up as OK in the browsers, etc. IIS and SChannel on the server reported no issues -- in fact, the failed connections did not even show up in the IIS logs.

We double-checked the bindings in IIS (all correct and using the updated cert), Windows firewall settings (not enabled), and certificate properties (fully chained and including friendly name, correct SAN, etc.). We combed through our TLS settings for version and ciphers, e.g. with IISCrypto and registry edits, and all were up to date, as far as .NET 4.0 can support.

None of those settings affected the symptom of being able to connect to HTTPS sites on the server from within, but not outside of, China.

Research

I ran most additional tests from a computer in New York.

With telnet, we are able to connect to the server on port 443, ruling out a straightforward network firewall rule based on TCP port.
With traceroute, we see timeouts once the request gets into China, but nothing crazy -- and as usual, plain text HTTP works normally from anywhere.
With nslookup, the resolution and name servers are right where they should be.
With tcpdump -vv -i any host x.x.x.x and port 443, we got a reasonably interesting packet capture: it shows the RST packets showing up after the TLS handshake / Client Hello, in lieu of cipher negotiation or any payload.
[Screenshot: Wireshark view of the pcap file obtained through tcpdump, with the server IP removed]
(edited to add:) Packet capture on the server shows similar patterns: RST packets received -- ostensibly from the client -- immediately after the TLS handshake and Client Hello.

When I enabled CloudFlare on the domain to proxy DNS and terminate SSL, the symptoms were reversed. In this setup (CloudFlare's "Flexible" SSL), the origin server in China serves CloudFlare via plain HTTP on port 80, and CloudFlare's shared SSL is used toward the clients. Only clients external to China would see the HTTPS sites, while clients within China, including the local server, would see a connection reset at the HTTPS URL.

We have a .CN domain pointed to the same IIS server as the example.com subdomains. When visiting via that domain -- e.g. https://example.cn/ -- the connection loads as expected (you must accept that the SSL cert in use is the wildcard for *.example.com, and then you can load the site with a warning). The RST packets also don't appear in packet captures. For the record, the .CN domain gives nearly identical results in nslookup, traceroute, etc.
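To make the .COM versus .CN comparison repeatable from a single client, the same server IP can be probed with different SNI values in the Client Hello. The following is only a minimal Python sketch of that idea, not something we ran at the time. The helper names `classify` and `probe_tls` are hypothetical, and the caller supplies the real IP and hostnames. Certificate verification is deliberately switched off, since only the fate of the handshake matters here. Note that a recent Python will refuse protocol versions below TLS 1.2 by default, which would show up as "tls-error" rather than "reset".

```python
import socket
import ssl


def classify(exc):
    """Map a connection error to a short label."""
    if isinstance(exc, ConnectionResetError):
        return "reset"
    if isinstance(exc, ConnectionRefusedError):
        return "refused"
    if isinstance(exc, socket.timeout):
        return "timeout"
    if isinstance(exc, ssl.SSLError):
        return "tls-error"
    return "other"


def probe_tls(ip, sni, port=443, timeout=10.0):
    """Attempt a TLS handshake with `ip`, sending `sni` in the Client Hello."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # has to be disabled before verify_mode
    ctx.verify_mode = ssl.CERT_NONE  # only the handshake outcome is of interest
    try:
        with socket.create_connection((ip, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=sni) as tls:
                return "ok " + str(tls.version())
    except OSError as exc:
        return classify(exc)
```

Calling `probe_tls(server_ip, "app.example.com")` and `probe_tls(server_ip, "example.cn")` from the same machine, against the same IP, should show whether the reset follows the hostname rather than the server. A "reset" for one name and an "ok" for the other would be consistent with filtering keyed on the SNI field, though it would not prove it.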

Concluding Questions

To me, it looks like the so-called "Great Firewall" is at work, i.e. forging and injecting RST packets into this connection. The RST packets don't follow the exact same patterns described in Weaver et al. or Clayton et al., but they are pretty close in each case. Does this make sense? If so, is there any other test we could do to conclusively show that this is the case? (edited for question clarity)

I don't have access to the cloud "dashboard" for hosting for this machine, but a colleague is checking on that, in case there's some network-level issue we could address that way. Anything we should check for, in particular, there?

We do have an ICP number that can be applied if needed, for our .CN domain. (edited for clarity)

Obviously we'd like to be able to serve our sites at their .COM domains, to visitors both within and outside China, over HTTPS, from our existing server, using our wildcard cert, as we did before. What should we do?

"we'd like to be able to serve our sites at their .COM domains, to visitors both within and outside China, over HTTPS, from our existing server, using our wildcard cert, as we did before". Better make no more such assumption and only serve your non-China clients from a non-China location. — Lex Li, Feb 06 '19 at 20:50
Hi @LexLi. To clarify, are you suggesting that this issue is not fixable? — lewis levenberg, Feb 06 '19 at 22:27
That shield is a moving target with its own evolution and no documentation, so it is pointless to discuss "fixable" or not. — Lex Li, Feb 06 '19 at 22:30
Even if that's the case, would you agree that there is some value in determining whether the issue is indeed a "Great Firewall" problem as opposed to some sort of server or network configuration on our end? — lewis levenberg, Feb 07 '19 at 14:33

parkamark · Accepted Answer · 2019-02-08T09:41:42.617

Is it the case that an RST is sent back when any TCP packet with a payload is sent? Or does it specifically have to be a TLS Client Hello? I.e., if you type something after a telnet connection is established, does the connection then close/reset?
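If typing into telnet is awkward to repeat, the same check can be scripted. This is a rough Python sketch under my own assumptions: `send_and_observe` and `probe_payload` are made-up helper names, and the payload is whatever bytes you choose to send. It opens a plain TCP connection with no TLS, sends the bytes, and reports whether a reply, a clean close, a reset, or a timeout follows.

```python
import socket


def send_and_observe(sock, payload, bufsize=4096):
    """Send `payload` on a connected socket and report what comes back."""
    try:
        sock.sendall(payload)
        data = sock.recv(bufsize)
    except ConnectionResetError:
        return "reset"
    except socket.timeout:
        return "timeout"
    # An empty read means the peer closed the connection cleanly (FIN).
    return "reply" if data else "closed"


def probe_payload(ip, payload, port=443, timeout=10.0):
    """Open a plain TCP connection (no TLS) and send arbitrary bytes.

    Errors raised while connecting are left to propagate to the caller.
    """
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        return send_and_observe(sock, payload)
```

Bear in mind that a web server may itself drop or reset a connection that sends non-TLS bytes to port 443, so a "reset" here is only meaningful when the same payload behaves differently from inside and outside China.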

Many years ago, I worked for a company that was developing technology employing similar mechanisms. In a private capacity, whilst I was still an employee there, I came up with a workaround that filtered/delayed such packets when they were sent to the client, effectively disrupting such TCP douchebaggery. However, I observed that they had a little trick up their sleeve which made my workaround completely redundant. Via custom configuration under their control, they could reset the TCP connection on both sides by injecting packets towards both the requesting client and the destination server. So even with the filtering I came up with, the remote server was also messed with and the stream disrupted. If this is what is happening here, I don't hold much hope for the existence of any workaround for it.

I realise this isn't a useful answer, but what I wanted to type above didn't fit as a comment. Just wanted to let you know that it clearly looks like there is a malicious packet injector in place on route in and out of where your server is hosted.
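One heuristic sometimes used to spot injected resets is to compare the IP TTL of each RST with the TTL of the SYN-ACK that arrived on the same connection: a packet inserted by a device partway along the route often arrives with a noticeably different TTL. The Python sketch below assumes you have already pulled the TCP flags and TTL of each received packet out of your capture by hand. The `(flags, ttl)` tuple format and the function name `suspicious_resets` are my own invention for illustration.

```python
def suspicious_resets(packets, tolerance=1):
    """Return RST packets whose TTL differs from the peer's SYN-ACK TTL.

    `packets` is a list of (flags, ttl) tuples for packets received from
    the remote side, in capture order, e.g. ("SA", 47) or ("R", 112).
    """
    # Use the first SYN-ACK as the reference TTL for the genuine peer.
    baseline = next((ttl for flags, ttl in packets if flags == "SA"), None)
    if baseline is None:
        return []  # no SYN-ACK seen, so there is nothing to compare against
    return [
        (flags, ttl)
        for flags, ttl in packets
        if "R" in flags and abs(ttl - baseline) > tolerance
    ]
```

A mismatch is only suggestive, and a matching TTL does not prove the reset is genuine, since an injector can imitate the expected value. It is one more data point to set alongside the packet captures from both ends.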

You have mentioned that it works for a .CN website but not a .COM one, hosted on the same server (correct?). Have you tested what happens if you completely stop the webserver so nothing is listening on port 443, and then telnet into it on port 443 from outside China? If you get what looks like an open port, then there is an inline device acting as a man-in-the-middle which clearly demonstrates the existence of some kind of filtering/firewall appliance aka the "Great Firewall".
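The "is the port still open with the web server stopped" test can also be scripted instead of using telnet. A minimal Python sketch, where `port_state` is just a name I picked:

```python
import socket


def port_state(ip, port=443, timeout=5.0):
    """Report whether a TCP handshake to ip:port completes."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"
```

With nothing listening on 443, you would normally expect "refused" (or "timeout" if a firewall silently drops the SYN). Getting "open" from outside China while the web server is stopped would point to something in the path answering on the server's behalf.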

The connection doesn't get kicked off from `telnet` -- in other words, the RST responses -- client-side from the server's perspective; server-side from the client's perspective, as you had experienced -- appear right after the Client Hello, maybe specifically in response to anything following them. (Edited to add) -- I'm not sure it's feasible to stop the webserver altogether, but will check on that too. Thanks for the input! — lewis levenberg, Feb 08 '19 at 14:03
Since this has the guidance for teasing out further whether we had encountered a "Great Firewall" issue or something internal (it was the former, as far as we could tell, and it's now apparently resolved through no fault of our own), I think this should be the accepted answer, so I'm marking it as such. — lewis levenberg, Feb 12 '19 at 23:33