I'm hoping someone can provide some insight into a weird issue we've been having since 3 February 2019.
TL;DR
- HTTPS sites on an IIS server in China are returning TCP RST packets right after the TLS Client Hello.
- The sites are showing "connection reset" errors to clients outside of China.
  - The same sites are accessible from within China over HTTPS.
- Proxying the connection through CloudFlare (for DNS and to terminate SSL) reverses the issue: the sites become accessible only from outside China, and connections from within China are reset.
- A .CN domain on the same server can serve HTTPS outside of China, using the wildcard cert for the .COM domain (after the client accepts the certificate name mismatch).
Background
We have a Windows web server (2008R2, IIS7.5) in AliYun Cloud (Alibaba Cloud, roughly a Chinese AWS; this box is a VM similar to an EC2 instance). The IIS server hosts several sites on subdomains covered by our wildcard certificate for *.example.com (e.g. https://app.example.com/ and https://api.example.com). We recently renewed that certificate by installing a fully chained PFX file, as we normally do. When we tested the sites immediately afterward, everything was normal, and the HTTPS sites were served with the new certificate.
Shortly after this (about a day later), HTTPS stopped working as expected. Clients connecting from outside of China received an error during the TLS handshake, indicating that the server had reset the connection. The same sites loaded normally from the server itself and from every other location within China that we tested, while every location outside of China that we tested received the same connection-reset errors.
Troubleshooting and Testing
Rolling back the wildcard certificate to the previous one (even though it was soon to expire) did not affect the issue at all. Additionally, the renewed certificate was recognized as valid by the clients in China, our TLS version and ciphers showed up as OK in the browsers, etc. IIS and SChannel on the server reported no issues -- in fact, the failed connections did not even show up in the IIS logs.
We double-checked the bindings in IIS (all correct and using the updated cert), Windows firewall settings (not enabled), and certificate properties (fully chained, including friendly name, correct SAN, etc.). We combed through our TLS settings for versions and ciphers, e.g. with IISCrypto and registry edits, and all were up to date, as far as .NET 4.0 can support.
None of those settings affected the symptom of being able to connect to HTTPS sites on the server from within, but not outside of, China.
Research
I ran most additional tests from a computer in New York.
- With `telnet`, we are able to connect to the server on port 443, ruling out a straightforward network firewall rule based on TCP port.
- With `traceroute`, we see timeouts once the request gets into China, but nothing out of the ordinary; and as usual, plain-text HTTP works normally from anywhere.
- With `nslookup`, the resolution and name servers are right where they should be.
- `tcpdump -vv -i any host x.x.x.x and port 443` gave a reasonably interesting packet capture: it shows RST packets arriving right after the TCP handshake and the TLS Client Hello, in place of the Server Hello (cipher negotiation) or any payload.
  [Screenshot: Wireshark view of the pcap file obtained through tcpdump, with the server IP removed.]
- (Edited to add:) A packet capture on the server shows a similar pattern: RST packets received, ostensibly from the client, immediately after the handshake and Client Hello.
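The capture can also be inspected without Wireshark, which makes it easier to compare the RSTs against the other packets from the same host. Below is a minimal pure-Python sketch, assuming the capture was written in classic pcap format (e.g. with tcpdump's `-w` option) rather than pcapng. It handles Ethernet plus the Linux "cooked" link types that `tcpdump -i any` produces (SLL, or SLL2 on newer libpcap), and yields each IPv4/TCP packet's TTL, IP ID and flags. The names `read_tcp_packets` and `TcpPacket` are hypothetical.

```python
import socket
import struct
from typing import BinaryIO, Iterator, NamedTuple

# link type -> (link-layer header length, offset of the 2-byte EtherType/protocol field)
LINK_LAYERS = {
    1: (14, 12),    # Ethernet
    113: (16, 14),  # Linux cooked capture (SLL), used by `tcpdump -i any`
    276: (20, 0),   # Linux cooked capture v2 (SLL2), used by newer libpcap for `-i any`
}
TCP_FLAGS = [(0x01, "F"), (0x02, "S"), (0x04, "R"), (0x08, "P"), (0x10, "A")]


class TcpPacket(NamedTuple):
    ts: float
    src: str
    dst: str
    sport: int
    dport: int
    ttl: int
    ip_id: int
    flags: str  # e.g. "S", "SA", "RA"


def read_tcp_packets(f: BinaryIO) -> Iterator[TcpPacket]:
    """Yield IPv4/TCP packets from a classic (not pcapng) pcap stream."""
    header = f.read(24)
    if header[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written little-endian
    elif header[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    linktype = struct.unpack(endian + "I", header[20:24])[0]
    if linktype not in LINK_LAYERS:
        raise ValueError(f"unsupported link type {linktype}")
    l2_len, proto_off = LINK_LAYERS[linktype]

    while True:
        rec = f.read(16)
        if len(rec) < 16:
            return  # end of file
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
        data = f.read(incl_len)
        if len(data) < l2_len + 20:
            continue  # frame too short to hold an IPv4 header
        if struct.unpack("!H", data[proto_off:proto_off + 2])[0] != 0x0800:
            continue  # not IPv4
        ip = data[l2_len:]
        ihl = (ip[0] & 0x0F) * 4
        tcp = ip[ihl:]
        if ip[9] != 6 or len(tcp) < 14:
            continue  # not TCP, or TCP header cut off by the snaplen
        sport, dport = struct.unpack("!HH", tcp[:4])
        flags = "".join(name for bit, name in TCP_FLAGS if tcp[13] & bit)
        yield TcpPacket(
            ts=ts_sec + ts_usec / 1e6,
            src=socket.inet_ntoa(ip[12:16]),
            dst=socket.inet_ntoa(ip[16:20]),
            sport=sport,
            dport=dport,
            ttl=ip[8],
            ip_id=struct.unpack("!H", ip[4:6])[0],
            flags=flags,
        )
```

For example, opening a (hypothetical) `capture.pcap` in binary mode and printing every packet whose `flags` contain `"R"`, next to the SYN-ACK from the same source, shows at a glance whether the resets arrive with a noticeably different TTL than that host's genuine packets.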
When I enabled CloudFlare on the domain to proxy DNS and terminate SSL (i.e. the origin server in China serves CloudFlare over plain HTTP on port 80, while CloudFlare presents its shared SSL certificate to clients; this is CloudFlare's "Flexible" SSL option), the symptoms were reversed: only clients outside China could load the HTTPS sites, while clients within China, including the local server, saw a connection reset at the HTTPS URL.
We have a .CN domain pointed to the same IIS server as the example.com subdomains. When visiting via that domain (e.g. https://example.cn/), the site loads as expected: you must accept that the SSL cert in use is the wildcard for *.example.com, and then the site loads with a warning. The RST packets also don't appear in packet captures. For the record, the .CN domain gives nearly identical results in `nslookup`, `traceroute`, etc.
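Since the .CN name works while the .COM names are reset on the same server, one way to separate "the server" from "the hostname" is to hold the IP fixed and vary only the SNI hostname sent in the Client Hello. Below is a minimal sketch using Python's standard `ssl` module (assuming Python 3.7+); `probe_sni` is a hypothetical helper, and the result strings are illustrative labels, not anything the library defines.

```python
import socket
import ssl


def probe_sni(ip: str, server_name: str, port: int = 443, timeout: float = 10.0) -> str:
    """Open TCP to `ip`, then start TLS with `server_name` as the SNI value.

    Certificate validation is deliberately disabled: the only question is
    whether the handshake survives, not whether the certificate matches.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be turned off before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE

    try:
        sock = socket.create_connection((ip, port), timeout=timeout)
    except OSError as exc:
        return f"tcp_failed: {exc!r}"  # roughly what a failed telnet test would show

    with sock:
        try:
            # wrap_socket sends the Client Hello and runs the full handshake here
            with ctx.wrap_socket(sock, server_hostname=server_name) as tls:
                return f"handshake_ok: {tls.version()}"
        except ConnectionResetError:
            return "reset_during_handshake"  # the symptom seen from outside China
        except ssl.SSLEOFError:
            return "eof_during_handshake"
        except ssl.SSLError as exc:
            return f"tls_error: {exc!r}"
        except OSError as exc:
            return f"other_error: {exc!r}"  # e.g. a timeout
```

From a machine outside China, compare `probe_sni("x.x.x.x", "app.example.com")` with `probe_sni("x.x.x.x", "example.cn")` and with a made-up name such as `"unrelated.invalid"`, where x.x.x.x is the server IP. If only the *.example.com names come back as `reset_during_handshake`, the resets are keyed on the SNI hostname rather than on the IP or port; that narrows things down, although it does not by itself identify who sent the RST.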
Concluding Questions
To me, it looks like the so-called "Great Firewall" is at work, i.e. forging RST packets and injecting them into this connection. The RST packets don't follow exactly the patterns described in Weaver et al. or Clayton et al., but they are pretty close in each case. Does this make sense? If so, is there any other test we could do to show conclusively that this is the case? (edited for question clarity)
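Along the lines of the forged-reset detection discussed by Weaver et al., one check is whether the RSTs are consistent with the other packets from the same host in the same capture: a reset injected by an on-path device often arrives with an IP TTL (and sometimes an IP ID) that doesn't match the genuine packets from that host. Below is a minimal sketch, assuming per-packet records (source IP, TCP flags, TTL, IP ID) have already been extracted from a capture; `Pkt`, `suspicious_resets` and the default thresholds are hypothetical, illustrative choices rather than values from the papers. In the New York capture the host to check is the server IP; in the server-side capture it is the client IP.

```python
from statistics import median
from typing import List, NamedTuple, Sequence, Tuple


class Pkt(NamedTuple):
    src: str
    flags: str  # e.g. "SA", "A", "R", "RA"
    ttl: int
    ip_id: int


def id_distance(a: int, b: int) -> int:
    """Distance between two 16-bit IP IDs, allowing for wrap-around."""
    d = (a - b) % 65536
    return min(d, 65536 - d)


def suspicious_resets(
    packets: Sequence[Pkt], host: str, ttl_tolerance: int = 2, ip_id_window: int = 1000
) -> List[Tuple[Pkt, List[str]]]:
    """Return RSTs claiming to come from `host` that don't match its other packets."""
    normal = [p for p in packets if p.src == host and "R" not in p.flags]
    usual_ttl = median(q.ttl for q in normal) if normal else None
    findings = []
    for p in packets:
        if p.src != host or "R" not in p.flags:
            continue
        if usual_ttl is None:
            findings.append((p, ["no non-RST packets from host to compare against"]))
            continue
        reasons = []
        if abs(p.ttl - usual_ttl) > ttl_tolerance:
            reasons.append(f"TTL {p.ttl}, host's other packets arrive with {usual_ttl:g}")
        # IP ID behaviour is OS-specific, so treat this as a weaker signal than TTL.
        nearest = min(id_distance(p.ip_id, q.ip_id) for q in normal)
        if nearest > ip_id_window:
            reasons.append(f"IP ID {p.ip_id} is {nearest} away from host's other packets")
        if reasons:
            findings.append((p, reasons))
    return findings
```

A consistent mismatch, e.g. every RST arriving with a TTL several hops away from the SYN-ACK in the client-side capture, and likewise relative to the client's packets in the server-side capture, would be fairly strong evidence that neither endpoint actually sent the resets.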
I don't have access to the cloud hosting "dashboard" for this machine, but a colleague is checking it in case there's some network-level issue we could address that way. Is there anything in particular we should check for there?
We do have an ICP number for our .CN domain that can be applied if needed. (edited for clarity)
Obviously we'd like to be able to serve our sites at their .COM domains, to visitors both within and outside China, over HTTPS, from our existing server, using our wildcard cert, as we did before. What should we do?