I am helping implement a CloudFront CDN in front of an NGINX HLS video origin. If you're not familiar, HLS in the browser just uses XHR or fetch to constantly request .m3u8 and .ts files via HTTP and display them in a video element. I have replicated the issue I'm describing with simple AJAX calls on an interval, so the problem is not specific to HLS. I would like to be able to switch traffic between the CDN and direct-to-origin with minimal impact to users. I have built this out, and can switch between CloudFront and direct-to-origin by changing DNS in Route 53. The DNS record has a TTL of 1 minute.

However, when I do so, sometimes the IP address used by the browser does not change - even long after the DNS TTL. OS and browser level DNS cache show the expected IP address, but the browser (as shown in Developer tools -> Network) shows it is still using the "old" IP address. It can keep doing this for several hours after the DNS TTL. Even refreshing the page will not force it to get a new IP for the domain. So far, I've only found chrome://net-internals/#sockets -> Flush Socket Pools or completely closing all browser instances forces the browser to get a new IP address for the domain.
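A minimal sketch of how this could be checked outside the browser, assuming Python's standard library. The helper names `resolve_ips`, `peer_ip`, and `is_stale` are hypothetical and not part of the original setup. It compares the address an already-open connection is talking to with what the resolver currently returns:

```python
import http.client
import socket


def resolve_ips(host, port):
    """Return the set of IP addresses the resolver currently gives for host."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def peer_ip(conn):
    """Return the remote IP of an already-connected HTTPConnection.

    Assumes conn.sock is set, i.e. the connection is currently open.
    """
    return conn.sock.getpeername()[0]


def is_stale(conn, host, port):
    """True if the open connection's peer is no longer in the current DNS answer."""
    return peer_ip(conn) not in resolve_ips(host, port)
```

Calling `is_stale(conn, host, 443)` inside a polling loop that reuses a single `http.client.HTTPSConnection` would be expected to return `True` once DNS has moved while the connection is still attached to the old address.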

So, I'm fairly certain that the issue is that Chrome (I also tested Firefox, and it is probably true of all browsers) maintains a connection and does not look up DNS again until the connection is closed, regardless of the DNS TTL. This is especially visible with something like HLS video or continuous AJAX polling, where the connection is being used every few seconds. I am able to control this somewhat by setting `Connection: close` or `Keep-Alive: timeout=5` headers on the origin. However, I cannot control these at CloudFront, even with a custom function. Moreover, if I enable HTTP/2 at the origin and/or CloudFront, these headers are not allowed or used, but I still see similar behavior.
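To illustrate the origin-side workaround, here is a minimal sketch using Python's `http.server` as a stand-in for the NGINX origin (the real origin is NGINX, so this is only an illustration). `DRAINING`, `OriginHandler`, and `make_server` are hypothetical names. It adds `Connection: close` to HTTP/1.1 responses while a switch is on:

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical switch: True while clients should be moved off this server.
DRAINING = True


class OriginHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # persistent connections are the default here

    def do_GET(self):
        body = b"#EXTM3U\n"  # placeholder playlist body
        self.send_response(200)
        self.send_header("Content-Type", "application/vnd.apple.mpegurl")
        self.send_header("Content-Length", str(len(body)))
        if DRAINING:
            # Tells the client this connection ends after the response, so the
            # next request must open a new connection (and resolve the name
            # again, subject to whatever DNS caches sit in between).
            self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(port=0):
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), OriginHandler)
```

This only applies to HTTP/1.1; as noted above, connection-specific headers like this are not used with HTTP/2.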

I can also return an HTTP 421 Misdirected Request from the origin and force clients hitting the origin to refresh. However, this does not work from CloudFront: using a CloudFront function to modify the response code causes an error, and a 421 returned from the origin to CloudFront causes an error and does not trigger clients to refresh.
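A minimal sketch of the 421 approach when clients hit the origin directly, again with Python's `http.server` standing in for NGINX. `SERVED_ELSEWHERE`, `MisdirectedHandler`, and `make_421_server` are hypothetical names. While the switch is on, every GET is answered with 421:

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical switch: True while this host should no longer serve the domain.
SERVED_ELSEWHERE = True


class MisdirectedHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        if SERVED_ELSEWHERE:
            # 421 signals that this server is not the right one for the
            # request; a client may then retry over a different connection.
            self.send_response(421, "Misdirected Request")
            self.send_header("Content-Length", "0")
            self.send_header("Connection", "close")
            self.end_headers()
            return
        body = b"#EXTM3U\n"  # placeholder playlist body
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_421_server(port=0):
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), MisdirectedHandler)
```

Whether a client actually retries after a 421 is up to the client, so this relies on the browser behavior described above rather than on a guarantee.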

Given all this, how can I ensure that DNS changes take effect in the browser within the DNS entry's TTL? Is there any header or CloudFront setting I can use? I can control some of the clients, so is there any JavaScript, request header, or XHR trick to force the browser to re-resolve the domain and use the new IP address?

Daniel
  • "do not look up DNS again until the connection is closed" Why should it? DNS is only used to open the connection; once the connection is open and stays open, DNS is no longer needed. Implementing keep-alive that way would waste a lot of resources. If the connection is open and the browser is getting what it requested from it, why should it consult DNS again and potentially open a new connection when the current IP address is good enough (responding)? Your issue does not really seem related to DNS or even to browsers; it is more about how much control you have over the server. – Patrick Mevzek Oct 21 '21 at 23:04
  • What is the actual use case for switching between origin server and CDN? What are you trying to achieve? – Tero Kilkanen Oct 22 '21 at 06:35
  • @Tero The use case is that we want to be able to switch viewers from getting video from CloudFront to getting it direct from the on-prem origin, and vice versa, as quickly and seamlessly as possible. We want to pay for CloudFront only when we expect high load. When we no longer want CloudFront, we can change DNS so that new connections resolve back to the on-prem origin. This works. However, we find that some connections "stick" at CloudFront well past the DNS TTL, and have narrowed it down to this persistent connection behavior. – Daniel Oct 22 '21 at 15:06
  • @PatrickMevzek Yes, DNS and the browser are behaving in a way that makes sense during normal use. I'm just looking for some way to trigger a new DNS lookup whenever we change DNS. The "why should it bother" is that we have to pay for CloudFront - we want that traffic to come back from CloudFront and go direct to origin as quickly as possible - at least within the TTL we set for DNS. As-is, traffic can "stick" at CloudFront well past DNS TTL - I measured 4+ hours in some tests. – Daniel Oct 22 '21 at 15:13
  • Your only option is likely to look for some other CDN where you can implement the `421` status code properly, or to ask AWS if they can change their implementation. Browsers work the way they do, and you cannot really change their behaviour. – Tero Kilkanen Oct 23 '21 at 09:19
