10

TCP, being stateful, should require subsequent packets to reach the same server. (Stateless) HTTP runs on top of TCP, and CDNs can use anycast.

So how does TCP work with anycast? What if the SYN and the ACK go to different servers? I think I've heard Google has some solution to this, but I'm not sure.

Please answer for both IPv4 and IPv6, if there's any difference.

Filip Haglund
  • Surely you can find these answers elsewhere already. [Anycast](http://en.wikipedia.org/wiki/Anycast). –  Jul 29 '14 at 19:48
  • 1
    The concept of Anycast and the implementation details vary slightly, as you would expect. – Chris S Jul 29 '14 at 19:56
  • https://serverfault.com/q/648265/87017 – Pacerier Nov 03 '17 at 22:29
  • There's a [linkedin engineering blogpost](https://engineering.linkedin.com/network-performance/tcp-over-ip-anycast-pipe-dream-or-reality) that talks about it but they're noobs and their conclusion is faulty as they didn't do **longitudinal tests**. Also, within their short tests, they didn't induce a network repathing so the "test" is complete junk. In fact, they didn't even bother disconnecting one of their tested POPs so the "test" is *not even* complete junk. – Pacerier Nov 03 '17 at 22:33

3 Answers

9

This is one of those challenges that can be approached in many different ways. The simplest approach is to ignore it and hope for the best. As long as routing doesn't change mid-connection, it will be fine. But when routing does change, it will break all connections affected by the change. The other answers already go into more depth on this approach.

Another approach is to track where connections are routed to. If a packet gets routed to the wrong POP, the CDN can tunnel the packet to the right POP for further processing. This does introduce additional overhead: the client will experience increased latency once it happens, and that increased latency will persist for the lifetime of the connection. But it is likely better for the user experience than a broken connection.

In terms of bandwidth consumption, the overhead is not very significant, because it affects only packets in one direction (from the client to the anycast address), and that tends to be the direction with the smallest bandwidth usage.

The tracking could be done at connection level or by tracking which is the preferred POP to be serving each individual client IP address. The most obvious data structure for tracking the connections would be a distributed hash table.
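As a rough illustration of connection-level tracking, here is a minimal sketch. It assumes a plain in-memory dict as a stand-in for the distributed hash table, and the names `ConnectionTracker`, `handle_packet` and the POP labels are hypothetical, not taken from any real CDN implementation.

```python
from typing import Dict, Optional, Tuple

# (client_ip, client_port, server_ip, server_port) identifies one TCP connection.
FourTuple = Tuple[str, int, str, int]


class ConnectionTracker:
    """Toy stand-in for a distributed hash table shared by all POPs.

    Assumption: a single dict plays the role of the shared store. A real
    deployment would need replication, expiry and consistency handling,
    none of which is modelled here.
    """

    def __init__(self) -> None:
        self._owner: Dict[FourTuple, str] = {}

    def record(self, conn: FourTuple, pop: str) -> None:
        self._owner[conn] = pop

    def lookup(self, conn: FourTuple) -> Optional[str]:
        return self._owner.get(conn)


def handle_packet(tracker: ConnectionTracker, local_pop: str,
                  conn: FourTuple, is_syn: bool) -> str:
    """Decide what the POP that received a packet should do with it."""
    owner = tracker.lookup(conn)
    if owner is None:
        if not is_syn:
            # No POP holds state for this connection, so nobody can serve it.
            return "reject (unknown connection)"
        # First SYN seen for this connection: the receiving POP becomes owner.
        tracker.record(conn, local_pop)
        owner = local_pop
    if owner == local_pop:
        return "process locally"
    # Routing changed mid-connection: forward to the POP holding the state.
    return f"tunnel to {owner}"
```

For example, after a SYN for a given 4-tuple arrives at POP `"ams"`, a later packet of the same connection arriving at `"fra"` yields `"tunnel to ams"`. Tracking a preferred POP per client IP address would use the same structure with the client IP as the key.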

If the client supports MPTCP, there is another solution that could be used. As soon as the connection has been established, the server will open another subflow using a unicast IP address. If such a subflow is successfully established, then the connection can survive a change of routing of the anycast address by simply using the unicast address for the remaining lifetime of the connection.

In principle all of the above approaches would be the same for IPv4 and IPv6. But in practice some solutions may not work as well on IPv4 due to shortage of IP addresses.

For example, the MPTCP approach does require each server to have a public IP address in order to work well. A large load balancing setup might have too many servers to assign a public IP address to each. Additionally, establishing the new subflow cannot be initiated by the server if the client is behind a NAT, which is often the case with IPv4. That means the server would instead have to send the unicast IP address as an option over the initial subflow and let the client initiate the extra subflow.

I don't know which of the above approaches have been used by CDNs.

kasperd
  • 1
    On wikipedia it mentions "To correct this issue, there have been proprietary advancements within custom IP stacks which allow for healing of stateful protocols where it is required.". Do you know of any open source solutions that can heal changing stateful protocols like TCP anycast? – CMCDragonkai May 20 '15 at 07:28
  • 1
    @CMCDragonkai The wording you cite is misleading. It gives the impression there is some state, which can be healed through some process. But the issue is that at some point the packets are delivered to a destination which has none of the state to handle them. You could conceivably fake some state at the TCP layer, but state at all layers above is missing as well which would include SSL. So this mythical healing would include a reliable SSL MITM-attack which works without even seeing the key exchange. – kasperd May 20 '15 at 07:55
  • @CMCDragonkai You can't reconstruct the state out of nothing. So you need to get the state and the packet to be processed to the same place. Live replication of state across locations is slow, complicated, and to a large extent defeats the load balancing. Migration of state once routing has changed is slow, complicated, and prone to inconsistencies when nodes cannot agree on which node's state is the most up-to-date version. When the state cannot move to packet, the packet needs to move to the state. That's what the first half of my answer describes. – kasperd May 20 '15 at 08:01
  • That makes sense, so healing of TCP is a bit of a pipedream then. Anycast then should really only be used for stateless connections, or at very least very short sessions in stateful connections. – CMCDragonkai May 20 '15 at 08:04
  • 1
    @CMCDragonkai I would indeed consider healing of TCP as implied by the cited article to be a pipe dream. But that doesn't mean you should completely write off use of anycast for stateful communication. My point is that running TCP on an anycast address works most of the time, and you can correct those cases where it does not work by tunneling the misrouted packets to the proper destination. I would expect the benefit from anycast to far outweigh the additional cost of the occasional rerouting of packets. I know exactly how such a solution would work, but I don't know any implementation of it. – kasperd May 20 '15 at 08:09
  • 1
    Are there any resources that I can use to find out more about rerouting the misdirected packets? – CMCDragonkai May 20 '15 at 08:10
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/23942/discussion-between-kasperd-and-cmcdragonkai). – kasperd May 20 '15 at 08:17
  • @kasperd, Re "approach is to track", Are you sure "track" is the right word? Or do you mean "determine from the packet itself"? How would you go about tracking an offrouted packet? ||| Re "by tracking which is the preferred POP to be serving each individual client IP address", **doesn't this override the whole point of anycasting in the first place**? – Pacerier Nov 03 '17 at 21:57
  • @Pacerier Yes I am sure track is the right word. The idea is to route TCP SYN packets to the closest location and keep track of which location was chosen for each connection such that the rest of the packets in the connection can be routed to the same location. As for the other idea of choosing a location per IP range, that means routing of packets can be done based on much more static information than connection tracking. As long as you choose the location appropriately it will still give you the latency benefit of anycast. – kasperd Nov 04 '17 at 00:57
5

Anycast is best described as a "one-to-nearest" routing scheme, and typically works by having BGP (Border Gateway Protocol) announce the same destination IP prefix from multiple locations, resulting in packets being routed to the nearest (in routing terms) of the locations announcing it.

So in the broad sense, anycast is just used to figure out which server to connect to, and there's nothing about it that makes it unsuited to TCP, or stateful networking.

The primary use case for anycast is in CDNs (Content Delivery Networks), which generally have short-lived and/or stateless connections - as you'd expect when delivering lots of small, static webpage content. In this use case, the assumption that the network topology will remain the same for at least the length of the session is fairly safe, given the short length of the typical session and the minimal consequences of that assumption becoming false - worst case, the session fails in the middle, and the user reloads the webpage.

The drawback of using anycast for longer sessions, or for uses which are intolerant of disruptions, is that the network topology is more likely to change during a longer timeframe, and the connection will silently break if that happens (pop switching). As you allude to in your question, Google (and others) are working on methods of solving this problem, but for now, it's all proprietary and secret.

So the answer to your question of how anycast works with TCP is really that it works just fine, unless the network topology changes... if the topology changes, it [potentially] breaks.

There's an interesting presentation here (warning, pdf) with real world data about the use of anycast, including some long-lived sessions, and it would seem that in the real world, "pop switching" (where the network topology changes in the middle of a session and breaks a connection) is a very uncommon experience - in one dataset, with 683,204 sessions, and 23,795 sessions longer than 10 minutes, only 4 sessions got pop switched.
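To put those figures in proportion, here is a small calculation. It uses only the three numbers quoted above. The presentation excerpt does not say here whether the 4 pop-switched sessions are counted against all sessions or only against the long ones, so both rates are computed. The helper name `rate_percent` is just for illustration.

```python
# Figures quoted from the dataset mentioned above.
total_sessions = 683_204
long_sessions = 23_795      # sessions longer than 10 minutes
pop_switched = 4


def rate_percent(events: int, population: int) -> float:
    """Share of `population` affected by `events`, as a percentage."""
    return 100.0 * events / population


# Two readings of the same number, since the denominator is ambiguous.
rate_all = rate_percent(pop_switched, total_sessions)
rate_long = rate_percent(pop_switched, long_sessions)

print(f"{rate_all:.4f}% of all sessions")
print(f"{rate_long:.4f}% of sessions longer than 10 minutes")
```

Either way the rate is tiny: roughly 0.0006% of all sessions, or roughly 0.017% even if all four pop switches hit the long-lived sessions.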

HopelessN00b
  • 4
    I disagree specifically with the sentence "So in the broad sense, anycast is just used to figure out which server to connect to, and there's nothing about it that makes it unsuited to TCP, or stateful networking.". It's not actually used to figure out which server to connect to, it's used to have each packet individually delivered to the closest of several locations. This difference is what makes it theoretically unsuited to stateful protocols but, as you point out, it can work well in practice provided the topology (i.e., what is closest) rarely changes compared to typical connection lifetime. – Håkan Lindqvist Jul 29 '14 at 21:51
  • 2
    @HåkanLindqvist You're right, my phrasing on that is atrocious. Let me think about how best to reword it. – HopelessN00b Jul 30 '14 at 00:18
  • Is it still a secret? Why would they keep it a secret? Surely the internet works because things are public? – Pacerier Nov 03 '17 at 22:13
  • "4 sessions got pop switched"... when things are working fine. In real-world **longitudinal** tests it's logical to assume that there are periods where 100% of connections break for a short while. – Pacerier Nov 03 '17 at 22:15
  • @HopelessN00b, Why doesn't IPv6 anycast solve the popswitch problem? – Pacerier Nov 03 '17 at 22:16
3

It works better than you expect, especially for TCP sessions that are usually pretty short-lived such as those generated by HTTP clients.

Anycast assumes that the network topology isn't going to change for the duration of the session, and if it does change it isn't likely that another endpoint will suddenly be nearer than the one that negotiated the session. The application protocol should handle this sort of disconnect/reconnect activity.

CDNs work very well on Anycast since their whole business model is short-lived TCP sessions with significantly unidirectional network transfer out of their network. If the ACK stream ends up going somewhere other than the endpoint it originally negotiated for, the connection will hang for that one asset.
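As an illustration of the application handling a disconnect by reconnecting, here is a minimal sketch. The helper name `with_reconnect` and its parameters are hypothetical. It assumes the request is idempotent (such as fetching a static asset), so repeating it over a fresh connection is safe.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_reconnect(request: Callable[[], T], attempts: int = 3,
                   delay_s: float = 0.0) -> T:
    """Call `request`, retrying when the connection breaks or hangs.

    Assumption: `request` opens its own fresh connection on every call and
    is safe to repeat. A new connection starts with a new SYN, which is
    routed to whichever anycast endpoint is currently nearest.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return request()
        except (ConnectionError, TimeoutError) as exc:
            # ConnectionError covers resets; TimeoutError covers a hung
            # connection, provided the caller set a socket timeout.
            last_error = exc
            if attempt + 1 < attempts:
                time.sleep(delay_s)
    assert last_error is not None
    raise last_error
```

A hung connection only surfaces as an error if the client uses a timeout, so the sketch relies on the caller setting one inside `request`.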

sysadmin1138