Anycast is best described as a "one-to-nearest" routing scheme. It typically works by having BGP (Border Gateway Protocol) announce the same destination IP prefix from multiple locations, so that packets are routed to whichever of those locations is nearest in routing terms (the best BGP path, which is not always the geographically closest one).
So in the broad sense, anycast is just used to figure out which server to connect to, and there's nothing about it that makes it unsuited to TCP or stateful networking.
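To make the "one-to-nearest" idea concrete, here is a toy Python model in which one prefix is announced from several PoPs and each client simply picks the announcement with the shortest AS path. This is a deliberate simplification: real BGP best-path selection compares other attributes (such as local preference) before AS-path length. The prefix, PoP names and path lengths are all hypothetical.

```python
"""Toy model of anycast: one prefix announced from several PoPs.

Simplifying assumption: the client's router picks the announcement with the
shortest AS path. Real BGP checks other attributes first, so this is only an
illustration of "one-to-nearest", not of the full BGP decision process.
"""
from dataclasses import dataclass

ANYCAST_PREFIX = "192.0.2.0/24"  # hypothetical (RFC 5737 documentation range)


@dataclass(frozen=True)
class Announcement:
    pop: str          # hypothetical PoP announcing ANYCAST_PREFIX
    as_path_len: int  # AS-path length as seen from one particular client


def select_pop(announcements: list[Announcement]) -> str:
    """Return the PoP with the shortest AS path (ties broken by name)."""
    best = min(announcements, key=lambda a: (a.as_path_len, a.pop))
    return best.pop


# The same prefix, seen from two different clients' vantage points.
client_a_view = [Announcement("ams", 2), Announcement("nyc", 4), Announcement("sin", 5)]
client_b_view = [Announcement("ams", 5), Announcement("nyc", 3), Announcement("sin", 2)]

print(select_pop(client_a_view))  # ams
print(select_pop(client_b_view))  # sin
```

Because the choice is deterministic for a given routing table, every packet of a flow (the SYN, the handshake and all later segments) reaches the same PoP for as long as that table stays the same, which is why nothing here conflicts with TCP.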
A major use case for anycast is CDNs (Content Delivery Networks), which generally have short-lived and/or stateless connections, as you'd expect when delivering lots of small, static web content. In this use case, anycast's assumption that the network topology will stay the same for at least the length of the session is fairly safe, given how short the typical session is and how minor the consequences are if the assumption turns out to be false: at worst, the session fails midway and the user reloads the page.
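One way to see why session length matters is a simple probability model. Assume (purely for illustration) that route changes affecting a given client arrive as a Poisson process with rate $\lambda$; then a session of duration $T$ sees at least one change with probability $1 - e^{-\lambda T}$. The rate below is made up, and real route churn is burstier than a Poisson process, so treat this as a sketch of the trend, not a prediction.

```python
import math


def p_route_change(session_seconds: float, changes_per_hour: float) -> float:
    """Probability that at least one route change hits a session.

    Assumes route changes arrive as a Poisson process at the given
    (hypothetical) rate. expm1 keeps precision for tiny probabilities.
    """
    rate_per_second = changes_per_hour / 3600.0
    return -math.expm1(-rate_per_second * session_seconds)


RATE = 0.01  # hypothetical: one relevant route change per 100 hours per client

for label, seconds in [("page load (2 s)", 2),
                       ("video call (1 h)", 3600),
                       ("SSH session (8 h)", 8 * 3600)]:
    print(f"{label:>18}: {p_route_change(seconds, RATE):.6f}")
```

With this invented rate, a 2-second page load has roughly a 6-in-a-million chance of being affected, while an 8-hour session has roughly a 7.7% chance: the exposure grows almost linearly with session length while it is small.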
The drawback of using anycast for longer sessions, or for applications that are intolerant of disruption, is that the network topology is more likely to change over a longer timeframe. If it does, packets start arriving at a different PoP (Point of Presence) that has no state for the connection, and the connection breaks, typically with a TCP reset from a server that has never seen it. This is known as "PoP switching". As you allude to in your question, Google (and others) are working on methods of solving this problem, but for now they are all proprietary and secret.
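The PoP-switching failure can be sketched as follows. The model assumes each PoP keeps its own connection table and shares no state with the others, and that a PoP answers a non-SYN segment for an unknown connection with a reset (what the TCP specification prescribes for a segment arriving at a nonexistent connection). The PoP names, addresses and ports are hypothetical.

```python
"""Toy illustration of a "PoP switch" breaking a TCP connection.

Assumptions (hypothetical): PoPs share no connection state, and a PoP
resets any non-SYN segment for a connection it has no record of.
"""
from dataclasses import dataclass, field

# (client_ip, client_port, anycast_ip, server_port)
FlowKey = tuple[str, int, str, int]


@dataclass
class PoP:
    name: str
    connections: set[FlowKey] = field(default_factory=set)

    def receive(self, flow: FlowKey, syn: bool) -> str:
        if syn:
            self.connections.add(flow)  # handshake creates state at this PoP only
            return "SYN-ACK"
        if flow in self.connections:
            return "ACK"                # known connection: segment accepted
        return "RST"                    # no state here: connection is reset


pops = {"ams": PoP("ams"), "fra": PoP("fra")}
flow: FlowKey = ("198.51.100.7", 51515, "192.0.2.1", 443)

route = "ams"                                # current best path for this client
print(pops[route].receive(flow, syn=True))   # SYN-ACK
print(pops[route].receive(flow, syn=False))  # ACK

route = "fra"                                # topology change: new best path
print(pops[route].receive(flow, syn=False))  # RST, so the session breaks
```

Short, stateless CDN requests mostly finish before `route` changes; a long session is far more likely to straddle the change and hit the reset.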
So the answer to your question of how anycast works with TCP is that it works just fine unless the network topology changes mid-session, in which case the connection may break.
There's an interesting presentation (warning: PDF) with real-world data on the use of anycast, including some long-lived sessions. It suggests that in practice, PoP switching (where the network topology changes in the middle of a session and breaks the connection) is very rare: in one dataset of 683,204 sessions, 23,795 of which lasted longer than 10 minutes, only 4 sessions were PoP-switched.
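Putting those figures into rates makes the point clearer. The only inputs are the three numbers quoted from the presentation; the per-long-session figure assumes the worst case that all four switched sessions were among the long ones, since the dataset summary does not say.

```python
# Figures quoted from the presentation; nothing else is assumed.
total_sessions = 683_204
long_sessions = 23_795   # sessions longer than 10 minutes
pop_switched = 4

overall_rate = pop_switched / total_sessions
# Worst case: assume all four switched sessions were long ones.
long_rate_upper = pop_switched / long_sessions

print(f"overall:              {overall_rate:.6%}")
print(f"long sessions (max):  {long_rate_upper:.4%}")
print(f"about 1 in {total_sessions / pop_switched:,.0f} sessions overall")
```

That is roughly 1 in 170,800 sessions overall, and at most about 1 in 5,950 of the sessions longer than 10 minutes.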