
I have a GKE cluster running on 1.11.2-gke.15 and my pods are unable to talk to each other.

DNS resolution appears to work from inside the containers:

# nslookup myapp.testns.svc.cluster.local
Server:     10.7.5.10
Address:    10.7.5.10#53

Non-authoritative answer:
Name:   myapp.testns.svc.cluster.local
Address: 10.7.13.156

However, when I try to actually reach the service, the connection does not seem to work:

# telnet myapp.testns.svc.cluster.local 8080
Trying 10.7.13.156...
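For anyone reproducing this, the same check can be scripted so that it fails after a fixed timeout instead of sitting at "Trying...". This is only a minimal sketch using Python's standard `socket` module; the helper name `can_connect` and the 3-second default timeout are my own assumptions, not anything from the cluster setup.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration. DNS failures, refused connections
    and timeouts all raise OSError subclasses, so they all map to False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call with the service name and port from the question:
# can_connect("myapp.testns.svc.cluster.local", 8080)
```

A False result with a name that resolves correctly would point at the network path to the pod rather than at DNS.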

This may have started after I upgraded the cluster from 1.10 to 1.11.2.

I have tried restarting the nodes and all the pods, but that did not help.

Am I missing something obvious?

UPDATE 1:

I figured out that one of the nodes in the cluster, which was created by the node autoscaler, was not reachable. Pods on other nodes could not reach any of the pods running on it.

The solution was to scale the cluster down by hand and let the autoscaler scale it up again; the new node was then reachable. I am not sure why this happened or how to prevent it in the future, so suggestions are welcome.
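One way to narrow this kind of problem down to a single node is to probe several pods and group the results by the node each pod runs on. The sketch below assumes you have already collected `(node_name, pod_ip, reachable)` tuples yourself; the function name and the sample node names and IPs are hypothetical and not taken from my cluster.

```python
from collections import defaultdict


def unreachable_nodes(results):
    """Return the sorted names of nodes on which every probed pod was unreachable.

    `results` is an iterable of (node_name, pod_ip, reachable) tuples,
    where `reachable` is a bool from whatever probe was used.
    """
    by_node = defaultdict(list)
    for node, _pod_ip, reachable in results:
        by_node[node].append(reachable)
    # A node is suspect only if none of its probed pods answered.
    return sorted(node for node, flags in by_node.items() if not any(flags))


# Hypothetical probe results for illustration only.
sample = [
    ("node-a", "10.4.0.5", True),
    ("node-a", "10.4.0.6", False),
    ("node-b", "10.4.1.7", False),
    ("node-b", "10.4.1.8", False),
]
print(unreachable_nodes(sample))  # ['node-b']
```

A node where only some pods fail (like `node-a` above) is more likely an application problem than a node-level networking problem.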

ByteFlinger

1 Answer


This looks like it might be an issue with version 1.11.2-gke.15, which was reported as a private issue. A new revision of 1.11.2 (gke.18) that addresses this problem is rolling out.

Patrick W