I'm trying to stand up a pair of Kubernetes workers on EC2 instances, and I'm running into a problem where the service does not appear to "see" all of the pods that it should be able to see.

My exact environment is a pair of AWS Snowballs, Red and Blue, and my cluster consists of control, worker-red, and worker-blue [1]. I'm deploying a dummy Python server that waits for a GET on port 8080 and replies with the local hostname. I've set it up with enough replicas that worker-red and worker-blue have at least one pod each. Finally, I've created a service, the spec of which looks like

    type: NodePort
    selector:
        app: hello-server
    ports:
        - port: 8080
          targetPort: 8080
          nodePort: 30080

I can now check that my pods are up

    kubectl get pods -o wide
    NAME                                      READY   STATUS    RESTARTS   AGE   IP              NODE          NOMINATED NODE   READINESS GATES
    hello-world-deployment-587468bdb7-hf4dq   1/1     Running   0          27m                   worker.red    <none>           <none>
    hello-world-deployment-587468bdb7-mclhm   1/1     Running   0          27m                   worker.blue   <none>           <none>

Now I can try to curl them

    curl worker-red:30080
    greetings from hello-world-deployment-587468bdb7-hf4dq
    curl worker-blue:30080
    greetings from hello-world-deployment-587468bdb7-mclhm

That's what happens about half the time. The other half of the time, the curl fails with a timeout error. Specifically - curling worker-red will ONLY yield a response from hf4dq, and curling worker-blue will ONLY yield a response from mclhm. If I cordon and drain worker-blue so both of my pods are running on worker-red, there is never a timeout, and both pods will respond.
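To put a number on the "about half the time" observation, a small probe can hit each node's NodePort repeatedly and tally which pod answers and how often the request fails. This is only a minimal sketch and not part of the original setup: `http_fetch` and `probe_nodes` are hypothetical helper names, the hostnames and port are taken from the curl commands above, and the fetch function is injectable so the tally logic can be tried without a cluster.

```python
import urllib.request
from collections import Counter


def http_fetch(url, timeout=2.0):
    """Return the response body as text, or None if the request fails or times out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace").strip()
    except OSError:
        # URLError, connection errors, and socket timeouts are all OSError subclasses.
        return None


def probe_nodes(nodes, port=30080, attempts=20, fetch=http_fetch):
    """Request each node's NodePort `attempts` times and count the replies per node."""
    results = {}
    for node in nodes:
        tally = Counter()
        for _ in range(attempts):
            body = fetch(f"http://{node}:{port}/")
            # Failed or timed-out requests are counted under the key "TIMEOUT".
            tally[body if body is not None else "TIMEOUT"] += 1
        results[node] = tally
    return results


# Example (needs network access to the nodes):
# for node, tally in probe_nodes(["worker-red", "worker-blue"]).items():
#     print(node, dict(tally))
```

If each node only ever reports its own pod plus a share of `TIMEOUT` entries, that matches the behaviour described above and suggests the failures are exactly the requests that were sent to the pod on the other node.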

It seems like the NodePort service is not reaching pods that are not on the host I am curling. As I understand them, this isn't how services are supposed to work. What am I missing?

[1] If I set up such that I have two workers both on Red, the same problem I'm describing happens, but this is my primary use case so it's the one I'll concentrate on.

It is hard to say exactly what is wrong here, but there are some steps you can take to troubleshoot the issue:

  1. Debug Pods, especially check if there is something suspicious in the logs:
  • kubectl logs ${POD_NAME} ${CONTAINER_NAME}
  • kubectl logs --previous ${POD_NAME} ${CONTAINER_NAME}

  2. Debug Services, for example by checking:
  • Does the Service exist?
  • Does the Service work by DNS name?
  • Does the Service work by IP?
  • Is the Service defined correctly?
  • Does the Service have any Endpoints?
  • Is the kube-proxy working?

Going through those steps will help you find the cause of your issue and also better understand the mechanics behind the services.
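Two of the Service checks above can be scripted against the JSON that kubectl emits. The following is a minimal sketch under some assumptions: the Service is taken to be named `hello-server` (the question does not give its name), the input is the output of `kubectl get endpoints hello-server -o json` and `kubectl get service hello-server -o json`, and the function names are hypothetical. It lists which pod on which node backs the Service, and reports `spec.externalTrafficPolicy`, which controls whether a NodePort forwards only to pods on the node that received the request (`Local`) or to any ready pod in the cluster (`Cluster`, the default).

```python
import json


def endpoint_backends(endpoints_json):
    """Return (pod name, pod IP, node name, ready) for each address in an Endpoints object."""
    endpoints = json.loads(endpoints_json)
    backends = []
    for subset in endpoints.get("subsets") or []:
        # Ready pods are listed under "addresses", unready ones under "notReadyAddresses".
        for key, ready in (("addresses", True), ("notReadyAddresses", False)):
            for addr in subset.get(key) or []:
                pod = (addr.get("targetRef") or {}).get("name")
                backends.append((pod, addr.get("ip"), addr.get("nodeName"), ready))
    return backends


def external_traffic_policy(service_json):
    """Return the Service's externalTrafficPolicy, or "Cluster" if the field is unset."""
    spec = json.loads(service_json).get("spec", {})
    return spec.get("externalTrafficPolicy", "Cluster")
```

If the endpoint list shows both pods as ready and the policy is `Cluster`, the Service definition itself is probably fine, and the last item on the list, kube-proxy, together with pod-to-pod connectivity between the nodes, is worth checking next.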


You are using a NodePort type service, in which case what you're observing is very much expected.

Your service matches two pods that are running on two different nodes. Since the service is of type NodePort, there is an inherent association between each pod of your service and the node it is running on. If you curl the worker-red endpoint, you will ONLY get the response from the worker-red pod. That is because the other pod is tied to another endpoint, worker-blue:<node-port>, and is not reachable from the worker-red endpoint. Yes, it is the same service, but it is backed by two endpoints, each with a different hostname.

That is basically how NodePort services work.

When you place both pods on the same node, both are accessible from the same node hostname, so curling either of them will work, because both endpoints now map to different ports but the same hostname.

To further your understanding of this, you can try changing your service type to LoadBalancer. You'll notice that you can reach both pods using the same hostname, regardless of where they are scheduled. This hostname/IP address is the address of the load balancer, which all the pods in the service have in common.

I hope this clarifies your confusion!

