
I'm having trouble with pods not being able to 'talk' to cluster IPs (virtual IPs fronting pods) in my Kubernetes cluster.

I've been following along with "Kubernetes the hard way" by Kelsey Hightower; however, I've converted it all to run the infrastructure on AWS.

I have pretty much everything working, except that my pods are unable to talk to ClusterIP virtual IPs.

  • service-cluster-ip-range is: 10.32.0.0/24
  • Pod CIDR for worker nodes is: 10.200.0.0/16
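
For reference, a couple of quick checks of these ranges against the running cluster (assuming the standard kube-dns Service name, and that node CIDR allocation is enabled so `.spec.podCIDR` is populated):

# the kube-dns Service should get its ClusterIP out of 10.32.0.0/24
kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'

# each worker node gets a slice of 10.200.0.0/16 (a /24 per node in my setup)
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name} {.spec.podCIDR}{"\n"}{end}'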

I initially tried both CoreDNS and kube-dns, thinking it might be an issue at that level, but I've since narrowed it down to the fact that I cannot reach Service cluster IPs from pods at all, whereas from the actual worker nodes I can reach cluster IPs just fine.

I've verified that kube-proxy is working as expected. I'm running it in iptables mode and can see it writing out the iptables rules on the worker nodes correctly. I even tried switching to ipvs mode, and in that mode it also wrote out the rules correctly.
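
This is the kind of check I mean on a worker node (exact chain names can vary slightly between kube-proxy versions):

# iptables mode: the DNS Service VIP should appear in the KUBE-SERVICES chain
sudo iptables -t nat -L KUBE-SERVICES -n | grep 10.32.0.10

# ipvs mode: the VIP should show up as a virtual server backed by the pod IPs
sudo ipvsadm -Ln | grep -A 3 10.32.0.10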

If I run nslookup inside a test pod (e.g. busybox 1.28) and let it use its standard nameserver setting, which points at my CoreDNS installation, it fails to resolve google.com or the cluster's `kubernetes.default` name. However, if I tell nslookup to use the pod IP address of the CoreDNS pod, it works just fine.

Example

This does not work:

kubectl exec -it busybox -- nslookup google.com               
Server:    10.32.0.10
Address 1: 10.32.0.10

nslookup: can't resolve 'google.com'
command terminated with exit code 1

This works (pointing nslookup at the CoreDNS pod IP address rather than the cluster IP):

kubectl exec -it busybox -- nslookup google.com 10.200.2.2                   
Server:    10.200.2.2
Address 1: 10.200.2.2 kube-dns-67d45fcb87-2h2dz

Name:      google.com
Address 1: 2607:f8b0:4004:810::200e iad23s63-in-x0e.1e100.net
Address 2: 172.217.164.142 iad30s24-in-f14.1e100.net

To clarify, I've tried this with both CoreDNS and kube-dns and get the same result in both cases. It seems like a networking issue higher up the stack.

My AWS EC2 instances have source/destination checking disabled. All my configuration and settings are forked from the official kubernetes-the-hard-way repo, updated to run on AWS. The source with all my config / settings etc. is here
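
For reference, the source/destination check was disabled (and can be re-verified) per worker instance with something like the following; the instance ID is a placeholder:

# disable source/destination checking on a worker instance
aws ec2 modify-instance-attribute --instance-id <worker-instance-id> --no-source-dest-check

# confirm it is now false
aws ec2 describe-instance-attribute --instance-id <worker-instance-id> --attribute sourceDestCheck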

Edit: for info, here is the /etc/resolv.conf that my pods get from kube-dns / CoreDNS (it looks absolutely fine though):

# cat /etc/resolv.conf
search kube-system.svc.cluster.local svc.cluster.local cluster.local ec2.internal
nameserver 10.32.0.10
options ndots:5

I am able to ping the kube-dns pod IP directly from pods, but the cluster IP for kube-dns does not respond to ping or anything else (the same goes for other Services' cluster IPs). E.g.:

me@mine ~/Git/kubernetes-the-hard-way/test kubectl get pods -n kube-system -o wide
NAME                                READY   STATUS    RESTARTS   AGE    IP            NODE             NOMINATED NODE   READINESS GATES
hello-node1-55cc74b4b8-2hh4w        1/1     Running   2          3d1h   10.200.2.14   ip-10-240-0-22   <none>           <none>
hello-node2-66b5494599-cw8hx        1/1     Running   2          3d1h   10.200.2.12   ip-10-240-0-22   <none>           <none>
kube-dns-67d45fcb87-2h2dz           3/3     Running   6          3d1h   10.200.2.11   ip-10-240-0-22   <none>           <none>

 me@mine ~/Git/kubernetes-the-hard-way/test kubectl exec -it hello-node1-55cc74b4b8-2hh4w sh
Error from server (NotFound): pods "hello-node1-55cc74b4b8-2hh4w" not found
 me@mine ~/Git/kubernetes-the-hard-way/test kubectl -n kube-system exec -it hello-node1-55cc74b4b8-2hh4w sh
# ping 10.200.2.11
PING 10.200.2.11 (10.200.2.11) 56(84) bytes of data.
64 bytes from 10.200.2.11: icmp_seq=1 ttl=64 time=0.080 ms
64 bytes from 10.200.2.11: icmp_seq=2 ttl=64 time=0.044 ms
64 bytes from 10.200.2.11: icmp_seq=3 ttl=64 time=0.045 ms
^C
--- 10.200.2.11 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.044/0.056/0.080/0.017 ms

# ip route get 10.32.0.10
10.32.0.10 via 10.200.2.1 dev eth0  src 10.200.2.14
    cache
#
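
Since a ClusterIP generally won't answer ICMP anyway, the more meaningful test from inside the pod is against port 53 on the VIP itself, and that also fails for me. For example (busybox's nslookup takes a server argument; telnet may or may not be present in the image):

# query the Service VIP directly from inside the pod - this times out
nslookup kubernetes.default 10.32.0.10

# check raw TCP reachability of port 53 on the VIP
telnet 10.32.0.10 53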

Am I missing something obvious here?

Shogan
  • What do you have in /etc/resolv.conf? – fg78nc Nov 03 '19 at 22:26
  • Added the content of /etc/resolv.conf to my original question in the edit (it looks fine). The point is that the 10.32.0.10 cluster service VIP is not reachable from my pods, but is from worker nodes. If it were reachable, then this /etc/resolv.conf would work for my pods. If I manually change the 10.32.0.10 IP in resolv.conf to point at the kube-dns or CoreDNS pod IP address, then everything works as expected. – Shogan Nov 04 '19 at 09:26
  • And what about `ip route get 10.32.0.10` run in your pod? Are you able to ping it? – mario Nov 06 '19 at 12:43
  • @mario I have added the results of `ip route get 10.32.0.10` to my original question at the bottom (as well as ping results). As expected, ping to the service's cluster IP does not work. However, going direct to the pod IP works fine. – Shogan Nov 06 '19 at 23:05
  • @Shogan, did you manage to resolve your issue? Which `pod`/`service` does the IP `10.32.0.10` belong to? I see that your `kube-dns` cluster IP is `10.200.2.11` and it should be reachable from any pod deployed on your cluster. – mario Nov 25 '19 at 19:03
  • @mario, no I didn't, unfortunately. `10.32.0.10` is the cluster IP of kube-dns, and `10.200.2.11` is one of the pods' direct IP addresses. So talking DNS straight to the pod over the container overlay network works fine, but using the cluster IP range that services get their IP addresses from does not work. – Shogan Nov 25 '19 at 23:08
  • And are you able to reach it from a worker node? As it is the IP of the `Service`, it shouldn't be reachable (using `ping`) from the node either. Being nothing more than a set of iptables rules that accept traffic of a certain protocol (e.g. TCP or UDP) destined to a specific port and redirect it appropriately, a `Service` will not respond to ping, so that is entirely normal `Service` behaviour. As it is a DNS Service, it will accept traffic destined to port 53. You can check it using telnet and it should work: `telnet 10.32.0.10 53` should tell you whether it is reachable or not. – mario Nov 27 '19 at 13:09
  • Yes, sorry, the ping test example was a bad one; I've already tried it specifically for DNS lookups on UDP 53, including nslookup and dig targeting that service IP. No luck, definitely not reachable. However, direct from a worker node is another story: I can do `dig something.xyz @10.32.0.10` and the CoreDNS pod will respond to that DNS lookup request, but only if I do it on the worker host/node itself. The same command from inside containers running on that same host fails/times out. – Shogan Nov 29 '19 at 19:46
  • @Shogan, could you check one more thing? What does `kubectl get ep kube-dns --namespace=kube-system` show? (Of course, only if you still haven't found a resolution for this issue.) – mario Jan 29 '20 at 13:42
  • Did you solve your problem? I have exactly the same issue and I'm not able to figure out what is wrong. – JackTheKnife Aug 28 '20 at 16:07
  • @JackTheKnife unfortunately no, I didn't. I got distracted with other projects and moved on from this. If I ever boot up a new environment again I'll take another crack at this and update here if I find anything. – Shogan Aug 28 '20 at 21:10
  • @Shogan That will be great! – JackTheKnife Aug 31 '20 at 14:43
  • @Shogan Have you had a chance to revisit this issue? – Wytrzymały Wiktor Feb 24 '21 at 12:54
  • @WytrzymałyWiktor sorry, but I haven't. Priorities changed and I've not been able to dig into this again. It was a bit of a personal project so there was never impetus to complete it and find the issue. If you do happen to come across the solution please post back here! – Shogan Feb 24 '21 at 16:46
  • Could you show the output of `kubectl get ep kube-dns --namespace=kube-system` as suggested by @mario? – Wytrzymały Wiktor Feb 26 '21 at 09:12

1 Answer


Try adding the following to the kube-dns ConfigMap:

data:
  upstreamNameservers: |
    ["8.8.8.8", "8.8.4.4"]
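
For reference, applied as a complete manifest this would look roughly like the following (assuming the standard kube-dns addon ConfigMap name and namespace):

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  upstreamNameservers: |
    ["8.8.8.8", "8.8.4.4"]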
fg78nc
  • It's not an upstream DNS issue; as per my question, DNS works just fine if I point anything directly at my CoreDNS or kube-dns pod IP address. In other words, CoreDNS / kube-dns are able to do DNS resolution themselves just fine. – Shogan Nov 04 '19 at 08:29