Inter-pod communication failure between Kubernetes nodes : Azure virtual machine and on-prem node

Question

Rancher Server Setup

Rancher version: 2.6.3
Installation option (Docker install/Helm Chart): Helm Chart, Kubernetes v1.21.6 and RKE1

Information about the Cluster

Kubernetes version: v1.20.15-rancher1-2
Cluster Type (Local/Downstream): Downstream
If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): RKE Custom (3 nodes on-prem + 1 node on Azure)

User Information

What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): Admin

Describe the bug
To illustrate the inter-pod communication problem, consider these three dcgm-exporter pods, which collect and expose GPU metrics:

URL1- http://10.42.0.79:9400/metrics -> Pod 10.42.0.79 running on node-1-on-prem
URL2- http://10.42.2.77:9400/metrics -> Pod 10.42.2.77 running on node-2-on-prem
URL3- http://10.42.4.54:9400/metrics -> Pod 10.42.4.54 running on node-3-azure
On node-1-on-prem Linux shell: curl to URL1 and URL2 succeeds; curl to URL3 fails
On node-2-on-prem Linux shell: curl to URL1 and URL2 succeeds; curl to URL3 fails
On node-3-azure Linux shell: curl to URL1 and URL2 fails; curl to URL3 succeeds
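
The manual curl checks above could be repeated from each node with a small script. The following is only a minimal sketch, not part of the original setup: it tests plain TCP reachability of port 9400 rather than fetching the metrics, and the names `PODS`, `can_connect` and `probe_all` are hypothetical helpers introduced for illustration.

```python
import socket

# Pod IPs taken from the URLs listed above; 9400 is the port used in those URLs.
PODS = {
    "node-1-on-prem": "10.42.0.79",
    "node-2-on-prem": "10.42.2.77",
    "node-3-azure": "10.42.4.54",
}


def can_connect(host, port=9400, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False


def probe_all(pods=PODS, port=9400, timeout=3.0):
    """Probe every pod and return a {node name: reachable} mapping."""
    return {node: can_connect(ip, port, timeout) for node, ip in pods.items()}
```

Running `probe_all()` on each of the three nodes in turn should reproduce the pattern described above: pods on the on-prem nodes reachable from on-prem, the Azure pod reachable only from the Azure node.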

Reproduce

On-prem subnet is 10.133.100.0/24 and Azure subnet is 10.208.2.0/24
The Azure virtual network and the local network are connected by a site-to-site VPN
Node to node connections are successful and there are no port restrictions in Azure and on-prem
IPv4 forwarding is enabled on all nodes
Downstream cluster container network interface (CNI) configuration:
  network:
    mtu: 0
    options:
      flannel_backend_type: vxlan
    plugin: canal
The Azure node joins the cluster without errors and all pods come up

Result

On node-1-on-prem Linux shell:
$ curl http://10.42.4.54:9400/metrics
curl: (28) Failed to connect to 10.42.4.54 port 9400: Connection timed out

Expected Result

Successful inter-pod communication and display of GPU metrics

How can I get these pods to communicate properly? Thanks in advance for your support.
