
I'm having some issues getting my AKS pods/containers connected to our on-prem network.

I have a virtual network with two address spaces, 172.16.20.0/22 and 172.16.24.0/29. It contains two subnets, each using one of these ranges as its subnet range.

The AKS cluster is bound to the 172.16.20.0/22 subnet, and each of the nodes, as well as the pods, gets an IP address in that range. I also added a regular VM to this subnet for temporary debugging.

In the 172.16.24.0/29 subnet, we have a Virtual Network Gateway (it has no IP in this subnet) which connects that subnet to our on-prem network. The VN Gateway has a matching local network gateway with address space 172.17.151.0/24. In our local network we have an SMTP server on 172.17.151.254, listening on port 25.
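
For reference, the address spaces above can be read back with the Azure CLI (a sketch; the resource group and resource names are placeholders, not our real ones):

az network vnet show -g my-rg -n my-vnet --query "addressSpace.addressPrefixes"
    # expected: ["172.16.20.0/22", "172.16.24.0/29"]
az network local-gateway show -g my-rg -n onprem-lgw --query "localNetworkAddressSpace.addressPrefixes"
    # expected: ["172.17.151.0/24"]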

On the VM I spun up for debugging, I can connect to the SMTP server just fine, and I can also ping the VM from the SMTP server without problems. From the pods, however, I cannot connect to SMTP (tested with netcat -zv 172.17.151.254 25), nor can I ping a pod's IP address from the SMTP server.

Neither of the subnets has a network security group (NSG) attached, so it can't be a misconfigured NSG rule. What else could be causing the connection to fail? The pods get the same basic network configuration from the DHCP server in the subnet:

  • An IP address in the 172.16.20.0/22 range
  • 172.16.20.1 as their default gateway

Our IT staff, who maintain the on-prem device that connects to the Azure VNG, helped me debug. They say that when a pod initiates an SMTP connection to 172.17.151.254, they see the packet arriving and a response packet from the server going back into the VPN tunnel, so it seems the response packet is getting dropped somewhere in Azure.
Edit: During a further debug session with our IT staff, we noticed that the source IP of the packets coming from our misbehaving pod is 172.16.20.5 instead of 172.16.20.21. 172.16.20.5 is the IP of the VMSS node the pod is running on, so that could make sense, but it would mean that the internal routing on that node isn't configured correctly.

Or is this something specific to Kubernetes that is causing this to fail?
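
For reference, here's roughly how the node-side view can be inspected (a sketch, assuming SSH access to the VMSS node; the comments describe the expected result, not a verbatim capture):

# on the VMSS node hosting the pod:
ip route get 172.17.151.254      # shows which interface and source IP the node picks
iptables -t nat -S POSTROUTING   # lists the masquerade/SNAT rules applied to pod traffic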

What I've tried so far:

  • On VM: ping to 172.16.20.21 (pod): works fine
  • On VM: ping to 172.17.151.254: works fine
  • On VM: tracert 172.17.151.254 succeeds in 1 hop (shouldn't this show at least 2 hops, since it passes through the default gateway?)
  • On pod: ping to 172.16.20.4 (vm): works fine
  • On pod: ping to 172.17.151.254: fails
  • On pod: traceroute 172.17.151.254 fails with no hops showing (see the debug-pod sketch after this list)
  • On on-prem VPN device: ping to 172.16.20.4 (vm): works fine
  • On on-prem VPN device: ping to 172.16.20.21 (pod): fails
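
For reproducing the pod-side tests above, a throwaway debug pod works well, e.g. with the nicolaka/netshoot image, which ships with ping, traceroute and netcat. A minimal sketch (the pod name is arbitrary):

kubectl run netdebug --rm -it --image=nicolaka/netshoot -- /bin/bash
# then, inside the pod:
nc -zv 172.17.151.254 25     # the SMTP connectivity test
traceroute 172.17.151.254    # the route test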

Extra info:

ifconfig -a from pod:

eth0: flags=67<UP,BROADCAST,RUNNING>  mtu 1500
        inet 172.16.20.21  netmask 255.255.252.0  broadcast 0.0.0.0
        ether de:c7:74:e3:c5:24  txqueuelen 1000  (Ethernet)
        RX packets 386868  bytes 35746728 (34.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 511891  bytes 43865660 (41.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 5  bytes 504 (504.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5  bytes 504 (504.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

route output from pod:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         172.16.20.1     0.0.0.0         UG    0      0        0 eth0
172.16.20.0     0.0.0.0         255.255.252.0   U     0      0        0 eth0

ipconfig /all from debug VM:

Windows IP Configuration

   Host Name . . . . . . . . . . . . : debug-vm
   Primary Dns Suffix  . . . . . . . :
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : nedz0ha4spbubmi5cnxgsnswdh.ax.internal.cloudapp.net

Ethernet adapter Ethernet:

   Connection-specific DNS Suffix  . : nedz0ha4spbubmi5cnxgsnswdh.ax.internal.cloudapp.net
   Description . . . . . . . . . . . : Microsoft Hyper-V Network Adapter
   Physical Address. . . . . . . . . : 00-0D-3A-2D-DC-BA
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::e9bb:fede:66cc:398c%6(Preferred)
   IPv4 Address. . . . . . . . . . . : 172.16.20.4(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.252.0
   Lease Obtained. . . . . . . . . . : Friday, August 28, 2020 7:15:08 AM
   Lease Expires . . . . . . . . . . : Friday, October 8, 2156 1:20:49 PM
   Default Gateway . . . . . . . . . : 172.16.20.1
   DHCP Server . . . . . . . . . . . : 168.63.129.16
   DHCPv6 IAID . . . . . . . . . . . : 100666682
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-26-DA-67-54-00-0D-3A-2D-DC-BA
   DNS Servers . . . . . . . . . . . : 168.63.129.16
   NetBIOS over Tcpip. . . . . . . . : Enabled

route print from debug vm:

===========================================================================
Interface List
  6...00 0d 3a 2d dc ba ......Microsoft Hyper-V Network Adapter
  1...........................Software Loopback Interface 1
===========================================================================

IPv4 Route Table
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0      172.16.20.1      172.16.20.4     10
        127.0.0.0        255.0.0.0         On-link         127.0.0.1    331
        127.0.0.1  255.255.255.255         On-link         127.0.0.1    331
  127.255.255.255  255.255.255.255         On-link         127.0.0.1    331
    168.63.129.16  255.255.255.255      172.16.20.1      172.16.20.4     11
  169.254.169.254  255.255.255.255      172.16.20.1      172.16.20.4     11
      172.16.20.0    255.255.252.0         On-link       172.16.20.4    266
      172.16.20.4  255.255.255.255         On-link       172.16.20.4    266
    172.16.23.255  255.255.255.255         On-link       172.16.20.4    266
        224.0.0.0        240.0.0.0         On-link         127.0.0.1    331
        224.0.0.0        240.0.0.0         On-link       172.16.20.4    266
  255.255.255.255  255.255.255.255         On-link         127.0.0.1    331
  255.255.255.255  255.255.255.255         On-link       172.16.20.4    266
===========================================================================
Persistent Routes:
  None

IPv6 Route Table
===========================================================================
Active Routes:
 If Metric Network Destination      Gateway
  1    331 ::1/128                  On-link
  6    266 fe80::/64                On-link
  6    266 fe80::e9bb:fede:66cc:398c/128
                                    On-link
  1    331 ff00::/8                 On-link
  6    266 ff00::/8                 On-link
===========================================================================
Persistent Routes:
  None
  • What type of subscription are you using (Pay-as-you-go, Enterprise Agreement, etc.)? Azure does restrict outbound traffic on port 25, as per this article: https://docs.microsoft.com/en-us/azure/virtual-network/troubleshoot-outbound-smtp-connectivity. This is mainly about sending data outbound, and I would hope it does not apply to VPN traffic, but I wonder if it is affecting yours. – Sam Cogan Sep 04 '20 at 06:59
  • Thanks for the suggestion! We have a subscription through a Cloud Solution Provider (CSP). But as far as I can tell from that article, the restriction only applies to connections to port 25 on external domains (such as gmail.com). That is not the case here; we're connecting to port 25 on an internal IP. Also, as I pointed out, my debug VM, which sits on the same virtual network subnet as the Kubernetes pods, can connect to the SMTP server and send emails just fine... – Alex Sep 04 '20 at 10:41

1 Answer


The problem was found after extensive troubleshooting with the help of Microsoft support.

The root cause was the IP address of the SMTP server on the on-prem side, 172.17.151.254: it overlaps with the default Docker bridge network, 172.17.0.0/16, which is configured on the K8s nodes. Replies destined for the SMTP server were therefore routed into the node-local docker0 bridge instead of back through the VPN tunnel. Since the debug VM I started has no Docker bridge, the problem didn't manifest itself there.
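
The overlap is easy to see on an affected node (standard Linux tooling; the comments give the expected result rather than a verbatim capture):

# on an affected VMSS node:
ip addr show docker0          # the default Docker bridge sits at 172.17.0.1/16
ip route get 172.17.151.254   # resolves via docker0 instead of eth0, so replies
                              # to the on-prem server never leave the node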

Lesson learned: steer clear of the 172.17.0.0/16 range when using AKS.
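
If renumbering the on-prem side is not an option, the Docker bridge CIDR can be moved instead. At the time, az aks create accepted a --docker-bridge-address parameter for this; a sketch with placeholder names (the parameter has since been retired on containerd-based clusters, which no longer have a Docker bridge):

az aks create \
    --resource-group my-rg \
    --name my-aks \
    --network-plugin azure \
    --vnet-subnet-id <subnet-resource-id> \
    --docker-bridge-address 192.168.200.1/24   # any range with no on-prem overlap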
