
Summary: My problem is that I cannot mount the QNAP NFS server as an NFS datastore from my ESX hosts, despite the hosts being able to ping it. I'm utilising a vDS with LACP uplinks for all my network traffic (including NFS) and a separate subnet for each vmkernel adapter.

Setup: I'm evaluating vSphere. I've got two vSphere ESX 5.5 hosts (node1 and node2), each with 4x NICs. I've teamed them all up using LACP/802.3ad with my switch, then created a distributed switch between the two hosts with each host's LAG as the uplink. All my networking goes through the distributed switch; ideally I want to take advantage of DRS and the redundancy. I have a domain controller VM ("Central") and a vCenter VM ("vCenter") running on node1 (using node1's local datastore), with both hosts attached to the vCenter instance. Both hosts are in a vCenter datacenter and a cluster, with HA and DRS currently disabled.

I have a QNAP TS-669 Pro (version 4.0.3; the TS-x69 series is on the VMware Storage HCL) which I want to use as the NFS server for my NFS datastore. It has 2x NICs teamed together using 802.3ad with my switch.

vmkernel.log: The error from the host's vmkernel.log is not very useful:

NFS: 157: Command: (mount) Server: (10.1.2.100) IP: (10.1.2.100) Path: (/VM) Label (datastoreNAS) Options: (None)
cpu9:67402)StorageApdHandler: 698: APD Handle 509bc29f-13556457 Created with lock[StorageApd0x411121]
cpu10:67402)StorageApdHandler: 745: Freeing APD Handle [509bc29f-13556457]
cpu10:67402)StorageApdHandler: 808: APD Handle freed!
cpu10:67402)NFS: 168: NFS mount 10.1.2.100:/VM failed: Unable to connect to NFS server.
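For reference, the mount can be retried from the ESXi shell while watching the log (a sketch; the volume name and share path are taken from the log above):

~ # tail -f /var/log/vmkernel.log &
~ # esxcli storage nfs add -H 10.1.2.100 -s /VM -v datastoreNAS
~ # esxcli storage nfs list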

Network Setup: Here is my distributed switch setup (JPG). Here are my networks:

  • 10.1.1.0/24 VM Management (VLAN 11)
  • 10.1.2.0/24 Storage Network (NFS, VLAN 12)
  • 10.1.3.0/24 VM vMotion (VLAN 13)
  • 10.1.4.0/24 VM Fault Tolerance (VLAN 14)
  • 10.2.0.0/24 VM Network (VLAN 20)

vSphere addresses

  • 10.1.1.1 node1 Management
  • 10.1.1.2 node2 Management
  • 10.1.2.1 node1 vmkernel (For NFS)
  • 10.1.2.2 node2 vmkernel (For NFS)
  • etc.

Other addresses

  • 10.1.2.100 QNAP TS-669 (NFS Server)
  • 10.2.0.1 Domain Controller (VM on node1)
  • 10.2.0.2 vCenter (VM on node1)

I'm using a Cisco SRW2024P Layer-2 switch (jumbo frames enabled) with the following setup:

  • LACP LAG1 for node1 (Ports 1 through 4) setup as VLAN trunk for VLANs 11-14,20
  • LACP LAG2 for my router (Ports 5 through 8) setup as VLAN trunk for VLANs 11-14,20
  • LACP LAG3 for node2 (Ports 9 through 12) setup as VLAN trunk for VLANs 11-14,20
  • LACP LAG4 for the QNAP (Ports 23 and 24) setup to accept untagged traffic into VLAN 12

Each subnet is routable to the others, although connections to the NFS server from vmk1 shouldn't need routing. All other traffic (vSphere Web Client, RDP, etc.) goes through this setup fine. I tested the QNAP NFS server beforehand using nested ESX host VMs on a VMware Workstation setup with a dedicated physical NIC, and it had no problems.

The ACL on the NFS Server share is permissive and allows all subnet ranges full access to the share.

I can ping the QNAP from node1's vmk1, the adapter that should be used for NFS:

~ # vmkping -I vmk1 10.1.2.100
PING 10.1.2.100 (10.1.2.100): 56 data bytes
64 bytes from 10.1.2.100: icmp_seq=0 ttl=64 time=0.371 ms
64 bytes from 10.1.2.100: icmp_seq=1 ttl=64 time=0.161 ms
64 bytes from 10.1.2.100: icmp_seq=2 ttl=64 time=0.241 ms
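A plain vmkping only proves basic reachability; to exercise the jumbo-frame path end to end, the don't-fragment flag and a header-adjusted payload can be used (a sketch; 8972 bytes of payload plus 28 bytes of IP/ICMP headers fills a 9000-byte MTU):

~ # vmkping -I vmk1 -d -s 8972 10.1.2.100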

Netcat does not throw an error:

~ # nc -z 10.1.2.100 2049
Connection to 10.1.2.100 2049 port [tcp/nfs] succeeded!
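Since an NFSv3 mount also needs the RPC portmapper, the same probe is worth running against port 111 (a sketch; mountd itself sits on a port the portmapper hands out, so only the fixed port is checked here):

~ # nc -z 10.1.2.100 111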

The routing table of node1:

~ # esxcfg-route -l
VMkernel Routes:
Network          Netmask          Gateway          Interface
10.1.1.0         255.255.255.0    Local Subnet     vmk0
10.1.2.0         255.255.255.0    Local Subnet     vmk1
10.1.3.0         255.255.255.0    Local Subnet     vmk2
10.1.4.0         255.255.255.0    Local Subnet     vmk3
default          0.0.0.0          10.1.1.254       vmk0

VMkernel NIC info:

~ # esxcfg-vmknic -l
Interface  Port Group/DVPort   IP Family IP Address                              Netmask         Broadcast       MAC Address       MTU     TSO MSS   Enabled Type       
vmk0       133                 IPv4      10.1.1.1                                255.255.255.0   10.1.1.255      00:50:56:66:8e:5f 1500    65535     true    STATIC     
vmk0       133                 IPv6      fe80::250:56ff:fe66:8e5f                64                              00:50:56:66:8e:5f 1500    65535     true    STATIC, PREFERRED
vmk1       164                 IPv4      10.1.2.1                                255.255.255.0   10.1.2.255      00:50:56:68:f5:1f 1500    65535     true    STATIC     
vmk1       164                 IPv6      fe80::250:56ff:fe68:f51f                64                              00:50:56:68:f5:1f 1500    65535     true    STATIC, PREFERRED
vmk2       196                 IPv4      10.1.3.1                                255.255.255.0   10.1.3.255      00:50:56:66:18:95 1500    65535     true    STATIC     
vmk2       196                 IPv6      fe80::250:56ff:fe66:1895                64                              00:50:56:66:18:95 1500    65535     true    STATIC, PREFERRED
vmk3       228                 IPv4      10.1.4.1                                255.255.255.0   10.1.4.255      00:50:56:72:e6:ca 1500    65535     true    STATIC     
vmk3       228                 IPv6      fe80::250:56ff:fe72:e6ca                64                              00:50:56:72:e6:ca 1500    65535     true    STATIC, PREFERRED

Things I've tried/checked:

  • I'm not using DNS names to connect to the NFS server.
  • Checked the MTU: set to 9000 for vmk1, the dvSwitch, the Cisco switch and the QNAP.
  • Moved the QNAP onto VLAN 11 (VM Management, vmk0) and gave it an appropriate address; same issue. Changed it back afterwards, of course.
  • Tried initiating the NAS datastore connection from the vSphere Client (connected to vCenter or directly to a host), the vSphere Web Client and the host's ESX shell. All resulted in the same problem.
  • Tried a path name of "VM", "/VM" and "/share/VM", despite not even having a connection to the server.
  • Plugged a Linux system (10.1.2.123) into a switch port configured for VLAN 12 and tried mounting the NFS share 10.1.2.100:/VM; it worked, and I had read-write access to it.
  • Tried disabling the firewall on the ESX host with esxcli network firewall set --enabled false (a check of the firewall state is sketched below this list).
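Verifying the firewall state and the NFS client ruleset from the ESXi shell looks like this (a sketch using standard esxcli firewall commands):

~ # esxcli network firewall get
~ # esxcli network firewall ruleset list | grep nfsClient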

I'm out of ideas on what to try next. The things I'm doing differently from my VMware Workstation setup are the use of LACP with a physical switch and a virtual distributed switch spanning the two hosts. I'm guessing the vDS is probably the source of my troubles, but I don't know how to fix this problem without eliminating it.

Gerald

7 Answers

2

Hmm... vDS, NFS and LACP work great for me. However, it seems like you're jumping in pretty deep with a high-end set of vSphere features. Most installations don't really require LACP, but I can understand the appeal of trying to use it...

None of the vDS and other features matter if the QNAP isn't allowing the mount...

  • You've verified connectivity with vmkping, but you should also test the jumbo-frame path end to end: vmkping -d -s 8972 10.1.2.100 (no need to specify the interface; -d sets don't-fragment, and 8972 bytes of payload plus 28 bytes of IP/ICMP headers fills a 9000-byte MTU). Ensure that works.
  • I would disable the QNAP ACLs entirely for the moment.
  • Your mount path name should probably be ip.address:/share/VM/
  • Try to mount again, but pay attention to the messages in /var/log/vobd.log on the ESXi host. If it says something like "The mount request was denied by the NFS server.", the issue is the QNAP.
  • I'm sorry, but we're missing your physical switch type/model and configuration... Can you describe that? You should have trunked VLANs+LACP configs on the relevant ports.

Your screenshot of the vDS configuration looks like it's only one host's worth of info. Verify that your config has LACP and the right load-balancing mode set; it should look like the following (a host-side check is sketched after the screenshots):

(Screenshots: vDS LACP and uplink load-balancing configuration.)
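To confirm it from the host instead, something like this should show whether the LAG actually negotiated (a sketch; the lacp namespace assumes a version 5.5 vDS with enhanced LACP support):

~ # esxcli network vswitch dvs vmware lacp config get
~ # esxcli network vswitch dvs vmware lacp status get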

ewwhite
  • It seems I was careless. I checked the MTU settings along the path again and discovered the spare switch I'm using for my lab is only an L2 switch, not an L3, meaning an MTU of 9000 would never have worked. But even though it's not recommended, shouldn't leaving it at an MTU of 1500 still work with an L2 switch? – Gerald Jan 25 '14 at 15:42
  • I've checked /var/log/vobd.log, only relevant entries were the firewall rule set changes for the nfsClient. QNAP NFS share ACL has been set to the most permissive, any address, no limits. – Gerald Jan 25 '14 at 15:52
  • @Gerald Jumbo frames aren't exclusive to one switch type or another. L2 versus L3 may not matter. You still did not specify the type of switch in use and its config. – ewwhite Jan 25 '14 at 16:02
  • Sorry, I was just checking the load balancing; changed it to match, then did a ping (MTU 1500) which worked. It's a Cisco SRW2024P with 4x LACP LAGs (one for the router with 4x ports, one for node1 with 4x ports, one for node2 with 4x ports and one for the QNAP with 2x ports). The router, node1 and node2 LAGs are configured as VLAN trunks with VLANs 11-14,20. The LAG for the QNAP is just set to allow untagged traffic from the QNAP into VLAN 12. If this problem were caused by an LACP issue, then why do protocols higher up the stack work across the LAGs? Shouldn't nothing be working? – Gerald Jan 25 '14 at 16:10
  • Apparently the switch does support jumbo frames, hidden in the depths of its UI. I enabled it, set everything (vmk1, dvSwitch, QNAP) to an MTU of 9000 and managed to run `vmkping -s 9000 10.1.2.100` successfully, but I still can't mount the NFS datastore, and there are still no relevant log entries. – Gerald Jan 25 '14 at 16:27
  • @Gerald Now check your export paths. I don't think your switching is an issue now. – ewwhite Jan 25 '14 at 16:36
  • The NFS paths, I've tried `VM`, `/VM`, `/share/VM`. Going off my VMware Workstation with nested ESX setup, the path I used for the same QNAP for their NFS datastore mounts was `VM`. Are these the paths you're talking about? – Gerald Jan 25 '14 at 16:50
  • I plugged in a linux system (10.1.2.123) into a switch port configured for VLAN 12 and tried mounting the NFS share `10.1.2.100:/VM`, it worked successfully and I had read-write access to it. – Gerald Jan 25 '14 at 17:54
1

Had the same problem yesterday with a TS-420U and ESXi 5.5 U1. My setup:

  • Two ESXi 5.5 hosts with vCenter Server
  • Direct-attached storage
  • QNAP TS-420U NAS on the same subnet as the ESXi hosts (so no routing problem)
  • All on subnet 10.207.253.128/26

After configuring the NAS, I set the ACL to the appropriate subnet (10.207.253.*) and connected without problems. But after rebooting the ESXi hosts, there was no connection anymore, with the same errors as yours. Rebooting the NAS and turning the NFS service off and on didn't help. The last thing I tried was setting the ACL on the NAS to * -> boom, it worked again. Both ESXi hosts can connect to the NFS share without problems.

Now I just have to find out why the ESXi hosts can't connect with the ACL set to the subnet...
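One possible explanation, assuming the QNAP uses a standard Linux /etc/exports underneath (not confirmed, and the share path here is hypothetical): in exports(5) syntax, a wildcard like 10.207.253.* is matched against client hostnames and needs a working reverse-DNS lookup, while a network entry matches the client IP directly, so the CIDR form is the more reliable one:

# Wildcard form: requires the client IP to reverse-resolve to a matching name
# /share/VM  10.207.253.*(rw,async,no_subtree_check)
# CIDR form: matches the client by IP address, no DNS involved
/share/VM  10.207.253.128/26(rw,async,no_subtree_check,insecure)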

0

I had a similar issue, where a ping was not possible from the ESXi SSH shell.

Solution:

  • Run esxcli network firewall ruleset list and check that the nfsClient ruleset is enabled and that the NFS server's IP address is allowed (or check it from Host -> Configuration -> Security Profile -> Firewall -> NFS client).
  • Set and reset the MTU: vCenter -> Host -> Configuration -> Networking -> distributed switch -> Manage virtual adapters -> select the vmk with the right IP address -> set the MTU from 9000 to 1500 and press OK. It should reconfigure; then set the MTU back to 9000. Now ping works from the ESXi host to the NAS and the NFS mount works.
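The same MTU bounce can be done from the ESXi shell if the UI path differs between versions (a sketch, assuming vmk1 is the storage vmkernel interface):

~ # esxcli network ip interface set -i vmk1 -m 1500
~ # esxcli network ip interface set -i vmk1 -m 9000
~ # esxcli network ip interface list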
ERnsTL
0

Unfortunately, ESXi doesn't include the diagnostic commands rpcinfo and showmount. NFS traditionally defaults to UDP (though ESXi's own NFS client mounts over TCP). In order to execute a mount, the system must be able to talk to the RPC portmapper on the NFS server (TCP/UDP port 111), which provides the ports for the mountd and nfs services. On any other system, I'd use rpcinfo -p <ip> to make sure the portmapper is working, and showmount -e <ip> to see what's being exported.
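Both diagnostics can instead be run from any Linux box on the storage VLAN, such as the 10.1.2.123 machine mentioned in the question (a sketch; rpcinfo and showmount ship with the rpcbind and nfs-common packages on Debian-family systems):

$ rpcinfo -p 10.1.2.100
$ showmount -e 10.1.2.100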

Also, unlike vMotion, FT logging, and iSCSI, NFS isn't locked to a specific vmk. It will use any available interface. As you have an interface in the same subnet as the NFS server, it should use that one.

If there are logs on the NAS, check there for any clues. Otherwise, dropping back to a single link and monitoring the traffic may be your only recourse. (does that switch do port mirroring?)

Ricky
  • The QNAP has pretty pathetic logging, but it says there are NFS connections from node1 and node2 on their respective vmk1's. I think successfully accessing NFS from a Linux system attached to the same VLAN was an indication that the NFS service on the QNAP is functioning correctly. I made a new vDS, removed all the LACP and LAG stuff (and the corresponding switch ports) from the uplinks and assigned node2's vmk1 to a NIC, with the same results. Unfortunately, the switch can't do port mirroring of LAGs, but I will try doing so on vmk1's NIC for node2 under its config. – Gerald Jan 27 '14 at 09:17
0

I gave up.

I removed LACP from the uplinks and switched to iSCSI with multipathing (a port group and an associated vmk for each uplink, used just for the SAN).
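For anyone going the same way, the port-binding part from the ESXi shell looks roughly like this (a sketch; the software iSCSI adapter name vmhba33 is an assumption, check esxcli iscsi adapter list for the real one):

~ # esxcli iscsi software set --enabled=true
~ # esxcli iscsi networkportal add -A vmhba33 -n vmk1
~ # esxcli iscsi networkportal add -A vmhba33 -n vmk2
~ # esxcli iscsi networkportal list -A vmhba33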

Gerald
-1

I guess this has to do with NFSv4. ESXi (up to 5.5) only seems to support NFSv3; if the server exports the share as NFSv4 only, the mount won't work.
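A quick way to test that theory from a Linux client is to force protocol version 3 on a manual mount (a sketch; the mount point is illustrative):

$ sudo mount -t nfs -o vers=3 10.1.2.100:/VM /mnt/test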

-1

I had a similar problem with my configuration. You might be surprised, but adding an entry for each ESX host (IP hostname hostname) to the QNAP's /etc/hosts file solved my issue.
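For illustration, entries along these lines in the QNAP's /etc/hosts (IP addresses of the storage vmkernels from the question; the hostnames are assumptions):

10.1.2.1    node1    node1.local
10.1.2.2    node2    node2.local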

Hope this helps.

Arka