
On a box recently upgraded from SLES 9.3 to 10.2, I'm seeing the following issue:

Prior to the upgrade, an NFS mount (defined via YaST, i.e., it appeared in /etc/fstab) worked correctly. Following the upgrade, however, it fails. A network trace shows the client making the initial connection to the NFS server over TCP (for the portmapper RPC), but then switching to UDP for the subsequent MOUNT call; since the NFS server doesn't allow UDP (with good reason, given the data-corruption risks described in nfs(5)), the connection does not go through.

Adding the TCP option (whether in fstab, or at the command line, etc.) has no effect.

In the course of troubleshooting this, I've found that /var/adm/messages is reporting the following as occurring during boot:

Failed services in runlevel 3: network

(I should note that despite this error message, apparently at least some network services are started, since the box is accessible via SSH.)

My questions, then:

  1. What should I be looking at to determine the cause of the service startup failure?
  2. Would this indeed be likely to cause the problem with NFS described above?
  3. If the answer to (2) is no, then any suggestions on what to look for?

Edit: adding some information relating to the answers below.

It turns out that the network service is failing on bootup because one of the two interfaces on this box uses DHCP, which isn't yet available at that point in the boot sequence. I've disabled that interface for now and stopped/restarted the network service and the NFS client services, but I still get the same results.
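
The restart sequence and log check looked roughly like this (rcnetwork/rcnfs are the standard SLES init-script shortcuts; eth1 stands in for the DHCP interface on this box):

```shell
# Run as root (shown commented out here since they change system state):
#   ifdown eth1          # take the DHCP interface out of the picture
#   rcnetwork restart    # restart the network service
#   rcnfs restart        # restart the NFS client services
# then check the log for the reason the service was marked failed:
grep -i 'network' /var/log/messages 2>/dev/null | tail -20
```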

There's no firewall on the client side. Also, iptables -L on the client side shows that everything is accepted; and there are no entries in /etc/hosts.allow or /etc/hosts.deny.
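
Concretely, the client-side checks were along these lines (stock file locations; the guarded echoes just make the expected empty results explicit):

```shell
# Look for any filtering rules on the client (expect no matches)
iptables -L -n 2>/dev/null | grep -E 'DROP|REJECT' || echo "no DROP/REJECT rules"
# Look for tcp-wrapper entries, ignoring comments and blank lines (expect none)
cat /etc/hosts.allow /etc/hosts.deny 2>/dev/null | grep -vE '^#|^[[:space:]]*$' \
  || echo "no tcp-wrapper entries"
```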

On the NFS server side, nothing has changed. The remote NFS server is indeed advertising that it allows both TCP and UDP for all of the NFS services (though there is an iptables rule blocking UDP).

The /etc/fstab entry is pretty basic, just what you'd get from setting it up in YaST:

x.x.x.x:/volume      /localdir   nfs     defaults 0 0

rpcinfo -p for the client box shows only portmapper v2 running, advertising both TCP and UDP. For the server, it shows all of the usual services:

   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp   4047  status
    100024    1   tcp   4047  status
    100011    1   udp   4049  rquotad
    100021    1   udp   4045  nlockmgr
    100021    3   udp   4045  nlockmgr
    100021    4   udp   4045  nlockmgr
    100021    1   tcp   4045  nlockmgr
    100021    3   tcp   4045  nlockmgr
    100021    4   tcp   4045  nlockmgr
    100005    1   udp   4046  mountd
    100005    1   tcp   4046  mountd
    100005    2   udp   4046  mountd
    100005    2   tcp   4046  mountd
    100005    3   udp   4046  mountd
    100005    3   tcp   4046  mountd
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
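
For what it's worth, individual services can also be probed directly: rpcinfo's -t and -u flags send a null call to a given program/version over TCP and UDP respectively (100005 is mountd):

```shell
# Probe mountd (RPC program 100005, version 3) over each transport separately
# (shown commented out, since they need a live server at x.x.x.x):
#   rpcinfo -t x.x.x.x 100005 3    # TCP: should answer "ready and waiting"
#   rpcinfo -u x.x.x.x 100005 3    # UDP: expected to time out, since UDP is blocked
# Or filter the portmap dump for mountd's TCP registrations:
rpcinfo -p x.x.x.x 2>/dev/null | awk '$1 == 100005 && $3 == "tcp"'
```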

The mount call, with the /etc/fstab entry above, is simply:

mount /localdir

although I've also tried it with various options such as tcp, v3, etc.

Both the /etc/fstab entry (hence the mount) and the rpcinfo -p call are using the IP address, so there are no DNS resolution issues involved.

Alex

6 Answers


Check to make sure /etc/hosts.deny does not contain an entry for mountd, and check hosts.allow for the same reason. For what it's worth, I usually clear out hosts.deny and use iptables to control access.

Use rpcinfo -p nfsserver to ensure that mountd is indeed advertising TCP. mountd has a -n option to disable TCP listening, which (IIRC, on SuSE) would likely be set in /etc/sysconfig/nfs or thereabouts.
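
A quick way to check for that flag (assuming the stock SLES locations) is to grep the sysconfig and init files, and to look at mountd's runtime arguments:

```shell
# Look for a -n (no-TCP) flag wherever mountd's startup is configured
grep -rn 'mountd' /etc/sysconfig /etc/init.d 2>/dev/null | grep -- '-n' || true
# Confirm at runtime what arguments mountd was actually started with
# ([m]ountd keeps the grep from matching its own process entry)
ps axww 2>/dev/null | grep '[m]ountd' || echo "mountd not running"
```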

Alex M

As I understand your question, you can do the following:

  • SSH to your NFS client system
  • "connect" with rpcinfo from the client to the server
  • you have disabled the DHCP interface, so all traffic goes over one interface and there are no other routes

But you can't mount a filesystem from the NFS server on the NFS client, and you don't get any error message.

What is the difference between your rpcinfo and mount calls? Do you use the IP address in one and the FQDN in the other? Could you please post both commands, with their output and return codes?

Christian

A couple of things. First, you state at the start that the NFS server doesn't allow UDP, but then in your edit you mention that the remote NFS server is advertising both TCP and UDP for all of the NFS services. That seems a little odd: why does the server advertise something it doesn't allow?

Secondly, are you attempting to use NFS version 2 or version 3? Version 2 traditionally supports UDP only, whereas version 3 supports TCP. Perhaps manually specifying version 3 in the mount options (vers=3) will help? If it's defaulting to version 2, then even specifying TCP won't do you any good.
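
For instance (placeholder server and paths; nfsvers=3 and tcp are standard nfs mount options):

```
server:/export   /mnt/nfs   nfs   tcp,nfsvers=3   0 0
```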

I've also had issues with newer clients attempting to use version 4, when the server didn't quite support it. Your SLES upgrade may have resulted in a different default version. All the more reason to specify it explicitly.

Why don't you post the entry in /etc/fstab as well?

Christopher Karel
  • 6,442
  • 1
  • 26
  • 34
  • The mountd only allows one to disable the TCP service, not UDP (probably because NFS was originally designed around UDP). – Alex M Feb 04 '10 at 09:01
  • When you try UDP for the mount, it doesn't get through (verified this using tcpdump). As Pi pointed out, there's no way to disable UDP within nfsd itself; it is being blocked using an iptables rule. The client is using NFSv3 - verified that as well with tcpdump. Have also tried specifying it explicitly. – Alex Feb 04 '10 at 18:22
  1. Try starting the service manually with service network restart and see what messages you get. There should be some information there.
  2. Possibly...
  3. Check whether any firewall is enabled by default on your system; this may be causing problems, especially if the failed network start doesn't correctly load the firewall rules.
Kamil Kisiel

Try setting things explicitly and see where that gets you. For instance, in /etc/fstab:

x.x.x.x:/vol /local nfs proto=tcp,port=2049,mountport=4046,nfsvers=3 0 0

This should at least bypass the portmapper, connect explicitly to the TCP ports you list above, and make it easier to trace each channel with tcpdump during debugging.

  • Same results, unfortunately. Looks like it still first connects to portmapper anyway (via TCP), and then continues in UDP. – Alex Feb 08 '10 at 04:42

For reference, in case anyone else comes across this question and wants an answer:

I finally opened a ticket with Novell on this. It turns out that this is a known bug in SLES 10.2 (491140: mount ignores "proto=" for "nfs"), and there is a patch for it (util-linux-2.12r-35.35.2.x86_64.rpm). With that installed, the mount works as expected, and all requests are made over TCP. (Novell support also informed me that this is merged in SLES 10.3.)

Alex