
So I have this problem. This horrible, awful problem.

I have a set of Linux NFS servers (in an NFS/CIFS cluster using CTDB) that refuse locks, but only when the lock is blocking. Non-blocking lock calls work just fine.
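To reproduce both kinds of request from the client, a small script like the one below may help. It is a sketch under two assumptions: the share is mounted NFSv3 with locking enabled (the NLM V4 traffic suggests this), and the file path you pass is hypothetical. CPython's `fcntl.lockf()` issues `F_SETLKW` for a plain `LOCK_EX` and `F_SETLK` when `LOCK_NB` is added, and the Linux NFS client turns those into NLM LOCK calls with `block == 1` and `block == 0` respectively.

```python
import fcntl
import os
import sys
import tempfile


def try_exclusive_lock(path, blocking):
    """Take, then release, an exclusive POSIX lock on the whole of `path`.

    blocking=True  -> F_SETLKW (over NFSv3: NLM LOCK with block == 1)
    blocking=False -> F_SETLK  (over NFSv3: NLM LOCK with block == 0)
    Returns True if the lock was acquired, False if it was refused.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        mode = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        try:
            # Offset 0, length 0 means "to end of file", like pos:0-0 in the trace.
            fcntl.lockf(fd, mode)
        except OSError:
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Pass a file on the NFS mount; without an argument a local temp file is used.
    if len(sys.argv) > 1:
        target = sys.argv[1]
    else:
        target = os.path.join(tempfile.mkdtemp(), "locktest")
    print("non-blocking:", try_exclusive_lock(target, blocking=False))
    print("blocking:", try_exclusive_lock(target, blocking=True))
```

On a local filesystem both calls should succeed; running it against a file on the affected mount (e.g. a hypothetical `/mnt/lustre-nfs/locktest`) while capturing traffic should produce the two kinds of LOCK call shown below.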

See the traffic flow below:

Blocking lock call:

  9.414674   10.10.1.40 -> 10.10.1.14   NLM 282 V4 LOCK Call FH:0xf6b3519c svid:5 pos:0-0  nlm.lock.caller_name == "centos-ad2012r2"  nlm.exclusive == 1  nlm.block == 1
  9.415002   10.10.1.14 -> 10.10.1.40   NLM 106 V4 LOCK Reply (Call In 39) NLM_BLOCKED  nlm.stat == 3
 18.613965   10.10.1.40 -> 10.10.1.14   NLM 274 V4 CANCEL Call FH:0xf6b3519c svid:5 pos:0-0  nlm.lock.caller_name == "centos-ad2012r2"  nlm.exclusive == 1  nlm.block == 1
 18.614003   10.10.1.40 -> 10.10.1.14   NLM 266 V4 UNLOCK Call FH:0xf6b3519c svid:5 pos:0-0  nlm.lock.caller_name == "centos-ad2012r2"
 18.614675   10.10.1.14 -> 10.10.1.40   NLM 106 V4 CANCEL Reply (Call In 55) NLM_DENIED  nlm.stat == 1
 18.614889   10.10.1.14 -> 10.10.1.40   NLM 106 V4 UNLOCK Reply (Call In 56)  nlm.stat == 0

Non-blocking lock call:

 47.476050   10.10.1.40 -> 10.10.1.14   NLM 282 V4 LOCK Call FH:0xf6b3519c svid:6 pos:0-0  nlm.lock.caller_name == "centos-ad2012r2"  nlm.exclusive == 1  nlm.block == 0
 47.476647   10.10.1.14 -> 10.10.1.40   NLM 106 V4 LOCK Reply (Call In 102)  nlm.stat == 0
 51.908995   10.10.1.40 -> 10.10.1.14   NLM 266 V4 UNLOCK Call FH:0xf6b3519c svid:6 pos:0-0  nlm.lock.caller_name == "centos-ad2012r2"
 51.909700   10.10.1.14 -> 10.10.1.40   NLM 106 V4 UNLOCK Reply (Call In 112)  nlm.stat == 0
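The `nlm.stat` values in the replies are codes from the NLM version 4 `nlm4_stats` enumeration. A tiny decoder, written here only as an illustration (the names `Nlm4Stat` and `decode` are made up for this sketch), puts the two captures side by side:

```python
from enum import IntEnum


class Nlm4Stat(IntEnum):
    """nlm4_stats codes from the NLM version 4 protocol."""
    GRANTED = 0
    DENIED = 1
    DENIED_NOLOCKS = 2
    BLOCKED = 3
    DENIED_GRACE_PERIOD = 4
    DEADLCK = 5
    ROFS = 6
    STALE_FH = 7
    FBIG = 8
    FAILED = 9


def decode(replies):
    """Map (procedure, nlm.stat) reply pairs to readable status names."""
    return [(proc, Nlm4Stat(stat).name) for proc, stat in replies]


# Reply statuses copied from the two captures above.
blocking = [("LOCK", 3), ("CANCEL", 1), ("UNLOCK", 0)]
non_blocking = [("LOCK", 0), ("UNLOCK", 0)]

print(decode(blocking))      # [('LOCK', 'BLOCKED'), ('CANCEL', 'DENIED'), ('UNLOCK', 'GRANTED')]
print(decode(non_blocking))  # [('LOCK', 'GRANTED'), ('UNLOCK', 'GRANTED')]
```

Read this way, the non-blocking LOCK is granted immediately, while the blocking LOCK is parked as BLOCKED at 9.415 s and no GRANTED callback appears in the capture before the client sends CANCEL at 18.614 s. A status of 0 on an UNLOCK reply simply means success.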

The client runs CentOS 6.5.

The server runs Scientific Linux 6.2.

The underlying filesystem is Lustre. The problem may have the same cause as, or a cause similar to, this other problem:

The asynchronous locking interface does something slightly cheesy for blocking locks--instead of waiting for the filesystem to respond, it just sends back a deny immediately (even if the lock might actually be available), then responds later with a granted message when it discovers it's available.
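The behaviour described in that quote can be modelled with a toy server. This is a hypothetical sketch (single file, exclusive locks only, made-up class and method names), not how `lockd` is actually written: a non-blocking request waits for the filesystem's answer, while a blocking request is answered at once with BLOCKED and is only granted later through a GRANTED callback.

```python
from enum import IntEnum


class Stat(IntEnum):
    """Subset of the NLM v4 nlm4_stats codes that appear in the traces."""
    GRANTED = 0
    DENIED = 1
    BLOCKED = 3


class AsyncLockServer:
    """Hypothetical toy model of an NLM server in front of a filesystem with
    an asynchronous lock interface. Not lockd's real implementation."""

    def __init__(self):
        self.holder = None    # owner currently holding the exclusive lock
        self.pending = []     # blocking requests handed off to the filesystem
        self.callbacks = []   # GRANTED callbacks sent back to clients

    def _fs_try_lock(self, owner):
        # Stand-in for the filesystem's lock call once it actually answers.
        if self.holder is None or self.holder == owner:
            self.holder = owner
            return Stat.GRANTED
        return Stat.DENIED

    def lock(self, owner, block):
        if not block:
            # Non-blocking: wait for the filesystem's answer and return it.
            return self._fs_try_lock(owner)
        # Blocking: reply immediately, even if the lock is actually free,
        # and promise a GRANTED callback once the filesystem responds.
        self.pending.append(owner)
        return Stat.BLOCKED

    def filesystem_responds(self):
        """Deliver the filesystem's deferred answers; call back what is free."""
        still_waiting = []
        for owner in self.pending:
            if self._fs_try_lock(owner) == Stat.GRANTED:
                self.callbacks.append(("GRANTED", owner))
            else:
                still_waiting.append(owner)
        self.pending = still_waiting

    def unlock(self, owner):
        if self.holder == owner:
            self.holder = None
        return Stat.GRANTED


srv = AsyncLockServer()
print(srv.lock("svid:6", block=False).name)  # GRANTED
print(srv.unlock("svid:6").name)             # GRANTED
print(srv.lock("svid:5", block=True).name)   # BLOCKED, although the lock is free
srv.filesystem_responds()
print(srv.callbacks)                         # [('GRANTED', 'svid:5')]
```

In this model, if `filesystem_responds()` never runs or the callback never reaches the client, the client is left holding a BLOCKED reply until it gives up, which looks like the blocking capture above.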

MikeyB
The link to "this other problem" is broken. What does the text you quoted refer to? Have you been able to find out anything? I seem to have the same problem or a similar one. – Sybille Peters Feb 04 '20 at 10:35
I changed the link to a Wayback Machine copy. To answer your question, I suppose the logic on the server side would need to be changed. – MikeyB Feb 05 '20 at 23:10
