Why would my RPC server use 5x CPU when moving to a newer Linux kernel?

1

We're kind of running into a wall here, so I figured it was worth asking on this site.

We have a Java process which serves up old Sun RPC connections as a compatibility layer into a new system. On older SuSE machines (2007 Xeon with 2.6.16), everything worked just fine. Now that we're trying to migrate to a more modern Xeon E5-2670 platform running RHEL (kernel 2.6.32), the process runs, but it pegs all of the CPUs under load and responds SLOWER than the previous kit. Client-side load is the same, and we're using faster server disk, the same number of physical cores (though now x2 logical because of hyperthreading), and the same RAM (which is not being starved or swapped).

Profiling isn't really revealing anything. There was some suspicion that the disk was running slower because of the logs being written (ext3 on the old machines, ext4+acl now), but that doesn't seem to be much of an issue and the "log load" is the same. iostat and netstat both look normal.

I suspect that something has changed in the RPC handling between the kernels, but it's difficult to find much information since (I'm guessing) Sun RPC type communication isn't as popular today.

Any thoughts? I don't expect someone to necessarily solve the issue since I can't share too much about it, but perhaps pointers as to what to look at to diagnose RPC and kernel overhead?

thanks!

bjb

Posted 2015-01-14T15:47:56.623

Reputation: 158

Have you tried disabling hyperthreading? What kind of CPU usage is it showing (user/system/iowait/...)? – golimar – 2015-01-14T16:13:48.057
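
One way to get the user/system/iowait breakdown asked about here is to sample the aggregate `cpu` line of /proc/stat twice while the server is under load and compare the deltas. The following is only a rough sketch: `cpu_split` is a hypothetical helper name, and it assumes the usual field order on that line (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
# cpu_split BEFORE AFTER
# BEFORE and AFTER are two samples of the first line of /proc/stat.
# Prints the share of elapsed CPU time spent in user (incl. nice),
# system and iowait, as percentages of all accounted time.
cpu_split() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) a[i] = $i; next }
        {
            total = 0
            for (i = 2; i <= 9; i++) { d[i] = $i - a[i]; total += d[i] }
            if (total <= 0) exit 1   # samples identical or out of order
            printf "user=%.1f system=%.1f iowait=%.1f\n",
                100 * (d[2] + d[3]) / total, 100 * d[4] / total, 100 * d[6] / total
        }'
}

# Example use on a live box (5 second window):
#   before=$(head -n 1 /proc/stat); sleep 5; after=$(head -n 1 /proc/stat)
#   cpu_split "$before" "$after"
```

A large system share compared with user would point at kernel-side overhead rather than the Java code itself.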

Answers

1

It appears that the problem stems from the transparent hugepage (THP) feature of the kernel. I'm not sure of the complete technical details, but suffice it to say that the following three commands, which disable it, fixed things:

echo 0 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled

CPU load has dropped back to where we had it before moving to the new kernel.
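
To check that the settings took effect, the same files can be read back: the `enabled` and `defrag` files list every choice and mark the active one in square brackets. Below is a minimal sketch for pulling that value out. `thp_active` is a hypothetical helper name, and the paths are simply the ones from the commands above; they may differ on your kernel (some RHEL 6 kernels appear to use a `redhat_transparent_hugepage` directory instead, which is an assumption to verify locally).

```shell
# thp_active LINE
# Prints the bracketed (currently active) choice from a sysfs line
# such as "always madvise [never]". Prints nothing if no brackets.
thp_active() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Report the active setting for each file that exists on this machine.
for f in /sys/kernel/mm/transparent_hugepage/enabled \
         /sys/kernel/mm/transparent_hugepage/defrag; do
    if [ -r "$f" ]; then
        echo "$f: $(thp_active "$(cat "$f")")"
    fi
done
```

Values written to /sys do not survive a reboot, so the commands need to be reapplied at boot (for example from an init script) if the fix is to stick.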

Hope this helps someone else since I couldn't find a dang thing on the internet in regards to RPC that wasn't NFS related! :-)

bjb

Posted 2015-01-14T15:47:56.623

Reputation: 158

-1

Check your code for while-loops used as sleeps (busy-waiting) and replace them with an actual sleep. Also check your kernel config.

Alexey Vesnin

Posted 2015-01-14T15:47:56.623

Reputation: 565