I'm having a problem with a few Linux boxes running Xen. They act as hypervisors and are connected to a SAN through a multipath setup that provides storage to the guest VMs.
Every now and then one of the two paths fails, but it can be quickly restored by running:
multipath
multipath -ll
I need to get to the bottom of this and find out why it is happening. I have noticed that it doesn't occur when the hypervisor is not too busy (network- and I/O-wise). I have also ruled out a hardware problem by moving all the services onto an identical new chassis. I have collected a few system logs which may point to a NIC module issue or a kernel problem, so the failing multipath might only be a symptom of that. Here is the bit of log that always shows up when multipath goes down:
kernel: BUG: soft lockup - CPU#0 stuck for 60s! [swapper:0]
kernel: BUG: soft lockup - CPU#2 stuck for 60s! [events/2:76]
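To see how often these lockups occur and on which CPUs, the messages could be tallied with something like the sketch below. It assumes the log lines have the same shape as the two quoted above. The function name `count_soft_lockups` and the log path in the usage line are only illustrative, not part of my actual setup.

```shell
#!/bin/sh
# Hypothetical helper: count "soft lockup" kernel messages per CPU.
# Assumes lines of the form:
#   kernel: BUG: soft lockup - CPU#0 stuck for 60s! [swapper:0]
count_soft_lockups() {
    # $1: path to a log file to scan
    grep 'BUG: soft lockup' "$1" \
        | sed -n 's/.*soft lockup - \(CPU#[0-9][0-9]*\) stuck.*/\1/p' \
        | sort | uniq -c \
        | awk '{print $2, $1}'   # print "CPU#N count"
}
```

It would be run as `count_soft_lockups /var/log/messages` (assuming kernel messages are logged there); comparing the timestamps of the matching lines with the times the path drops would show whether the two really coincide.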
I'll paste the full logs at the end of this post to keep it easy to read. Now a little bit more about my setup:
- Internet access is set up over eth0 and eth2 (bonded)
- SAN multipath access is set up over eth1 and eth3
Server:
- Supermicro SuperServer 6016T-NTRF
- Intel(R) Xeon(R) CPU E5645
- Intel Corporation 82576 Gigabit Network
CentOS release 5.7 (Final) 2.6.18-274.18.1.el5xen
filename: /lib/modules/2.6.18-274.18.1.el5xen/kernel/drivers/net/igb/igb.ko
version: 3.0.6-k2-1
- Log 02
If anyone needs more details, please get in touch. Any help will be much appreciated.