I had the misfortune of upgrading an HP ProLiant DL380 G4 from SLES 10 SP2 (i586) to SLES 10 SP4 (x86_64). Although the installation completed smoothly, the server becomes unresponsive after a couple of days of uptime. It still responds to ping, but SSH and even console access fail. The only way to recover is to cold boot the server.
The syslogs do not show anything from the period when the server is unresponsive. Searching turned up similar reports for various flavors of Linux, which were usually resolved by upgrading the BIOS and/or firmware of the server.
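Since syslog is silent, one option is to record a small state snapshot at regular intervals so that the last entry before a hang survives the cold boot. The following is only a minimal sketch: the function name `snapshot` and the log path are hypothetical, and it assumes a Linux /proc filesystem with the usual `MemFree`, `SwapFree` and `Slab` fields in /proc/meminfo.

```shell
# Hypothetical helper: append a timestamped snapshot of load and memory
# figures to the log file given as $1, then flush it to disk.
snapshot() {
    {
        date
        # 1, 5 and 15 minute load averages plus run queue information
        cat /proc/loadavg
        # Free memory, free swap and total slab usage
        grep -E '^(MemFree|SwapFree|Slab):' /proc/meminfo
        echo
    } >> "$1"
    # Force the data out of the page cache so it survives a hard reset
    sync
}
```

It could be called from a loop or from cron, for example `snapshot /var/log/hang-snapshot.log` once a minute (the path is an assumption, not something from my setup).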
I also tried both acpi=ht and acpi=off as boot options, without any success.
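To rule out the option simply not reaching the kernel, the running command line can be checked. A minimal sketch follows; the helper name `acpi_option` is hypothetical, and it assumes the kernel command line is readable from /proc/cmdline.

```shell
# Hypothetical helper: print the first acpi= option found in a kernel
# command line. With no argument it inspects the running kernel.
acpi_option() {
    # $1: command line string; falls back to /proc/cmdline when empty
    cmdline="${1:-$(cat /proc/cmdline)}"
    for word in $cmdline; do
        case "$word" in
            acpi=*)
                echo "$word"
                return 0
                ;;
        esac
    done
    echo "no acpi= option found"
    return 1
}
```

Running `acpi_option` without arguments after a reboot should print the value that was set in the boot loader, if it was actually applied.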
I upgraded the server BIOS to the version available from the HP Passport site at this link, but this did not resolve it.
Then I tried upgrading the firmware of the storage controller from here.
I have rebooted the server and am waiting to see whether this resolves the issue. Any suggestions or recommendations about what the root cause is and how I can go about fixing it?
I could find one post that comes pretty close to what I am seeing: Ubuntu 12.04 - HP ProLiant DL380 G4 - Load Maxes Out / Unresponsive
Server info:
Linux hostname 2.6.16.60-0.85.1-smp #1 SMP Thu Mar 17 11:45:06 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
> lscpu
Architecture: x86_64
CPU(s): 4
Thread(s) per core: 2
Core(s) per socket: 1
CPU socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 15
Model: 4
Stepping: 1
CPU MHz: 3200.225
L1d cache: 16K
L2 cache: 1024K
> modinfo cciss
filename: /lib/modules/2.6.16.60-0.85.1-smp/updates/cciss.ko
license: GPL
description: Driver for HP Smart Array Controllers version 3.6.28-24 (d927/s1461)
author: Hewlett-Packard Company
srcversion: 737C49390DD1F6FB9BC03F7
> slabtop
Active / Total Objects (% used) : 331966 / 339552 (97.8%)
Active / Total Slabs (% used) : 20306 / 20315 (100.0%)
Active / Total Caches (% used) : 98 / 136 (72.1%)
Active / Total Size (% used) : 78133.61K / 79253.95K (98.6%)
Minimum / Average / Maximum Object : 0.02K / 0.23K / 128.00K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
191752 191637 99% 0.09K 4358 44 17432K buffer_head
44916 44891 99% 0.20K 2364 19 9456K dentry_cache
35620 35561 99% 0.78K 7124 5 28496K ext3_inode_cache
15064 15035 99% 0.52K 2152 7 8608K radix_tree_node
6510 5859 90% 0.18K 310 21 1240K vm_area_struct
5782 5689 98% 0.06K 98 59 392K size-64
3840 3747 97% 0.08K 80 48 320K sysfs_dir_cache
3288 3271 99% 0.61K 548 6 2192K proc_inode_cache
3015 2259 74% 0.25K 201 15 804K filp
2304 2043 88% 0.02K 16 144 64K anon_vma
2304 1911 82% 0.02K 16 144 64K dm_tio
2208 1899 86% 0.04K 24 92 96K dm_io
2106 2096 99% 0.58K 351 6 1404K inode_cache
1710 1633 95% 0.12K 57 30 228K size-128
1680 1515 90% 0.03K 15 112 60K size-32
1480 1169 78% 0.09K 37 40 148K journal_head
Any pointers would be appreciated.