12

I have a rather strange situation where the load average on my CentOS 5.5 box is high, but CPU and memory usage are fairly low:

top - 20:41:38 up 42 days,  6:14,  2 users,  load average: 19.79, 21.25, 18.87
Tasks: 254 total,   1 running, 253 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.8%us,  0.3%sy,  0.1%ni, 95.0%id,  0.6%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   4035284k total,  4008084k used,    27200k free,    38748k buffers
Swap:  4208928k total,   242576k used,  3966352k free,  1465008k cached

free -mt
             total       used       free     shared    buffers     cached
Mem:          3940       3910         29          0         37       1427
-/+ buffers/cache:       2445       1495
Swap:         4110        236       3873
Total:        8050       4147       3903

Iostat also shows good results:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.83    0.13    0.41    0.58    0.00   95.05

Here is the ps aux output:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  10348    80 ?        Ss    2010   2:11 init [3]                                           
root         2  0.0  0.0      0     0 ?        S<    2010   0:00 [migration/0]
root         3  0.0  0.0      0     0 ?        SN    2010   0:00 [ksoftirqd/0]
root         4  0.0  0.0      0     0 ?        S<    2010   0:00 [watchdog/0]
root         5  0.0  0.0      0     0 ?        S<    2010   0:02 [migration/1]
root         6  0.0  0.0      0     0 ?        SN    2010   0:00 [ksoftirqd/1]
root         7  0.0  0.0      0     0 ?        S<    2010   0:00 [watchdog/1]
root         8  0.0  0.0      0     0 ?        S<    2010   0:02 [migration/2]
root         9  0.0  0.0      0     0 ?        SN    2010   0:00 [ksoftirqd/2]
root        10  0.0  0.0      0     0 ?        S<    2010   0:00 [watchdog/2]
root        11  0.0  0.0      0     0 ?        S<    2010   0:02 [migration/3]
root        12  0.0  0.0      0     0 ?        SN    2010   0:00 [ksoftirqd/3]
root        13  0.0  0.0      0     0 ?        S<    2010   0:00 [watchdog/3]
root        14  0.0  0.0      0     0 ?        S<    2010   0:03 [migration/4]
root        15  0.0  0.0      0     0 ?        SN    2010   0:00 [ksoftirqd/4]
root        16  0.0  0.0      0     0 ?        S<    2010   0:00 [watchdog/4]
root        17  0.0  0.0      0     0 ?        S<    2010   0:01 [migration/5]
root        18  0.0  0.0      0     0 ?        SN    2010   0:00 [ksoftirqd/5]
root        19  0.0  0.0      0     0 ?        S<    2010   0:00 [watchdog/5]
root        20  0.0  0.0      0     0 ?        S<    2010   0:11 [migration/6]
root        21  0.0  0.0      0     0 ?        SN    2010   0:00 [ksoftirqd/6]
root        22  0.0  0.0      0     0 ?        S<    2010   0:00 [watchdog/6]
root        23  0.0  0.0      0     0 ?        S<    2010   0:01 [migration/7]
root        24  0.0  0.0      0     0 ?        SN    2010   0:00 [ksoftirqd/7]
root        25  0.0  0.0      0     0 ?        S<    2010   0:00 [watchdog/7]
root        26  0.0  0.0      0     0 ?        S<    2010   0:00 [migration/8]
root        27  0.0  0.0      0     0 ?        SN    2010   0:00 [ksoftirqd/8]
root        28  0.0  0.0      0     0 ?        S<    2010   0:00 [watchdog/8]
root        29  0.0  0.0      0     0 ?        S<    2010   0:00 [migration/9]
root        30  0.0  0.0      0     0 ?        SN    2010   0:00 [ksoftirqd/9]
root        31  0.0  0.0      0     0 ?        S<    2010   0:00 [watchdog/9]
root        32  0.0  0.0      0     0 ?        S<    2010   0:08 [migration/10]
root        33  0.0  0.0      0     0 ?        SN    2010   0:00 [ksoftirqd/10]
root        34  0.0  0.0      0     0 ?        S<    2010   0:00 [watchdog/10]
root        35  0.0  0.0      0     0 ?        S<    2010   0:05 [migration/11]
root        36  0.0  0.0      0     0 ?        SN    2010   0:00 [ksoftirqd/11]
root        37  0.0  0.0      0     0 ?        S<    2010   0:00 [watchdog/11]
root        38  0.0  0.0      0     0 ?        S<    2010   0:02 [migration/12]
root        39  0.0  0.0      0     0 ?        SN    2010   0:00 [ksoftirqd/12]
root        40  0.0  0.0      0     0 ?        S<    2010   0:00 [watchdog/12]
root        41  0.0  0.0      0     0 ?        S<    2010   0:14 [migration/13]
root        42  0.0  0.0      0     0 ?        SN    2010   0:00 [ksoftirqd/13]
root        43  0.0  0.0      0     0 ?        S<    2010   0:00 [watchdog/13]
root        44  0.0  0.0      0     0 ?        S<    2010   0:04 [migration/14]
root        45  0.0  0.0      0     0 ?        SN    2010   0:00 [ksoftirqd/14]
root        46  0.0  0.0      0     0 ?        S<    2010   0:00 [watchdog/14]
root        47  0.0  0.0      0     0 ?        S<    2010   0:01 [migration/15]
root        48  0.0  0.0      0     0 ?        SN    2010   0:00 [ksoftirqd/15]
root        49  0.0  0.0      0     0 ?        S<    2010   0:00 [watchdog/15]
root        50  0.0  0.0      0     0 ?        S<    2010   0:00 [events/0]
root        51  0.0  0.0      0     0 ?        S<    2010   0:00 [events/1]
root        52  0.0  0.0      0     0 ?        S<    2010   0:00 [events/2]
root        53  0.0  0.0      0     0 ?        S<    2010   0:00 [events/3]
root        54  0.0  0.0      0     0 ?        S<    2010   0:00 [events/4]
root        55  0.0  0.0      0     0 ?        S<    2010   0:00 [events/5]
root        56  0.0  0.0      0     0 ?        S<    2010   0:00 [events/6]
root        57  0.0  0.0      0     0 ?        S<    2010   0:00 [events/7]
root        58  0.0  0.0      0     0 ?        S<    2010   0:00 [events/8]
root        59  0.0  0.0      0     0 ?        S<    2010   0:00 [events/9]
root        60  0.0  0.0      0     0 ?        S<    2010   0:00 [events/10]
root        61  0.0  0.0      0     0 ?        S<    2010   0:00 [events/11]
root        62  0.0  0.0      0     0 ?        S<    2010   0:00 [events/12]
root        63  0.0  0.0      0     0 ?        S<    2010   0:00 [events/13]
root        64  0.0  0.0      0     0 ?        S<    2010   0:00 [events/14]
root        65  0.0  0.0      0     0 ?        S<    2010   0:00 [events/15]
root        66  0.0  0.0      0     0 ?        S<    2010   0:00 [khelper]
root       107  0.0  0.0      0     0 ?        S<    2010   0:00 [kthread]
root       126  0.0  0.0      0     0 ?        S<    2010   0:00 [kblockd/0]
root       127  0.0  0.0      0     0 ?        S<    2010   0:03 [kblockd/1]
root       128  0.0  0.0      0     0 ?        S<    2010   0:01 [kblockd/2]
root       129  0.0  0.0      0     0 ?        S<    2010   0:00 [kblockd/3]
root       130  0.0  0.0      0     0 ?        S<    2010   0:05 [kblockd/4]
root       131  0.0  0.0      0     0 ?        S<    2010   0:00 [kblockd/5]
root       132  0.0  0.0      0     0 ?        S<    2010   0:00 [kblockd/6]
root       133  0.0  0.0      0     0 ?        S<    2010   0:00 [kblockd/7]
root       134  0.0  0.0      0     0 ?        S<    2010   0:00 [kblockd/8]
root       135  0.0  0.0      0     0 ?        S<    2010   0:02 [kblockd/9]
root       136  0.0  0.0      0     0 ?        S<    2010   0:00 [kblockd/10]
root       137  0.0  0.0      0     0 ?        S<    2010   0:00 [kblockd/11]
root       138  0.0  0.0      0     0 ?        S<    2010   0:04 [kblockd/12]
root       139  0.0  0.0      0     0 ?        S<    2010   0:00 [kblockd/13]
root       140  0.0  0.0      0     0 ?        S<    2010   0:00 [kblockd/14]
root       141  0.0  0.0      0     0 ?        S<    2010   0:00 [kblockd/15]
root       142  0.0  0.0      0     0 ?        S<    2010   0:00 [kacpid]
root       281  0.0  0.0      0     0 ?        S<    2010   0:00 [cqueue/0]
root       282  0.0  0.0      0     0 ?        S<    2010   0:00 [cqueue/1]
root       283  0.0  0.0      0     0 ?        S<    2010   0:00 [cqueue/2]
root       284  0.0  0.0      0     0 ?        S<    2010   0:00 [cqueue/3]
root       285  0.0  0.0      0     0 ?        S<    2010   0:00 [cqueue/4]
root       286  0.0  0.0      0     0 ?        S<    2010   0:00 [cqueue/5]
root       287  0.0  0.0      0     0 ?        S<    2010   0:00 [cqueue/6]
root       288  0.0  0.0      0     0 ?        S<    2010   0:00 [cqueue/7]
root       289  0.0  0.0      0     0 ?        S<    2010   0:00 [cqueue/8]
root       290  0.0  0.0      0     0 ?        S<    2010   0:00 [cqueue/9]
root       291  0.0  0.0      0     0 ?        S<    2010   0:00 [cqueue/10]
root       292  0.0  0.0      0     0 ?        S<    2010   0:00 [cqueue/11]
root       293  0.0  0.0      0     0 ?        S<    2010   0:00 [cqueue/12]
root       294  0.0  0.0      0     0 ?        S<    2010   0:00 [cqueue/13]
root       295  0.0  0.0      0     0 ?        S<    2010   0:00 [cqueue/14]
root       296  0.0  0.0      0     0 ?        S<    2010   0:00 [cqueue/15]
root       299  0.0  0.0      0     0 ?        S<    2010   0:00 [khubd]
root       301  0.0  0.0      0     0 ?        S<    2010   0:00 [kseriod]
root       490  0.0  0.0      0     0 ?        S     2010   0:00 [khungtaskd]
root       493  0.1  0.0      0     0 ?        S<    2010  94:48 [kswapd1]
root       494  0.0  0.0      0     0 ?        S<    2010   0:00 [aio/0]
root       495  0.0  0.0      0     0 ?        S<    2010   0:00 [aio/1]
root       496  0.0  0.0      0     0 ?        S<    2010   0:00 [aio/2]
root       497  0.0  0.0      0     0 ?        S<    2010   0:00 [aio/3]
root       498  0.0  0.0      0     0 ?        S<    2010   0:00 [aio/4]
root       499  0.0  0.0      0     0 ?        S<    2010   0:00 [aio/5]
root       500  0.0  0.0      0     0 ?        S<    2010   0:00 [aio/6]
root       501  0.0  0.0      0     0 ?        S<    2010   0:00 [aio/7]
root       502  0.0  0.0      0     0 ?        S<    2010   0:00 [aio/8]
root       503  0.0  0.0      0     0 ?        S<    2010   0:00 [aio/9]
root       504  0.0  0.0      0     0 ?        S<    2010   0:00 [aio/10]
root       505  0.0  0.0      0     0 ?        S<    2010   0:00 [aio/11]
root       506  0.0  0.0      0     0 ?        S<    2010   0:00 [aio/12]
root       507  0.0  0.0      0     0 ?        S<    2010   0:00 [aio/13]
root       508  0.0  0.0      0     0 ?        S<    2010   0:00 [aio/14]
root       509  0.0  0.0      0     0 ?        S<    2010   0:00 [aio/15]
root       665  0.0  0.0      0     0 ?        S<    2010   0:00 [kpsmoused]
root       808  0.0  0.0      0     0 ?        S<    2010   0:00 [ata/0]
root       809  0.0  0.0      0     0 ?        S<    2010   0:00 [ata/1]
root       810  0.0  0.0      0     0 ?        S<    2010   0:00 [ata/2]
root       811  0.0  0.0      0     0 ?        S<    2010   0:00 [ata/3]
root       812  0.0  0.0      0     0 ?        S<    2010   0:00 [ata/4]
root       813  0.0  0.0      0     0 ?        S<    2010   0:00 [ata/5]
root       814  0.0  0.0      0     0 ?        S<    2010   0:00 [ata/6]
root       815  0.0  0.0      0     0 ?        S<    2010   0:00 [ata/7]
root       816  0.0  0.0      0     0 ?        S<    2010   0:00 [ata/8]
root       817  0.0  0.0      0     0 ?        S<    2010   0:00 [ata/9]
root       818  0.0  0.0      0     0 ?        S<    2010   0:00 [ata/10]
root       819  0.0  0.0      0     0 ?        S<    2010   0:00 [ata/11]
root       820  0.0  0.0      0     0 ?        S<    2010   0:00 [ata/12]
root       821  0.0  0.0      0     0 ?        S<    2010   0:00 [ata/13]
root       822  0.0  0.0      0     0 ?        S<    2010   0:00 [ata/14]
root       823  0.0  0.0      0     0 ?        S<    2010   0:00 [ata/15]
root       824  0.0  0.0      0     0 ?        S<    2010   0:00 [ata_aux]
root       842  0.0  0.0      0     0 ?        S<    2010   0:00 [scsi_eh_0]
root       843  0.0  0.0      0     0 ?        S<    2010   0:00 [scsi_eh_1]
root       844  0.0  0.0      0     0 ?        S<    2010   0:00 [scsi_eh_2]
root       845  0.0  0.0      0     0 ?        S<    2010   0:00 [scsi_eh_3]
root       846  0.0  0.0      0     0 ?        S<    2010   0:00 [scsi_eh_4]
root       847  0.0  0.0      0     0 ?        S<    2010   0:00 [scsi_eh_5]
root       882  0.0  0.0      0     0 ?        S<    2010   0:00 [kstriped]
root       951  0.0  0.0      0     0 ?        S<    2010   4:24 [kjournald]
root       976  0.0  0.0      0     0 ?        S<    2010   0:00 [kauditd]
postfix    990  0.0  0.0  54208  2284 ?        S    21:19   0:00 pickup -l -t fifo -u
root      1013  0.0  0.0  12676     8 ?        S<s   2010   0:00 /sbin/udevd -d
root      1326  0.0  0.0  90900  3400 ?        Ss   14:53   0:00 sshd: root@notty 
root      1410  0.0  0.0  53972  2108 ?        Ss   14:53   0:00 /usr/libexec/openssh/sftp-server
root      2690  0.0  0.0      0     0 ?        S<    2010   0:00 [kmpathd/0]
root      2691  0.0  0.0      0     0 ?        S<    2010   0:00 [kmpathd/1]
root      2692  0.0  0.0      0     0 ?        S<    2010   0:00 [kmpathd/2]
root      2693  0.0  0.0      0     0 ?        S<    2010   0:00 [kmpathd/3]
root      2694  0.0  0.0      0     0 ?        S<    2010   0:00 [kmpathd/4]
root      2695  0.0  0.0      0     0 ?        S<    2010   0:00 [kmpathd/5]
root      2696  0.0  0.0      0     0 ?        S<    2010   0:00 [kmpathd/6]
root      2697  0.0  0.0      0     0 ?        S<    2010   0:00 [kmpathd/7]
root      2698  0.0  0.0      0     0 ?        S<    2010   0:00 [kmpathd/8]
root      2699  0.0  0.0      0     0 ?        S<    2010   0:00 [kmpathd/9]
root      2700  0.0  0.0      0     0 ?        S<    2010   0:00 [kmpathd/10]
root      2701  0.0  0.0      0     0 ?        S<    2010   0:00 [kmpathd/11]
root      2702  0.0  0.0      0     0 ?        S<    2010   0:00 [kmpathd/12]
root      2703  0.0  0.0      0     0 ?        S<    2010   0:00 [kmpathd/13]
root      2704  0.0  0.0      0     0 ?        S<    2010   0:00 [kmpathd/14]
root      2705  0.0  0.0      0     0 ?        S<    2010   0:00 [kmpathd/15]
root      2706  0.0  0.0      0     0 ?        S<    2010   0:00 [kmpath_handlerd]
root      2755  0.0  0.0      0     0 ?        S<    2010   4:35 [kjournald]
root      2757  0.0  0.0      0     0 ?        S<    2010   3:38 [kjournald]
root      2759  0.0  0.0      0     0 ?        S<    2010   4:10 [kjournald]
root      2761  0.0  0.0      0     0 ?        S<    2010   4:26 [kjournald]
root      2763  0.0  0.0      0     0 ?        S<    2010   3:15 [kjournald]
root      2765  0.0  0.0      0     0 ?        S<    2010   3:04 [kjournald]
root      2767  0.0  0.0      0     0 ?        S<    2010   3:02 [kjournald]
root      2769  0.0  0.0      0     0 ?        S<    2010   2:58 [kjournald]
root      2771  0.0  0.0      0     0 ?        S<    2010   0:00 [kjournald]
root      3340  0.0  0.0   5908   356 ?        Ss    2010   2:48 syslogd -m 0
root      3343  0.0  0.0   3804   212 ?        Ss    2010   0:03 klogd -x
root      3430  0.0  0.0      0     0 ?        S<    2010   0:50 [kondemand/0]
root      3431  0.0  0.0      0     0 ?        S<    2010   0:54 [kondemand/1]
root      3432  0.0  0.0      0     0 ?        S<    2010   0:00 [kondemand/2]
root      3433  0.0  0.0      0     0 ?        S<    2010   0:00 [kondemand/3]
root      3434  0.0  0.0      0     0 ?        S<    2010   0:00 [kondemand/4]
root      3435  0.0  0.0      0     0 ?        S<    2010   0:00 [kondemand/5]
root      3436  0.0  0.0      0     0 ?        S<    2010   0:00 [kondemand/6]
root      3437  0.0  0.0      0     0 ?        S<    2010   0:00 [kondemand/7]
root      3438  0.0  0.0      0     0 ?        S<    2010   0:00 [kondemand/8]
root      3439  0.0  0.0      0     0 ?        S<    2010   0:00 [kondemand/9]
root      3440  0.0  0.0      0     0 ?        S<    2010   0:00 [kondemand/10]
root      3441  0.0  0.0      0     0 ?        S<    2010   0:00 [kondemand/11]
root      3442  0.0  0.0      0     0 ?        S<    2010   0:00 [kondemand/12]
root      3443  0.0  0.0      0     0 ?        S<    2010   0:00 [kondemand/13]
root      3444  0.0  0.0      0     0 ?        S<    2010   0:00 [kondemand/14]
root      3445  0.0  0.0      0     0 ?        S<    2010   0:00 [kondemand/15]
root      3461  0.0  0.0  10760   284 ?        Ss    2010   3:44 irqbalance
rpc       3481  0.0  0.0   8052     4 ?        Ss    2010   0:00 portmap
root      3526  0.0  0.0      0     0 ?        S<    2010   0:00 [rpciod/0]
root      3527  0.0  0.0      0     0 ?        S<    2010   0:00 [rpciod/1]
root      3528  0.0  0.0      0     0 ?        S<    2010   0:00 [rpciod/2]
root      3529  0.0  0.0      0     0 ?        S<    2010   0:00 [rpciod/3]
root      3530  0.0  0.0      0     0 ?        S<    2010   0:00 [rpciod/4]
root      3531  0.0  0.0      0     0 ?        S<    2010   0:00 [rpciod/5]
root      3532  0.0  0.0      0     0 ?        S<    2010   0:00 [rpciod/6]
root      3533  0.0  0.0      0     0 ?        S<    2010   0:00 [rpciod/7]
root      3534  0.0  0.0      0     0 ?        S<    2010   0:00 [rpciod/8]
root      3535  0.0  0.0      0     0 ?        S<    2010   0:00 [rpciod/9]
root      3536  0.0  0.0      0     0 ?        S<    2010   0:00 [rpciod/10]
root      3537  0.0  0.0      0     0 ?        S<    2010   0:00 [rpciod/11]
root      3538  0.0  0.0      0     0 ?        S<    2010   0:00 [rpciod/12]
root      3539  0.0  0.0      0     0 ?        S<    2010   0:00 [rpciod/13]
root      3540  0.0  0.0      0     0 ?        S<    2010   0:00 [rpciod/14]
root      3541  0.0  0.0      0     0 ?        S<    2010   0:00 [rpciod/15]
root      3563  0.0  0.0  10160     8 ?        Ss    2010   0:00 rpc.statd
root      3595  0.0  0.0  55180     4 ?        Ss    2010   0:00 rpc.idmapd
dbus      3618  0.0  0.0  21256    28 ?        Ss    2010   0:00 dbus-daemon --system
root      3649  0.2  0.4 563084 18796 ?        S<sl  2010 179:03 mfsmount /mnt/mfs -o rw,mfsmaster=web1.ovs.local
root      3702  0.0  0.0   3800     8 ?        Ss    2010   0:00 /usr/sbin/acpid
68        3715  0.0  0.0  31312   816 ?        Ss    2010   3:14 hald
root      3716  0.0  0.0  21692    28 ?        S     2010   0:00 hald-runner
68        3726  0.0  0.0  12324     8 ?        S     2010   0:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
68        3730  0.0  0.0  12324     8 ?        S     2010   0:00 hald-addon-keyboard: listening on /dev/input/event0
root      3773  0.0  0.0  62608   332 ?        Ss    2010   0:00 /usr/sbin/sshd
ganglia   3786  0.0  0.0  24704   988 ?        Ss    2010  14:26 /usr/sbin/gmond
root      3843  0.0  0.0  54144   300 ?        Ss    2010   1:49 /usr/libexec/postfix/master
postfix   3855  0.0  0.0  54860  1060 ?        S     2010   0:22 qmgr -l -t fifo -u
root      3877  0.0  0.0  74828   708 ?        Ss    2010   1:15 crond
root      3891  1.4  1.9 326960 77704 ?        S<l   2010 896:59 mfschunkserver
root      4122  0.0  0.0  18732   176 ?        Ss    2010   0:10 /usr/sbin/atd
root      4193  0.0  0.8 129180 35984 ?        Ssl   2010  11:04 /usr/bin/ruby /usr/sbin/puppetd
root      4223  0.0  0.0  18416   172 ?        S     2010   0:10 /usr/sbin/smartd -q never
root      4227  0.0  0.0   3792     8 tty1     Ss+   2010   0:00 /sbin/mingetty tty1
root      4230  0.0  0.0   3792     8 tty2     Ss+   2010   0:00 /sbin/mingetty tty2
root      4231  0.0  0.0   3792     8 tty3     Ss+   2010   0:00 /sbin/mingetty tty3
root      4233  0.0  0.0   3792     8 tty4     Ss+   2010   0:00 /sbin/mingetty tty4
root      4234  0.0  0.0   3792     8 tty5     Ss+   2010   0:00 /sbin/mingetty tty5
root      4236  0.0  0.0   3792     8 tty6     Ss+   2010   0:00 /sbin/mingetty tty6
root      5596  0.0  0.0  19368    20 ?        Ss    2010   0:00 DarwinStreamingServer
qtss      5597  0.8  0.9 166572 37408 ?        Sl    2010 523:02 DarwinStreamingServer
root      8714  0.0  0.0      0     0 ?        S    Jan31   0:33 [pdflush]
root      9914  0.0  0.0  65612   968 pts/1    R+   21:49   0:00 ps aux
root     10765  0.0  0.0  76792  1080 ?        Ss   Jan24   0:58 SCREEN
root     10766  0.0  0.0  66212   872 pts/3    Ss   Jan24   0:00 /bin/bash
root     11833  0.0  0.0  63852  1060 pts/3    S+   17:17   0:00 /bin/sh ./launch.sh
root     11834  437 42.9 4126884 1733348 pts/3 Sl+  17:17 1190:50 /usr/bin/java -Xms128m -Xmx512m -XX:+UseConcMarkSweepGC -jar /JavaCore/JavaCore.jar
root     13127  4.7  1.1 110564 46876 ?        Ssl  17:18  12:55 /JavaCore/fetcher.bin
root     19392  0.0  0.0  90108  3336 ?        Rs   20:35   0:00 sshd: root@pts/1 
root     19401  0.0  0.0  66216  1640 pts/1    Ss   20:35   0:00 -bash
root     20567  0.0  0.0  90108   412 ?        Ss   Jan16   1:58 sshd: root@pts/0 
root     20569  0.0  0.0  66084   912 pts/0    Ss   Jan16   0:00 -bash
root     21053  0.0  0.0  63856    28 ?        S    Jan30   0:00 /bin/sh /usr/bin/WowzaMediaServerd /usr/local/WowzaMediaServer/bin/setenv.sh /var/run/WowzaM
root     21054  2.9 10.3 2252652 418468 ?      Sl   Jan30 314:25 java -Xmx1200M -server -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote=true -
root     21915  0.0  0.0      0     0 ?        S    Feb01   0:00 [pdflush]
root     29996  0.0  0.0  76524  1004 pts/0    S+   14:41   0:00 screen -x

sar -W output:

12:00:01 AM  pswpin/s pswpout/s
12:10:01 AM      0.00      0.00
12:20:01 AM      0.00      0.00
12:30:02 AM      0.00      0.00
12:40:01 AM      0.00      0.00
12:50:01 AM      0.00      0.00
01:00:01 AM      0.00      0.00
01:10:01 AM      0.00      0.00
01:20:01 AM      0.00      0.00
01:30:01 AM      0.00      0.00
01:40:01 AM      0.00      0.00
01:50:01 AM      0.00      0.00
02:00:02 AM      0.00      0.00
02:10:01 AM      0.07      0.00
02:20:01 AM      0.00      0.00
02:30:02 AM      0.00      0.00
02:40:01 AM      0.00      0.00
02:50:01 AM      0.00      0.00
03:00:01 AM      0.00      0.00
03:10:01 AM      0.00      0.00
03:20:01 AM      0.00      0.00
03:30:01 AM      0.00      0.00
03:40:02 AM      0.00      0.00
03:50:01 AM      0.00      0.00
04:00:01 AM      0.00      0.00
04:10:01 AM      0.00      0.00
04:20:02 AM      0.01      0.00
04:30:01 AM      0.00      0.00
04:40:02 AM      0.11      0.00
04:50:01 AM      0.01      0.00
05:00:02 AM      0.03      0.00
05:10:01 AM      0.00      0.00
05:20:02 AM      0.01      0.00
05:30:01 AM      0.04      0.00
05:40:02 AM      0.08      0.00
05:50:01 AM      0.00      0.00
06:00:02 AM      0.11      0.00
06:10:01 AM      0.01      0.00
06:20:01 AM      0.00      0.00
06:30:02 AM      0.00      0.00
06:40:01 AM      0.05      0.00
06:50:02 AM      0.00      0.00
07:00:02 AM      0.01      0.00
07:10:01 AM      0.02      0.00
07:20:02 AM      0.00      0.00
07:30:01 AM      0.00      0.00
07:40:02 AM      0.17      0.00
07:50:01 AM      0.11      0.00
08:00:02 AM      0.00      0.00
08:10:01 AM      0.04      0.00
08:20:02 AM      0.00      0.00
08:30:01 AM      0.00      0.00
08:40:02 AM      0.03      0.00
08:50:01 AM      0.00      0.00
09:00:02 AM      0.08      0.00
09:10:01 AM      0.00      0.00
Average:         0.02      0.00

sar -d 5 0 output:

09:18:40 AM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
09:18:45 AM    dev8-0     21.96     11.18   2128.54     97.45      0.62     20.33      1.75      3.85
09:18:45 AM    dev8-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:18:45 AM    dev8-2     18.16      0.00    507.78     27.96      0.39     21.42      1.09      1.98
09:18:45 AM    dev8-3      3.79     11.18   1620.76    430.32      0.23     15.11      4.95      1.88
09:18:45 AM    dev8-4      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:18:45 AM    dev8-5      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:18:45 AM   dev8-16     19.36      0.00    619.56     32.00      0.35     18.22      0.94      1.82
09:18:45 AM   dev8-17     17.96      0.00    594.01     33.07      0.35     19.61      0.99      1.78
09:18:45 AM   dev8-18      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:18:45 AM   dev8-19      1.40      0.00     25.55     18.29      0.00      0.29      0.29      0.04
09:18:45 AM   dev8-20      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:18:45 AM   dev8-21      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:18:45 AM   dev8-32     25.55    742.51    846.31     62.19      0.69     27.04      1.63      4.17
09:18:45 AM   dev8-33     22.55      0.00    844.71     37.45      0.68     30.18      1.41      3.17
09:18:45 AM   dev8-34      2.99    742.51      1.60    248.53      0.01      3.40      3.40      1.02
09:18:45 AM   dev8-35      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:18:45 AM   dev8-48     18.16      0.00    645.11     35.52      0.41     22.65      1.09      1.98
09:18:45 AM   dev8-49     18.16      0.00    645.11     35.52      0.41     22.65      1.09      1.98
09:18:45 AM   dev8-50      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
09:18:45 AM   dev8-51      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

sar -w output:

12:00:01 AM   cswch/s
05:10:01 AM  51556.52
05:20:02 AM  54484.96
05:30:01 AM  57533.58
05:40:02 AM  57956.67
05:50:01 AM  57885.02
06:00:02 AM  46017.63
06:10:01 AM  21778.29
06:20:01 AM  43464.68
06:30:02 AM  51766.88
06:40:01 AM  53879.12
06:50:02 AM  56531.08
07:00:02 AM  57732.89
07:10:01 AM  57658.24
07:20:02 AM  58209.16
07:30:01 AM  58446.37
07:40:02 AM  58537.84
07:50:01 AM  58243.39
08:00:02 AM  58362.02
08:10:01 AM  58291.73
08:20:02 AM  58370.15
08:30:01 AM  58545.73
08:40:02 AM  58448.40
08:50:01 AM  58198.55
09:00:02 AM  58313.22
09:10:01 AM  58122.85
09:20:02 AM  58517.96
09:30:01 AM  58338.94
09:40:02 AM  58317.30
09:50:01 AM  58312.40
10:00:01 AM  58337.53
10:10:02 AM  58167.55
10:20:01 AM  58408.61
10:30:01 AM  58133.29
10:40:01 AM  58165.08
10:50:02 AM  58240.77
11:00:01 AM  58236.16

Average:      cswch/s
Average:     55991.47

sar -I SUM output:

12:00:01 AM      INTR    intr/s
05:10:01 AM       sum   3825.03
05:20:02 AM       sum   3999.32
05:30:01 AM       sum   4038.10
05:40:02 AM       sum   4041.99
05:50:01 AM       sum   4015.39
06:00:02 AM       sum   3450.48
06:10:01 AM       sum   2385.73
06:20:01 AM       sum   3355.96
06:30:02 AM       sum   3641.50
06:40:01 AM       sum   3807.91
06:50:02 AM       sum   3853.99
07:00:02 AM       sum   3951.29
07:10:01 AM       sum   3996.06
07:20:02 AM       sum   4005.63
07:30:01 AM       sum   3939.43
07:40:02 AM       sum   3901.39
07:50:01 AM       sum   3920.22
08:00:02 AM       sum   3950.27
08:10:01 AM       sum   3926.09
08:20:02 AM       sum   4072.29
08:30:01 AM       sum   4058.93
08:40:02 AM       sum   3994.94
08:50:01 AM       sum   3969.04
09:00:02 AM       sum   3976.33
09:10:01 AM       sum   3904.43
09:20:02 AM       sum   4054.35
09:30:01 AM       sum   4006.06
09:40:02 AM       sum   3962.65
09:50:01 AM       sum   4016.83
10:00:01 AM       sum   4064.52
10:10:02 AM       sum   3934.29
10:20:01 AM       sum   4029.60
10:30:01 AM       sum   3939.23
10:40:01 AM       sum   3937.37
10:50:02 AM       sum   3961.87
11:00:01 AM       sum   4014.83

Average:         INTR    intr/s
Average:          sum   3794.24

Any idea what this could be, or where I should look for more diagnostic information?

Thanks.

SyRenity

5 Answers

7

Intuitively, I'd suspect a disk issue as the most direct cause, but that doesn't mean your disks are too slow. Your iowait % from iostat doesn't indicate that any user processes are spending a lot of time waiting for disk I/O. However, your CPU time on kswapd gives me cause for concern:

root       493  0.1  0.0      0     0 ?        S<    2010  94:48 [kswapd1]

The 242MB of swap you're using may not seem like a lot, but to hit that kind of CPU time on a system that's only been up for 42 days you've either got a lot of swap activity happening or it's taking forever to finish once it starts because of other disk contention. Whether this is the source of your problem or not, it's something I would definitely look into.

Can you run sar -W and post the swap statistics for your system?

jgoldschrafe
2

The most common cause of high load is slow drives. Try running the following:

sar -d 5 0

and looking at the %util field. If that number is over 70% for any of your drives, that drive will be slow to handle I/O requests, which causes the high load.

Edit: It might run OK at 70%, but that's the point where you'll probably start to see performance degradation. The higher you go, the worse it'll get.
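
As a rough illustration of the check described above, here is a minimal Python sketch that scans captured sar -d output and lists the devices whose %util exceeds a threshold. The function name busy_devices and the 70% default are illustrative assumptions, not part of sysstat; the sample rows are copied from the question.

```python
def busy_devices(sar_text, threshold=70.0):
    """Return {device: highest %util seen} for devices above the threshold."""
    dev_col = util_col = None
    busy = {}
    for line in sar_text.splitlines():
        fields = line.split()
        # A header row tells us where the DEV and %util columns are.
        if "DEV" in fields and "%util" in fields:
            dev_col = fields.index("DEV")
            util_col = fields.index("%util")
            continue
        # Skip anything before the first header and rows that are too short.
        if dev_col is None or len(fields) <= util_col:
            continue
        try:
            util = float(fields[util_col])
        except ValueError:
            continue
        if util > threshold:
            device = fields[dev_col]
            busy[device] = max(util, busy.get(device, 0.0))
    return busy


# Two rows taken from the sar -d output posted in the question.
sample = """\
09:18:40 AM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
09:18:45 AM    dev8-0     21.96     11.18   2128.54     97.45      0.62     20.33      1.75      3.85
09:18:45 AM   dev8-32     25.55    742.51    846.31     62.19      0.69     27.04      1.63      4.17
"""

print(busy_devices(sample))
```

An empty result means no sampled device crossed the threshold, which is what the figures in the question suggest.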

phemmer
  • 1
    Look at the queue size and service times. If these increase then you have a problem. – BillThor Feb 07 '11 at 00:52
  • Yes, these can help indicate problems, but I didn't mention them because I've seen them spike without any adverse effects. – phemmer Feb 07 '11 at 02:10
  • 2
    Using disk performance as a first stop on a root-cause analysis can be dangerous, though. One of the reasons that disk I/O is one of the last places I'll look for a problem, even though it symptomatically tends to be the cause of a lot of issues, is that there are a lot of funny things that can go wrong on a system that will cause it to start hammering the disk with things other than the intended workload, and trying to fix the issue by adding more or faster spindles may not always be the best approach. – jgoldschrafe Feb 07 '11 at 04:40
  • @jgoldschrafe Well, of course, if the disk is the problem you'll have to figure out why. But you can't start looking at why processes are chewing through disk I/O until you confirm that the high load is being caused by disk I/O. – phemmer Feb 07 '11 at 06:12
  • I posted the sar -d results; they look far below the 70% threshold. – SyRenity Feb 07 '11 at 09:13
1

Lots of system processes in "S<" state. On my machine they're listed as being just in "S". From man ps: < high-priority (not nice to other users). Something looks really screwed up. Try updating your kernel if it can be done and reboot.
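
To see how the states are actually distributed, the STAT column can be tallied from saved ps aux output. The sketch below is only an illustration: tally_states is a hypothetical helper, and the sample rows are taken from the question. Keep in mind that the Linux load average counts tasks that are runnable (R) or in uninterruptible sleep (D), so those are the states worth looking for.

```python
from collections import Counter


def tally_states(ps_text):
    """Count processes by primary state letter and by the '<' priority flag."""
    lines = [line for line in ps_text.splitlines() if line.strip()]
    states = Counter()
    high_priority = 0
    if not lines:
        return states, high_priority
    header = lines[0].split()
    stat_col = header.index("STAT")
    for line in lines[1:]:
        # Limit the split so spaces inside COMMAND do not create extra fields.
        fields = line.split(None, len(header) - 1)
        if len(fields) <= stat_col:
            continue
        stat = fields[stat_col]
        states[stat[0]] += 1          # first letter is the process state
        if "<" in stat:
            high_priority += 1        # '<' marks a high-priority task
    return states, high_priority


# A few rows copied from the ps aux listing in the question.
sample = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  10348    80 ?        Ss    2010   2:11 init [3]
root         2  0.0  0.0      0     0 ?        S<    2010   0:00 [migration/0]
root         3  0.0  0.0      0     0 ?        SN    2010   0:00 [ksoftirqd/0]
root      9914  0.0  0.0  65612   968 pts/1    R+   21:49   0:00 ps aux
"""

states, high_priority = tally_states(sample)
print(dict(states), high_priority)
```

Note that ps aux lists processes rather than individual threads, so a heavily threaded program shows up as a single row; a per-thread listing (for example ps -eLo pid,lwp,stat,comm) would be needed to count its threads separately.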

poige
1

What kind of network connection does your server have? I have seen loads skyrocket in situations where the connection to the switch was supposed to be 100 Mbit/s full duplex, but for some reason was negotiated as 100 Mbit/s half duplex. After I forced 100M-FD mode with ethtool, loads dropped below 1 and network transfer speeds returned to normal.
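
For anyone who wants to check this across several machines, here is a minimal Python sketch that pulls the Speed, Duplex and Auto-negotiation lines out of captured ethtool output and flags a half-duplex link. The helper names and the eth0 interface in the sample are assumptions made for illustration.

```python
def link_settings(ethtool_text):
    """Extract the link settings of interest from ethtool output text."""
    settings = {}
    for line in ethtool_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in ("Speed", "Duplex", "Auto-negotiation"):
            settings[key] = value.strip()
    return settings


def looks_misnegotiated(settings):
    """Treat a half-duplex link as suspicious, as in the case described above."""
    return settings.get("Duplex", "").lower() == "half"


# Hypothetical capture; the interface name eth0 is an assumption.
sample = """\
Settings for eth0:
    Speed: 1000Mb/s
    Duplex: Full
    Auto-negotiation: on
"""

settings = link_settings(sample)
print(settings, looks_misnegotiated(settings))
```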

Janne Pikkarainen
  • Ethtool shows a connection of Speed: 1000Mb/s, Duplex: Full. – SyRenity Feb 07 '11 at 09:43
  • OK, probably not a network issue, then. Next thing for you to check: number of context switches and interrupts. Do that with sar -w (for context switches) and sar -I SUM (for interrupts). – Janne Pikkarainen Feb 07 '11 at 09:44
  • I added both outputs to the question above. – SyRenity Feb 07 '11 at 10:58
  • The context switch rate is quite high; I usually see anywhere from a few hundred to a couple of thousand switches/s, even on loaded servers. Maybe those Java processes are doing something odd? – Janne Pikkarainen Feb 07 '11 at 11:00
  • The Java apps are heavily multi-threaded; perhaps this is the cause of the context switching? – SyRenity Feb 08 '11 at 15:17
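
The rate reported by sar -w can be cross-checked directly: the kernel exposes a cumulative context switch counter on the ctxt line of /proc/stat, and sampling it twice gives switches per second. This is a minimal sketch, assuming a Linux /proc filesystem; the function names are made up for illustration.

```python
import time


def read_ctxt(stat_text):
    """Extract the cumulative context switch count from /proc/stat text."""
    for line in stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no ctxt line found")


def ctxt_rate(before, after, seconds):
    """Context switches per second between two counter readings."""
    return (after - before) / seconds


def sample_live(interval=5.0):
    """Read /proc/stat twice, interval seconds apart, and return the rate."""
    with open("/proc/stat") as f:
        before = read_ctxt(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = read_ctxt(f.read())
    return ctxt_rate(before, after, interval)
```

Calling sample_live() on the affected box should give a figure close to the sar -w numbers. Dividing by the number of CPUs (16 here, judging by the migration/0 to migration/15 threads in the ps listing) turns the 55991.47 average into roughly 3,500 switches per second per CPU, which is easier to compare between machines.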
-1

You wrote that you see an OOM kill in your logs. After the OOM killer triggers, you should reboot ASAP.

So reboot and your problems will be gone.

Richard

rems
  • 3
    The OOM killer has already removed the offending process that was using most of the memory, so why is a reboot required? – SyRenity Feb 09 '11 at 09:32