
I have a LAMP cluster that shares files via NFS, and occasionally one of the nodes gets bogged down for a while when mysterious flush processes start appearing.

Can anyone help me? The only way to resolve this is to reboot - killing the processes only spawns new ones.

top - 19:43:43 up 104 days,  4:52,  1 user,  load average: 27.15, 56.72, 33.31
Tasks: 301 total,   9 running, 292 sleeping,   0 stopped,   0 zombie
Cpu(s): 15.6%us, 77.0%sy,  0.0%ni,  4.2%id,  2.0%wa,  0.0%hi,  1.2%si,  0.0%st
Mem:   8049708k total,  7060492k used,   989216k free,   157156k buffers
Swap:  4194296k total,   483228k used,  3711068k free,   928768k cached

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                           
840 root      20   0     0    0    0 R 98.0  0.0   6:45.83 flush-0:24                                                                                                        
843 root      20   0     0    0    0 R 97.6  0.0   5:50.32 flush-0:25                                                                                                        
835 root      20   0     0    0    0 R 96.0  0.0   6:42.44 flush-0:22                                                                                                        
836 root      20   0     0    0    0 R 95.0  0.0   6:51.56 flush-0:27                                                                                                        
833 root      20   0     0    0    0 R 94.3  0.0   6:27.21 flush-0:23                                                                                                        
841 root      20   0     0    0    0 R 93.7  0.0   6:46.97 flush-0:26                                                                                                        
2305 apache    20   0  772m  31m  25m S 23.6  0.4   0:07.60 httpd                                                                                                             
2298 apache    20   0  772m  31m  25m S 13.6  0.4   0:08.98 httpd                                                                                                             
26771 apache    20   0  775m  47m  41m S 10.3  0.6   4:07.97 httpd                                                                                                             
2315 apache    20   0  770m  29m  25m S  9.0  0.4   0:07.44 httpd                                                                                                             
24370 memcache  20   0  457m 123m  608 S  8.6  1.6  66:20.28 memcached                                                                                                         
1191 apache    20   0  770m  30m  26m S  8.3  0.4   0:13.54 httpd                                                                                                             
2253 apache    20   0  771m  32m  27m S  8.3  0.4   0:11.75 httpd                                                                                                             
3476 varnish   20   0 52.9g 2.0g  20m S  8.0 25.6   0:15.30 varnishd                                                                                                          
17234 apache    20   0  775m  50m  45m S  7.0  0.6   9:22.09 httpd                                                                                                             
23161 apache    20   0  780m  54m  43m S  7.0  0.7   6:33.40 httpd
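
The 0:22 through 0:27 in those flush process names look like major:minor device numbers, which I assume correspond to the NFS mounts. For reference, they can be mapped to mount points like this (the third field of /proc/self/mountinfo is that major:minor pair, and /sys/class/bdi lists the writeback devices the kernel knows about):

# grep -E ' 0:2[2-7] ' /proc/self/mountinfo
# ls /sys/class/bdi/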

Thanks

Tom

4 Answers


Your system is being overloaded with disk write requests, and the "dirty ratio" settings in your configuration are not optimal for your environment.

There are two virtual memory parameters you can tune here:

dirty_background_ratio and dirty_ratio, both found under /proc/sys/vm/

These parameters represent a percentage of memory.

Setting a low value for dirty_ratio produces more disk I/O, but it reduces the amount of RAM tied up holding dirty data.

The dirty_background_ratio is the percentage of memory at which the kernel's background flush threads start writing dirty data out to disk, while dirty_ratio is the threshold at which the processes doing the writing are themselves blocked until data has been flushed.   This means you must find the best compromise between how large the dirty chunks written out by the flush processes are allowed to grow and the point at which the system stalls writers.
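
Before changing anything, it is worth checking what your current thresholds are; something along these lines (read-only, nothing is modified):

# sysctl vm.dirty_ratio vm.dirty_background_ratio

or read the files directly:

# cat /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_background_ratio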

A combination tuned for good performance could be:

dirty_ratio 90%
dirty_background_ratio 5%

a more moderate ratio:

dirty_ratio 40~50%
dirty_background_ratio 10~20%
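
As a rough illustration only (the kernel actually applies these percentages to reclaimable memory, so the real thresholds come out somewhat lower), on a box with about 8 GB of RAM like yours the moderate values translate to roughly:

# awk '/MemTotal/ { printf "dirty_ratio 40%%            -> ~%.1f GB of dirty data before writers stall\ndirty_background_ratio 10%% -> ~%.1f GB before background flush starts\n", $2*0.40/1048576, $2*0.10/1048576 }' /proc/meminfo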

There can be several causes for this imbalance. Among the most common is an insufficient amount of RAM for the services installed; other times it is simply a drop in performance of the memory installed in your server, with causes ranging from poor ventilation to an inadequate power supply.

Although most such problems show up as what look like software bugs, many of these errors are in fact due to hardware that is poorly matched to the services installed, especially in the case of rented machines.


For those less familiar with Linux machines, the parameters mentioned above can be changed as follows:

Permanently:
(run these two commands only once; otherwise edit the file with your favorite editor)

# echo "vm.dirty_ratio = 40" >> /etc/sysctl.conf
# echo "vm.dirty_background_ratio = 10" >> /etc/sysctl.conf

Temporarily (lost at the next reboot):

# echo "40" > /proc/sys/vm/dirty_ratio
# echo "10" > /proc/sys/vm/dirty_background_ratio

You can find more information about these settings at this link

RTOSkit

I found the following link with a similar discussion:

0005972: Top and uptime displays wrong load average value - CentOS Bug Tracker

The last post says:

The high load average issue is resolved in a newer version of the hpvsa driver (1.2.4-7) that is now released by HP. Contact HP Support to obtain a copy of the new driver.
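
If you are not sure whether this even applies to you, a quick check (assuming the driver, if present, would be loaded as a kernel module named hpvsa):

# lsmod | grep -i hpvsa
# modinfo hpvsa 2>/dev/null | grep -i '^version'

If neither command returns anything, you are not using that driver and this answer is not your problem.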

alexus
    Your suggestion is a long shot, OP never hinted that he or she's on HP hardware. I'd rather refer to this: http://serverfault.com/questions/341123/flush-processes-consume-too-much-of-cpu – fuero Jan 28 '13 at 18:37
  • it is, but I have a similar issue and I use HP as well. – alexus Jan 28 '13 at 18:38

Do you have EnableMMAP Off in your Apache configuration file?

If you memory-map a file located on an NFS-mounted filesystem and a process on another NFS client machine deletes or truncates the file, your process may get a bus error the next time it tries to access the mapped file content.

For installations where either of these factors applies, you should use EnableMMAP off to disable the memory-mapping of delivered files.

I'm not sure whether these are your symptoms, but it's worth a try.
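
To check whether the directive is already set anywhere (assuming a stock RHEL/CentOS layout under /etc/httpd; adjust the path for your distribution) and to add it if not, something like:

# grep -riE 'EnableMMAP|EnableSendfile' /etc/httpd/
# echo 'EnableMMAP Off' >> /etc/httpd/conf.d/nfs-tuning.conf   # the file name is just an example
# service httpd reload

EnableSendfile is often switched off on NFS-backed document roots for similar reasons, so it may be worth checking at the same time.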

klocek

If you have an ext4 filesystem, check this bug: Slow writes to ext4 partition - INFO: task flush-253:7:2137 blocked for more than 120 seconds. It has been fixed in recent kernels (RHSA-2011-1530), which you can, of course, also obtain from CentOS.
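
A quick way to check whether this could be your case: confirm which filesystems are actually in use, which kernel you are running (to compare against the fixed version from that advisory), and whether the "blocked for more than 120 seconds" messages from that bug show up in your logs:

# df -T
# uname -r
# dmesg | grep -i 'blocked for more than 120 seconds'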

ramruma