5

There is a problem that is driving me crazy. After the day I tried to fix the CPU usage problem of my VPS, the CPU load has grown from 60% to 150%, and I have no idea what causes the problem. Please help me.

I had installed a copy of mediawiki on a Linode 1024. The wiki is running on Niginx + PHP-fpm + MySql. The wiki doesn't have much traffic, only around 4000 requests/day, mostly from Google and Bing bots. It had been using around 60% (of total 400% on the Linode) of the CPU before. I thought it was a bit high, so two day ago, I was trying to fix the problem (not knowing what was waiting for me). I did nothing but added a new empty line to wiki's configure file, which would change the modified time of the configure file, and then all the cached page files would be set invalidated. I had done that before, and that would cause high CPU usage, but normally it would take only hours to let things back to normal again. Not this time, my CPU usage has been around 150% for more than two days.

It is php-fpm using most of CPU reassures. Using 100% of three cores is not rare. I hadn't seen that before. There are other sites on the Linode, but it should be the wiki. Because if I offline the wiki, CPU usage will drop back to around 40% soon.

The day I also duplicated php-fpm.conf, and opened it, but didn't changed it.

I have no idea what I did wrong. I here ask for help to save myself from being crazy!!! It is php-fpm. Is there a way to find out what is it doing? I mean like which scripts are related and what function codes are running?

top:

top - 06:34:33 up 10 days,  4:23,  2 users,  load average: 1.10, 1.24, 1.37
Tasks:  76 total,   4 running,  72 sleeping,   0 stopped,   0 zombie
Cpu(s): 61.1%us,  3.1%sy,  0.0%ni, 32.8%id,  2.9%wa,  0.0%hi,  0.0%si,  0.1%st
Mem:   1028684k total,   945192k used,    83492k free,    89580k buffers
Swap:   524284k total,    18084k used,   506200k free,   530380k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                 
26721 www-data  20   0  208m  54m  34m R   99  5.4   0:09.07 /opt/php5/sbin/php-fpm --fpm-config /opt/php5/etc/php-fpm.conf                                          
26592 www-data  20   0  207m  45m  26m R   91  4.5   0:12.77 /opt/php5/sbin/php-fpm --fpm-config /opt/php5/etc/php-fpm.conf                                          
26706 www-data  20   0  196m  43m  34m S   47  4.3   0:15.19 /opt/php5/sbin/php-fpm --fpm-config /opt/php5/etc/php-fpm.conf                                          
26583 www-data  20   0  197m  45m  35m S   33  4.5   0:19.08 /opt/php5/sbin/php-fpm --fpm-config /opt/php5/etc/php-fpm.conf                                          
26787 www-data  20   0  206m  36m  18m R   25  3.7   0:00.41 /opt/php5/sbin/php-fpm --fpm-config /opt/php5/etc/php-fpm.conf                                          
26661 www-data  20   0  207m  46m  26m S   13  4.6   0:19.87 /opt/php5/sbin/php-fpm --fpm-config /opt/php5/etc/php-fpm.conf                                          
 1971 mysql     20   0  155m  57m 3952 S    8  5.7 383:57.81 /usr/sbin/mysqld                                                                                        
  242 root      20   0     0    0    0 S    1  0.0   0:51.36 [kworker/3:1]                                                                                           
 5711 root      20   0  139m  95m  580 S    1  9.5   0:41.30 /usr/local/bin/memcached -d -u root -m 128 -p 11211                                                     
19463 root      20   0  190m 3984 1284 S    1  0.4   0:02.66 /opt/php5/sbin/php-fpm --fpm-config /opt/php5/etc/php-fpm.conf                                          
29100 www-data  20   0 10928 5540  820 S    1  0.5   4:49.05 nginx: worker process

vmstat 30

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0  16912  81456  90784 554172    0    0     4     6    3    2 11  1 87  1
 0  0  16912  78036  91000 555356    0    0    38    34 1397  375 12  1 87  0
 4  0  16912  31776  91528 557508    0    0    78    42 3197  487 45  1 52  1
 1  0  16912  83356  91768 558576    0    0    35    56 2608  449 32  1 67  1
 1  0  16912  81548  92040 559720    0    0    41    31 1243  432  8  1 91  1
 2  0  16912  53056  92332 562744    0    0   105    33 2013  581 17  1 81  1
 2  0  16912  73236  92552 564844    0    0    68    36 1968  615 16  1 82  1
 0  0  16912  91612  92904 566676    0    0    69    35 1845  692 13  1 85  1
 1  0  16912  71248  93180 568428    0    0    58    33 1952  604 15  1 82  1
 1  0  16868  55952  93516 572660    1    0   144    42 1801  637 12  1 86  1
 2  0  16868  48324  94416 577844    0    0   189    66 2058  702 17  1 80  2
 1  0  16928  58644  94592 578184    0    2   160    49 2578  723 25  1 70  3
 5  0  16928  22600  94980 580568    0    0    89    32 1496  361 13  0 85  1
 0  0  16988  49256  94500 576396    0    2    41    37 1601  426 14  1 85  0
 5  0  18084  24336  86032 502748    0   37    83    68 2989  562 42  1 56  0
 1  0  18084 123604  86376 506996    0    0   118    41 2201  573 22  1 76  1
 2  0  18084 126984  86752 508876    0    0    64    53 1620  490 13  1 85  1
 2  0  18084 103104  87148 510768    0    0    71    37 2757  602 33  1 64  1
yang
  • 151
  • 1
  • 1
  • 4

4 Answers4

5

Try using strace -p 26721 to see what the process is doing. To see which file handle is which use lsof -p 26721.

Jörg Ludwig
  • 121
  • 6
  • 1
    Strace returns lots of information, but I have no clue to figure out what is wrong. lsof returns a file list, most of the files are related to php-fpm itself, not the php scripts are running. – yang Jun 12 '12 at 04:58
  • Can you give the strace output with the options I specified, i.e. -ffttT and then -o to capture in a file. Pastebin it. – Soham Chakraborty Oct 02 '12 at 18:23
4

You could start by reducing Google and Bing requests... Have you set up a robots.txt? Exclude the ScriptPath from crawling, so that only the canonical page URLs are indexed, but not any URL with parameters to api.php or index.php. Also set a crawl-delay if needed.

Many MediaWiki sites are slowed down by search engines due to small configuration errors.

Nemo
  • 259
  • 1
  • 13
  • Welcome to serverfault, but the good starting point is to identify the root cause of cpu load. If crawler http requests were clearly identified, your answer would be good. I have to downvote it :( –  Dec 19 '13 at 12:28
  • Ok Eric, your guest. However, if the causes were clearly identified there would be no question: the problem would already be solved. Based on field experience, many MediaWiki sites are slowed down by search engines due to small silly configuration errors. The very fact that robots.txt is not mentioned in the question shows that it's something the original poster needs to think about, so I think my answer relevant. – Nemo Jan 17 '14 at 07:03
  • edited and upvoted –  Jan 17 '14 at 10:21
2

In my case, after running top, pid (3174) of top consuming CPU (user time was 11.8%us) was identified:

top - 12:43:19 up  1:35,  2 users,  load average: 0.97, 1.05, 1.11
Tasks: 265 total,   2 running, 263 sleeping,   0 stopped,   0 zombie
Cpu(s): 11.8%us,  0.4%sy,  0.0%ni, 87.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16333780k total, 11207464k used,  5126316k free,     6464k buffers
Swap:  2097144k total,        0k used,  2097144k free, 10078524k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3174 nginx     20   0  705m  49m  23m S 33.5  0.3   1:12.21 php-fpm
 3173 nginx     20   0  711m  56m  25m S 28.5  0.4   1:08.62 php-fpm
 3187 nginx     20   0  711m  49m  17m S 26.2  0.3   1:00.80 php-fpm
 3191 nginx     20   0  717m  59m  21m R  6.6  0.4   1:04.29 php-fpm

The following command was run to identify the factor that was causing glitches:

strace -ffttTo /tmp/strace.out -p 3174

The file /tmp/strace.out.3174 was full of following lines

12:55:17.417008 open("/var/log/php-fpm/www-error.log", O_WRONLY|O_CREAT|O_APPEND, 0644) = -1 EACCES (Permission denied) <0.000024>
12:55:17.417120 write(2, "NOTICE: PHP message: PHP Depreca"..., 186) = 186 <0.000016>

After running:

chown -R nginx:root /var/log/php-fpm

the problem was gone.

Henrik Pingel
  • 8,676
  • 2
  • 24
  • 38
user622122
  • 21
  • 1
0

CPU is being spent in %user state, try to find more information about the PIDs consuming most of the resource in top output. Like as mnentioned, run strace -ffttTo /tmp/strace.out -p on it to know where it is stuck or how much time it takes for successive system calls. If there is a delay, you can identify it from there.

Also, lsof, fuser are viable options.

Soham Chakraborty
  • 3,534
  • 16
  • 24