3

Lately, our server face problem with the memory. Started from few weeks ago, our server was loading very slow. Accessing emails and websites were really take too long. Then we asked the server technical support to reboot the server for us. After rebooting, things came back to normal. We thought that it was time for us to upgrade the RAM. Originally we only have 2GB RAM on our server, therefore we upgraded it into 8GB.

We thought the problem was solved. However, after upgrading the RAM, the memory usage was keep getting higher and higher. It was like always reach its max usage (ex. 456.5 MB free out of 7.8 GB memory). The next day, the server was totally down, and we had to ask the technical support to reboot the server for us again. Every time we face the problem, we have to ask the supporting staff to reboot the server for us. According to the support staff, if they of the SQL access, the memory will turn back to normal. But if they on it again, the memory will getting higher again. So, they suspect the problem is with the SQL query.

I want to ask, is it really the SQL query that give the problem? Or it was the hardware problem? If it's the SQL, how do I know what kind of queries that have taken up such a huge memory? How can I check it? Our server is running on the details below:

OS: Linux 2.6.18
CPU: GenuineIntel, Intel(R)Xeon(R)CPU W3550 @ 3.07GHz
Average Load: 808.18;674.21;587.18
database records: 421,031 with 58 tables

Query statistics that I can provide from the PHPMyAdmin:

select  314 k   11.98 k 66.37%
set option  34 k    1,296.59    7.18%
show fields 19 k    712.00  3.94%
update  16 k    620.61  3.44%
alter table 16 k    610.32  3.38%
change db   15 k    569.08  3.15%
insert  15 k    560.20  3.10%
show variables  11 k    434.01  2.40%
show tables 9,752   371.66  2.06%
begin   7,172   273.33  1.51%

result of performing 'top':

top - 15:20:07 up 1 day,  6:13,  1 user,  load average: 497.30, 512.17, 542.15
Tasks: 7743 total,   5 running, 7738 sleeping,   0 stopped,   0 zombie
Cpu(s): 23.9%us, 50.9%sy,  0.1%ni, 25.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8173372k total,  8048380k used,   124992k free,  1816952k buffers
Swap:  2096376k total,      216k used,  2096160k free,   533876k cached

I hope someone can give me some advice as I have facing this problem for some time and I have no one that I can ask for guidance. Thank you in advance.

update: dmesg

CPU5: Temperature above threshold, cpu clock throttled
CPU7: Temperature/speed normal
CPU2: Temperature/speed normal
CPU6: Temperature/speed normal
CPU1: Temperature above threshold, cpu clock throttled
CPU3: Temperature/speed normal
CPU0: Temperature/speed normal
CPU4: Temperature/speed normal
CPU6: Temperature/speed normal
CPU5: Temperature/speed normal
CPU1: Temperature/speed normal
CPU2: Temperature/speed normal
CPU4: Temperature/speed normal
CPU7: Temperature/speed normal
CPU0: Temperature/speed normal
CPU3: Temperature/speed normal
brute[19592]: segfault at 0000000000000028 rip 00000000080a592f rsp 00000000ffae3a28 error 6
brute[19539]: segfault at 0000000000000028 rip 00000000080a592f rsp 00000000ffae3a28 error 6
CPU2: Temperature/speed normal
CPU4: Temperature/speed normal
CPU7: Temperature/speed normal
CPU1: Temperature/speed normal
CPU5: Temperature/speed normal
CPU3: Temperature/speed normal
CPU6: Temperature/speed normal
CPU0: Temperature/speed normal
CPU0: Temperature/speed normal
CPU7: Temperature/speed normal
CPU6: Temperature/speed normal
CPU1: Temperature/speed normal
CPU2: Temperature/speed normal
CPU3: Temperature/speed normal
CPU5: Temperature/speed normal
CPU4: Temperature/speed normal
brute[21368]: segfault at 0000000000000000 rip 0000000008048f03 rsp 00000000ffb82db0 error 4

today's TOP result:

top - 10:42:47 up 3 days,  1:35,  1 user,  load average: 4.35, 4.53, 4.59
Tasks: 187 total,   5 running, 182 sleeping,   0 stopped,   0 zombie
Cpu(s): 12.7%us, 38.5%sy,  0.0%ni, 48.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8173372k total,  7800156k used,   373216k free,  1653768k buffers
Swap:  2096376k total,      216k used,  2096160k free,  2723732k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27913 consulti  25   0 31096 3700 1172 R 100.2  0.0   1484:40 perl
27916 consulti  25   0 31096 3732 1204 R 100.2  0.0   1051:59 perl
28213 consulti  25   0 31096 3736 1204 R 100.2  0.0   1073:30 perl
28210 consulti  23   0 31096 3740 1204 S 86.9  0.0   1016:23 perl
 3205 mysql     15   0  333m  55m 5416 S 13.6  0.7 143:36.73 mysqld
14616 apache    15   0  333m  33m 6056 S  4.7  0.4   1:02.12 httpd
25104 consulti  18   0 31096 3732 1204 R  4.7  0.0 486:12.74 perl
 2702 root      15   0 12744 1148  808 R  0.3  0.0   0:00.01 top
    1 root      15   0 10352  696  588 S  0.0  0.0   0:01.62 init
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.07 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.07 ksoftirqd/0
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
    5 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/1
    6 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/1
    7 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/1
    8 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/2
    9 root      34  19     0    0    0 S  0.0  0.0   0:00.02 ksoftirqd/2
   10 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/2
   11 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/3
   12 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/3
   13 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/3
   14 root      RT  -5     0    0    0 S  0.0  0.0   0:00.05 migration/4
   15 root      34  19     0    0    0 S  0.0  0.0   0:00.01 ksoftirqd/4
   16 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/4
   17 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/5
CnV
  • 141
  • 4
  • Could you please add the first 10 lines of `top` command and `vmstat 1 10`, including the headers? – Mircea Vutcovici Nov 14 '14 at 03:21
  • Hi,may I know which 10 lines of top command? how to check the vmstat 1 10? is this checked on the server's side? – CnV Nov 14 '14 at 03:29
  • Those are terminal commands. Connect with `ssh` or PuTTY on your server and run the commands. Edit your question and add the output. – Mircea Vutcovici Nov 14 '14 at 03:31
  • hi, I dont have the access to the root server. I only can access the admin side. – CnV Nov 14 '14 at 03:39
  • Most of the hosting companies offers console and/or SSH access. – Mircea Vutcovici Nov 14 '14 at 03:40
  • 1
    Check also if you have processes in uninterruptible sleep. Run as `root` user: `grep "^State:.*D" /proc/*/status` – Mircea Vutcovici Nov 14 '14 at 03:41
  • I just obtained the access, and now reach the command for -bash-3.2# Can you guide me what is the command that I should type into it as I never access it before. – CnV Nov 14 '14 at 03:44
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/18638/discussion-between-mircea-vutcovici-and-verlee). – Mircea Vutcovici Nov 14 '14 at 04:16
  • The load average indicates that the server is indeed overloaded. Seeing used memory going close to 100% is not in itself a sign of having too little memory. But when both free and cached are low, there is reason to believe you are short on memory. However it is a bit strange that under such circumstances it is not writing more to swap. The list of processes in noninterruptible sleep which Mircea asked for should provide some hints. Also it is worth checking the output of `dmesg` to see if it provides any clues. – kasperd Nov 14 '14 at 08:30
  • @kasperd I tried on the dmesg that you mentioned. There are few results stated there. I have updated the result on my question – CnV Nov 14 '14 at 08:52
  • The list of processes in noninterruptible sleep would still be useful to know. The messages about CPU temperature sounds like you may have some hardware issue, possibly insufficient cooling. The messages about segfault in `brute` would usually just indicate a bug in that program, however knowing that the CPUs are running hot the segfaults might be due to a hardware problem rather than a software problem. What is `brute`? It is not a program that I recognize. – kasperd Nov 14 '14 at 09:25
  • @kasperd i'm not sure as well. bcos we outsource our server. do u mind to teach me how to check the noninterruptible sleep? what command that I need to write? – CnV Nov 14 '14 at 09:28
  • You could use the command suggested by Mircea, or you could use this variant, which would provide a bit more information: `grep ' D ' /proc/[0-9]*/task/*/stat` – kasperd Nov 14 '14 at 09:34
  • i dont get any data from the command above. I tried on Mircea n urs. Am I typing the command on the wrong place? -bash-3.2# then i type in the command that you gave. – CnV Nov 14 '14 at 09:45
  • It is normal the the command does not return anything. It would be a hardware or diver issue otherwise. – Mircea Vutcovici Nov 18 '14 at 01:20
  • 4
    I am very concerned by to processes that look suspicious: `brute` and `minerd`. I would also check what those `perl` scripts are running. – Mircea Vutcovici Nov 18 '14 at 01:25
  • the server was down again yesterday. It was like the memory only can take up to 3 days only and it will down on the next day. so, it's not the sql query problem? – CnV Nov 18 '14 at 01:27
  • btw, the noninterruptible sleep that you mentioned previously, if no data is returned, does it mean that there is no any problem? – CnV Nov 18 '14 at 02:35
  • What you need is per-process memory usage. And you haven't given any detail about what services are running. It could simply be software bug that's causing runaway memory leak or a possibly a mis-configuration – hookenz Nov 19 '14 at 22:35

3 Answers3

6

First of all, like a few others in the comments, I'm very concerned about the minerd and brute processes. A little Google detective work shows that those names are associated with bitcoin mining and automated network traffic generation (as in: one node in a DDoS attack) respectively, indicating that someone else may be using your server for their own purposes.

Get that out of the way (and please do fix that quickly), and you can likely also improve things with a little database tuning. Given the analysis posted by RandomSeed that you probably have unused RAM, and are instead waiting on disk, and the note that you added RAM to the server without comment about any other changes, I suspect that your MySQL buffer pool is too small.

This comes down to Database Server Performance Tuning 101: disks are slow, memory is fast. Databases like to pre-load or buffer data and indexes in memory, in order to reduce the need to go out to the slower hard drive. MySQL has a few configuration settings that control how much data it's allowed to cache like this. If you add memory to the server, but don't change these MySQL settings, it will help a little by ensuring other (non-MySQL) processes have enough RAM to go around, but it won't help your core database performance very much; you need to also tell MySQL about the new memory you added.

So, how to do that? In the case of MySQL it's a little tricky, because MySQL supports multiple storage engines. You're probably using InnoDB, but may also have MyISAM, or even a mix of the two for different tables. Unfortunately, without knowing what tables use what storage engine, and (if there's a mix) with what query sizes and volumes, it's hard to accurately advise on how to change your MySQL configuration.

Joel Coel
  • 12,910
  • 13
  • 61
  • 99
  • You are right, there is definitely something wrong on this system, and a slow MySQL is only the symptom. [`mysqld` only uses 0.6% (!) of RAM](http://stackoverflow.com/q/26925297/1446005), where has the rest gone? – RandomSeed Nov 19 '14 at 23:29
  • From 58 tables that I have, 2 tables use InnoDB, 3 view tables, and the rest are using MyISAM. I have added few indexes into same tables yesterday, and still monitoring the server. – CnV Nov 20 '14 at 02:38
  • I included the today's TOP result in the question above. is it the first 4 apps that have cause the major effect? – CnV Nov 20 '14 at 02:49
  • the MYSQL Buffer Pool is 8MB. – CnV Nov 20 '14 at 03:01
  • Those perl scripts at the top are a big part of it. They could be doing _anything_, from a legitimate part of your httpd site to mining bitcoins to sending DDoS packets, and they have three of your cpu cores completely pegged. Does the `consulti` user ring a bell to you? Also, 8MB is way too small, but with MySQL you may be looking at the wrong setting. Post your my.cnf as part of 2nd question asking for DB tuning help, and you'll probably get better luck doing that at the DBA stack exchange site. – Joel Coel Nov 20 '14 at 14:23
4

Your problem is not RAM-bound (not anymore):

  • you have about 2.5 GB RAM "available" (124992k free + 1816952k buffers + 533876k cached)
  • your swap is hardly used (216k used)

It was probably not the case before you upgraded to 8GB, though.

It is now likely to be I/O-bound, and MySQL is probably the culprit::

  • load average is disastrously high (load average: 497.30, 512.17, 542.15)
  • 58 minutes of CPU time (TIME 57:59) vs 1 day of system uptime (running up 1 day, 6:13)

The most usual cause for a seemingly CPU intensive query is a long time in I/O wait (i.e. reading data from the disk). Such queries typically lack poper indexing and/or pull too much data.

Now you need to determine what those queries are. A good starting point is monitoring their execution time.

RandomSeed
  • 283
  • 1
  • 10
  • 1
    To add to this: they added RAM to the server, but probably didn't change MySQL's memory settings anywhere, meaning MySQL is not using the memory that it should be. Changing the MySQL config to allow it to use more RAM to keep table data and indexes pre-loaded/buffered in memory will likely fix the problem. However, as we don't even know for sure what storage engine MySQL is using (and for which tables/indexes), we don't have enough information yet to describe how to change this. – Joel Coel Nov 19 '14 at 21:50
  • In case anyone wonders where I got the "58 minutes of CPU time" from, the question has been [crossed-posted at StackOverflow](http://stackoverflow.com/q/26925297/1446005), where the full output of `top` is visible. – RandomSeed Nov 19 '14 at 23:21
0

From your description I think that your system is swapping. You probably need to decrease some memory allocations configured in /etc/my.cnf

You can see which processes are using the physical memory by running:

ps aux --sort=-rss|head -10
Mircea Vutcovici
  • 16,706
  • 4
  • 52
  • 80