
Recently, we have been noticing CPU spikes in our production environment caused by Redis, which can be seen below:

[Screenshot: htop output showing the CPU spikes from the Redis server]

To combat this issue, I have been restarting the redis server about twice a day :( which is obviously far from ideal. I'd like to identify the root cause.

Here are some things I have looked into so far:
1) Looked for any anomalies in the redis log file. The following seems suspicious:

[Screenshot: excerpt from the redis log file]

2) Reviewed the nginx access logs to see if we are experiencing unusually high traffic. The answer is no (see the per-minute count sketched after this list).

3) New Relic revealed that the issue started on Nov 21st '16 (about a month ago), but no code was released around that time.
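
(For reference, this is roughly how I counted requests per minute when checking the nginx logs; it assumes the default combined log format and log path, so adjust for your setup.)

    # Busiest minutes in the access log (field 4 is the [dd/Mon/yyyy:hh:mm:ss timestamp)
    awk '{print substr($4, 2, 17)}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20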

Here are some details about our setup:

Redis server: Redis server v=2.8.17 sha=00000000:0 malloc=jemalloc-3.6.0 bits=64 build=64a9cf396cbcc4c7

PHP: 5.3.27 with fpm

Redis configuration:

daemonize yes
pidfile /var/run/redis/redis.pid
port 6379
timeout 0
tcp-keepalive 0
loglevel notice
logfile /var/log/redis/redis.log
syslog-enabled yes
databases 16
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error no
rdbcompression yes
rdbchecksum yes
dbfilename redis.rdb
dir /var/lib/redis/
slave-serve-stale-data yes
slave-read-only yes
repl-disable-tcp-nodelay no
slave-priority 100
maxmemory 15GB
appendonly no
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
lua-time-limit 5000
slowlog-max-len 128
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes
include /etc/redis/conf.d/local.conf
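
(The last line pulls in /etc/redis/conf.d/local.conf, which could override anything above, so here is roughly how I confirm what the running instance actually uses; just a sanity check, assuming redis-cli on the same host and the default port.)

    # What the running server actually has loaded
    redis-cli CONFIG GET maxmemory
    redis-cli CONFIG GET save
    redis-cli CONFIG GET requirepass
    # Which interfaces redis is listening on (no "bind" directive means all of them)
    netstat -tlnp | grep 6379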

Framework: Magento 1.7.2 with Cm_Cache_Backend_Redis

Please let me know if, given the above information, there is anything I can do to mitigate the high CPU usage.

dipole_moment
  • I just realized: the problem looks like it is with the `yam` command. Any idea what that is? The redis server process is typically named `redis-server`. – 2ps Dec 15 '16 at 14:19
  • Yikes! The only other reference I could find to yam was http://stackoverflow.com/questions/37897728/aws-unnecessary-script – 2ps Dec 15 '16 at 14:25
  • Alright, yam looks like it is for yum/apt mirror. https://www.rpmfind.net/linux/rpm2html/search.php?query=yam That said, definitely check for the security breach because it looks like your redis is accessible to the world without authentication. – 2ps Dec 15 '16 at 14:36
  • Hey @2ps, thanks so much for the details, it is very helpful. I have noticed that the command column in top for the redis process is sometimes `yam` and sometimes `redis-server`. I am wondering if you have any input on how exactly this `yam` command gets triggered. How would I get to the bottom of this? Our server uses SSH keys for user login, but I just confirmed our redis is accessible to the outside world by simply specifying the host. YIKES. That being said, how would someone with access to redis be able to configure it to run yam? – dipole_moment Dec 15 '16 at 15:59
  • Check `/opt/yam/yam` to see if it exists. If it does, you are likely compromised. Also check `/root/.ssh/authorized_keys` and make sure only SSH keys that you know about are there. As for the vector of compromise, here is a proposed hack `http://antirez.com/news/96` that can be used to download a script to your computer and run it periodically. You’ll also want to check each user's crontab and the global crontab to make sure yam does not appear there. – 2ps Dec 15 '16 at 16:10
  • No file at ``/opt/yam/yam`` and no unauthorized keys. I am going to block outside connection using ``ufw`` and monitor for improvements. – dipole_moment Dec 15 '16 at 16:45
  • As an aside, another version of this hack used the compromised server to mine bitcoin instead. That would jibe with high CPU utilization. – 2ps Dec 15 '16 at 18:09
  • Very interesting. I didn't think about that scenario. – dipole_moment Dec 15 '16 at 18:16

1 Answer


VERY IMPORTANT UPDATE:

Your server may have been hacked. It’s not redis that is causing the high CPU usage, but a separate command called yam (take a look at the far right of your htop, I missed it the first time). The yam command is used in a well-known exploit of redis and often results in high CPU usage. You’ll want to double-check to make sure your server is secure.
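
A rough first pass at checking for the known indicators (the paths come from the write-ups linked below and from the comments above, so adjust for your distro):

    # The dropped binary reported in the write-ups, plus any cron entries that mention it
    ls -l /opt/yam/yam 2>/dev/null
    grep -r yam /etc/cron* /var/spool/cron 2>/dev/null
    # Only SSH keys you recognize should be here
    cat /root/.ssh/authorized_keys
    # And confirm whether redis answers from the internet without auth
    # (YOUR_PUBLIC_IP is a placeholder; run this from a machine outside your network)
    redis-cli -h YOUR_PUBLIC_IP ping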

Here are some articles and links you can refer to if you want to learn more about the vulnerability and how to secure yourself:

http://antirez.com/news/96
https://www.riskbasedsecurity.com/2016/07/redis-over-6000-installations-compromised/
https://news.ycombinator.com/item?id=13053647
https://gist.github.com/lokielse/d4e62ae1bb2d5da50ec04aadccc6edf1
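
As a stopgap while you investigate, lock the port down and add auth. This is only a sketch (the allowed IP below is a made-up example, and ufw is just what you mentioned in the comments; use iptables or security groups if that is what you run):

    # Allow only your app servers to reach redis, then deny everyone else
    ufw allow from 10.0.0.5 to any port 6379   # 10.0.0.5 is a placeholder app-server IP
    ufw deny 6379

    # In redis.conf: require a password and bind to specific interfaces, then restart redis
    #   requirepass <a long random string>
    #   bind 127.0.0.1
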
Here is my checklist for magento/redis, er, performance issues:

  1. Make sure you are on a newish version of redis, like 3.2; I personally prefer redis32u from the IUS repository if you are on CentOS.
  2. Check the size of your redis database (it should be in /var/lib/redis) and make sure it is relatively small (rough commands for this and for items 4 and 5 are sketched after this list).
  3. Verify that you have enough RAM for redis. You’ve specified a maxmemory of 15GB, which is really overkill for magento. I typically use something closer to 256mb. If you are using redis that much (!!!!!!), you likely have other problems in your magento stack.
  4. Make sure you have the vm overcommit setting set via sysctl. https://redis.io/topics/admin (see this link for more details on what you need)
  5. Make sure you have sufficient open file limits to handle the number of connections to redis.
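
For items 2, 4, and 5, these are roughly the checks I run (the paths assume the config posted above; the overcommit fix is the one from the redis admin page linked in item 4):

    # 2. Dataset size on disk and in memory
    ls -lh /var/lib/redis/redis.rdb
    redis-cli INFO memory | grep used_memory_human

    # 4. Memory overcommit; should report 1 on a box running redis
    sysctl vm.overcommit_memory
    echo "vm.overcommit_memory = 1" >> /etc/sysctl.conf && sysctl -p

    # 5. Open-file limit actually applied to the running redis process
    grep "open files" /proc/$(pidof redis-server)/limits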

Generally speaking, the log file isn’t suspicious, because your redis save settings tell redis to save every minute if there have been at least 10,000 writes, every five minutes if there have been at least 10 writes, and every 15 minutes if there has been at least 1 write. So it is essentially persisting the data back to disk every minute, which shouldn’t be that burdensome.
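
If you want to verify that, something like this shows when the last background save happened and whether anything inside redis is actually slow (a quick sketch, assuming redis-cli on the same box):

    # When the last RDB save ran and whether it succeeded
    redis-cli INFO persistence | grep -E "rdb_last_save_time|rdb_bgsave_in_progress|rdb_last_bgsave_status"
    redis-cli LASTSAVE
    # The ten slowest commands redis has seen recently
    redis-cli SLOWLOG GET 10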

2ps
  • Do you have any articles about that yam exploit? – Tolsadus Dec 21 '16 at 06:43
  • https://www.riskbasedsecurity.com/2016/07/redis-over-6000-installations-compromised/ ; http://antirez.com/news/96 ; https://news.ycombinator.com/item?id=13053647 ; https://gist.github.com/lokielse/d4e62ae1bb2d5da50ec04aadccc6edf1 – 2ps Dec 21 '16 at 06:44
  • @Tolsadus: FYI—I added some additional links if you wanted more info. – 2ps Dec 21 '16 at 06:53