I'm running a fairly bare-bones server (based on Ubuntu 11.04) on an Amazon EC2 micro instance; its only purpose is to coordinate the activities of a few web servers. The machine ran well for a few weeks, but it now hangs frequently with its CPU pegged at 100%.

I logged into the machine over SSH and ran `top`, which showed that the landscape-sysinfo process was consuming all of the CPU. `pstree` showed where it sat in the process tree:

init─┬─atd
     ├─cron
     ├─dhclient3
     ├─dovecot─┬─2*[dovecot-auth]
     │         ├─3*[imap-login]
     │         └─3*[pop3-login]
     ├─6*[getty]
     ├─master─┬─pickup
     │        └─qmgr
     ├─mountall
     ├─mysqld───11*[{mysqld}]
     ├─rsyslogd───3*[{rsyslogd}]
     ├─sshd─┬─sshd───sshd───bash
     │      ├─sshd───sshd───bash───top
     │      ├─sshd───sshd───bash───pstree
     │      └─sshd───sh───run-parts───50-landscape-sy───landscape-sys+
     ├─udevd───2*[udevd]
     ├─upstart-socket-
     ├─upstart-udev-br
     └─vsftpd

The offending process is listed here as the last child of sshd. If I manually kill landscape-sysinfo, the machine returns to normal - until the process spontaneously respawns, usually a few moments later. (I can "vouch for" the other sshd processes in the above tree. They were legitimate.)
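For anyone retracing this kind of investigation, the parent chain that pstree draws can also be reconstructed for a single PID by reading `/proc/<pid>/stat`. The following is only a minimal sketch, assuming a Linux `/proc` filesystem; the helper names `parse_stat` and `ancestry` are hypothetical, not part of any tool mentioned above.

```python
def parse_stat(stat_text):
    """Return (command_name, parent_pid) from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the LAST closing parenthesis.
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    name = stat_text[open_idx + 1:close_idx]
    fields = stat_text[close_idx + 1:].split()
    # After the name: fields[0] is the process state, fields[1] is the parent PID.
    return name, int(fields[1])


def ancestry(pid, read_stat=None):
    """Return 'name(pid)' entries from the given PID up to the top of the tree."""
    if read_stat is None:
        # Default reader: assumes a Linux /proc filesystem is mounted.
        def read_stat(p):
            with open(f"/proc/{p}/stat") as fh:
                return fh.read()
    chain = []
    while pid > 0:
        name, ppid = parse_stat(read_stat(pid))
        chain.append(f"{name}({pid})")
        pid = ppid  # the parent of PID 1 is reported as 0, which ends the loop
    return chain
```

Calling `print(" <- ".join(ancestry(pid)))` with the PID that `top` reports for the runaway process should print a chain like the sshd branch in the tree above. Note that the kernel truncates the command name in this file, much as pstree does.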

I have no idea why landscape-sysinfo is spawning itself randomly. I doubly have no idea why it's the child of sshd.

I'm obviously none too thrilled about having an SSH process running on my machine that I can't account for. Initially I feared a breach/trojan/backdoor, so I ran chkrootkit and rkhunter, but they both came up clean.

Does anybody have any idea what could be causing this process to run wild? Any thoughts on how to stop it from respawning?

Chris Allen Lane
  • You can run `strace landscape-sysinfo` to see which syscalls it makes, which could be helpful. In my case the most time-consuming operation was running df on the home directory. `landscape-sysinfo --exclude-sysinfo-plugins=Disk` does not hang – Volodymyr Boiko Jan 06 '22 at 13:11

2 Answers

I worked out the actual cause of the problem a while back, and thought I should document it here for the sake of others who may have similar issues. The root cause turned out to be trickier and more complicated than I initially expected.

In short, run-parts was working as designed all along. Its apparent runaway behavior was only a symptom of a different problem. The failure chain looked something like this:

1) On an entirely different machine, lsyncd (a file-syncing utility built on rsync) was running haywire for reasons that don't matter here. What does matter is that lsyncd was trying to sync files over SSH to this micro-instance (the one showing the problems).

2) Because lsyncd was making dozens of simultaneous SSH connections, each connection apparently triggered the login banner (the message of the day) that Ubuntu generates by default by running landscape-sysinfo. This explains what landscape-sysinfo is and why it appears as a child of sshd. run-parts looked like the culprit, but the real issue was that the machine was being bombarded with SSH connections.

3) Exacerbating the issue was that this is a micro-instance on EC2, and I've since discovered that Amazon severely throttles micro-instances whose CPU consumption steadily rides above a certain threshold. (For an excellent explanation of the details, please see Greg's Ramblings. Many thanks to Greg for that article!)
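On virtualized guests, hypervisor throttling of this kind is commonly reported as "steal" time (the `st` figure in top). Assuming that holds for the instance in question, which the original investigation did not verify, a rough sketch for measuring it from two samples of `/proc/stat` might look like this. The function names `cpu_times` and `steal_percent` are hypothetical.

```python
def cpu_times(proc_stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counters."""
    for line in proc_stat_text.splitlines():
        # "cpu " with a trailing space matches the aggregate line, not cpu0, cpu1, ...
        if line.startswith("cpu "):
            return [int(value) for value in line.split()[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before, after):
    """Percentage of elapsed CPU ticks counted as steal between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already counted inside user and nice, so only the
    # first eight fields go into the total. Assumes a kernel that reports steal.
    total = sum(deltas[:8])
    return 100.0 * deltas[7] / total if total else 0.0
```

To use it, read `/proc/stat` once, wait a few seconds, read it again, and pass both parsed samples to `steal_percent`. A value that stays high while the machine feels sluggish would be consistent with throttling rather than with load generated on the instance itself.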

Thus, the machine ran slowly for a few moments while it was being bombarded with SSH connections, and then became unusably slow after the throttling kicked in.
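A flood of connections like this can be spotted by counting established connections to the SSH port per remote host. The sketch below is one possible way to do that, assuming the default column layout of `ss -tn` (State, Recv-Q, Send-Q, local address, peer address) and SSH listening on port 22; the function name `ssh_peers` is hypothetical.

```python
from collections import Counter


def ssh_peers(ss_output, port=22):
    """Count established connections to the local SSH port, grouped by peer host."""
    counts = Counter()
    for line in ss_output.splitlines():
        cols = line.split()
        # Skip the header row and anything that is not an established connection.
        if len(cols) < 5 or cols[0] != "ESTAB":
            continue
        local, peer = cols[3], cols[4]
        # rpartition splits on the last ':' so IPv6 addresses stay intact.
        if local.rpartition(":")[2] != str(port):
            continue
        counts[peer.rpartition(":")[0]] += 1
    return counts
```

Feeding it the text produced by `ss -tn` and looking at `ssh_peers(output).most_common(5)` would show whether a single host accounts for dozens of simultaneous sessions, as the lsyncd machine did here.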

Mystery solved!

Chris Allen Lane
  • Hi Chris Allen Lane, unfortunately the Greg's Ramblings article link is no longer working. What is the solution for this issue? Thanks. – Jerry Chong Jun 22 '22 at 07:26

It is a regularly scheduled cron job that gathers performance data.

Look here for (light) removal instructions. To remove it entirely, if you don't care about the data collection, either remove the package (if it will let you) or find the crontab entry for it and comment it out.

Allen
  • The instructions provided in the linked blog didn't solve the problem, but simply deleting /etc/update-motd.d/50-landscape-sysinfo did, and that's good enough for me. Thanks for the help! – Chris Allen Lane Aug 23 '11 at 21:15
  • In hindsight, this didn't actually solve the problem, though it masked a symptom briefly. See the new accepted answer below. Regardless, thanks for the help, @Allen! – Chris Allen Lane Nov 16 '11 at 22:21