I'm running a web-server using AWS (Ubuntu 18, Apache2, Django, with https cert from Let's Encrypt - which auto renews) at vanlevy.com.

I am the only one who has access to the AWS account and the site's admin pages. Pretty much the only folks (besides bots) who use this site are me and my friends (although there's no real reason why more people couldn't use it).

Between the last successful use and the discovery of the problem, I had not touched the site (no use, no maintenance, no updates). Now the site only returns 500 errors, and the Apache error log shows "AH00051 child pid ##### exit signal Segmentation fault (11), possible coredump in /etc/apache2" (where ##### is an increasing number).

Hunting through the syslogs, I find that the first kernel message referencing the segfault dates from when I tried to use the site and found it unresponsive. From this overflow Q/A and this one, it looks like there's some sort of shared library problem. The kernel error message I'm getting is:

kernel: [15207113.546701] apache2[2133]: segfault at 3f0 ip 00007f78ade65cab sp 00007f78b98e85d0 error 4 in libssl-c0c2ede4.so.1.0.2q[7f78ade3c000+6c000]
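
To make that line easier to read, here is a minimal sketch that splits it into its fields and works out where inside the library the crash happened (instruction pointer minus the library's load address). The function name `parse_segfault` and the field names are my own, and the regular expression is only checked against the one message above, so treat it as an assumption about the format rather than a general parser.

```python
import re

# Pattern written against the single kernel message quoted above.
# Other kernels or architectures may format the line differently.
SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return the interesting fields of a kernel segfault line, or None."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "program": m.group("prog"),
        "pid": int(m.group("pid")),
        "fault_address": int(m.group("addr"), 16),
        "library": m.group("lib"),
        # Offset of the faulting instruction inside the mapped library.
        "offset_in_library": ip - base,
    }


line = (
    "kernel: [15207113.546701] apache2[2133]: segfault at 3f0 "
    "ip 00007f78ade65cab sp 00007f78b98e85d0 error 4 "
    "in libssl-c0c2ede4.so.1.0.2q[7f78ade3c000+6c000]"
)
info = parse_segfault(line)
print(info["library"], hex(info["offset_in_library"]))
# prints: libssl-c0c2ede4.so.1.0.2q 0x29cab
```

So the crash is at offset 0x29cab inside `libssl-c0c2ede4.so.1.0.2q`, and the faulting address (0x3f0) is very low, which usually suggests a NULL pointer being dereferenced inside that library.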

Since I made no changes before the segfaults, the possible culprits are either someone doing something malicious (which seems really unlikely) or an automated process going awry.

Questions:

  1. Could CRON have caused this? A couple of minutes before the segfault, a CRON was running.* OTOH, this same CRON ran zillions of times before that. This CRON job also happens to correspond to the time of the last successful page served. (Thanks, applebot?)
  2. Can I salvage this virtual machine? Is there a fix I can do to sort out the libraries? I've tried updating all of the requirements; I have yet to update Django, that's next.
  3. If there's no salvage, how do I prevent this? I'll add that I do not consider myself a 'real' sysadmin - this is my first ever 'from scratch' website. My coding skills are: able to follow directions, mostly.
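
For question 2, one thing I can check is where that libssl file actually lives. The hashed name (`libssl-c0c2ede4.so.1.0.2q`) does not look like a system library; my guess, and it is only a guess, is that it is a copy bundled inside some Python package. A minimal sketch for hunting it down, where `find_bundled_libs` is a name I made up and the directory to search (for example the site-packages folder the Django app uses) has to be supplied by hand:

```python
import fnmatch
import os
import sys


def find_bundled_libs(root, pattern="libssl-*.so*"):
    """Walk `root` and return sorted paths of files whose name matches `pattern`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatch(name, pattern):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    # Pass one or more directories to search on the command line.
    for root in sys.argv[1:]:
        for path in find_bundled_libs(root):
            print(path)
```

If a match turns up, the directory it sits in should show which package ships its own OpenSSL, and that package would be the first candidate for the library conflict.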

(*) Related CRON messages

CRON[7017]: (root) CMD (  [ -x /usr/lib/php/sessionclean ] && if [ ! -d /run/systemd/system ]; then /usr/lib/php/sessionclean; fi)
systemd[1]: Started Clean php session files.
CRON[7101]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)

TIA

Van
  • Have you tried rebooting? Are you out of disk space, RAM, etc.? – Zenexer Aug 26 '19 at 04:34
  • Using AWS's console, I have (now) tried to reboot; and I am under my limits. However, now I can't even putty into it. Guess it's time to talk to AWS. – Van Aug 26 '19 at 11:52
  • AWS probably won’t be able to help you; it sounds like you’re out of disk space. You won’t be able to determine that from the console. You’ll need to create a new machine, detach the EBS volume from the old one, attach it to the new one, and delete some files. You’ll probably want to look in /tmp/ and /var/log/. – Zenexer Aug 26 '19 at 13:47
  • I fixed the putty problem (which was a derp on my part). My server is using only 29% of memory and 55% of the disk space. Poking around, I've found https://stackoverflow.com/questions/11092089/django-wsgi-application-segfault, where the 1st answer suggests this might be a library conflict. So now I guess I'm hunting through the syslog for an update that could have caused the error? – Van Aug 26 '19 at 15:11
