I've got a CentOS 5.5 server (HP ProLiant with a two-disk RAID array) that was working fine until a power failure last week. (Long story, but the UPS was not properly configured at that time.) After the power failure, the server came back online and worked for a day or two, but it got progressively slower on web hits and then I couldn't log in via SSH. The user at the console (the server is 4,000+ miles from my present location) couldn't log in either. I was getting worried about hardware problems, so I've had some local help get it booted from a system rescue CD.

e2fsck needed to do some journal recovery, but otherwise things initially checked out. We did a proper reboot, and the system came up without any serious red flags. (Unfortunately the guy I've got at the console isn't great at spotting what could be abnormal, but nothing sprang out as a warning or error.) When he tries to log in at the console, it takes the username, but as soon as he starts typing the password he gets "type=1100 audit(1291752714.120:13)", followed by what he describes as nonsense (I know, I know, I probably need him to give it to me verbatim), ending with "ext3_abort called" and "Remounting filesystem read-only".

I figure, OK, maybe there's something that the initial fsck didn't find, so let's do a bad-blocks scan. I rebooted to the rescue CD and ran e2fsck -c on all the partitions last night, and no bad blocks were reported. I'm now running the non-destructive read-write check, but given the partition sizes, I don't think this is going to be a very effective use of time. When I check the logs from the hard-drive boot where logging in wasn't possible, there's nothing about drive issues at all, which perplexes me.

Logs from before the issues started last week indicate that there were some probes against the server, so a compromise of some sort is at the front of my mind. I'm game for doing a clean install remotely, but I thought I'd see if anyone had any ideas why a boot from the hard drive would suggest disk issues while fscking from the rescue CD doesn't suggest any. Has anyone seen this behavior before? Are there further things I should do to check for hardware issues before spending time on a re-install?

Thanks.

2 Answers

e2fsck simply does not fix all problems. I have a Linux VM with a filesystem error where it shows some files but doesn't allow me to delete them. If I run e2fsck on the drive, it goes into an endless loop and never finishes. Sometimes the easiest way is to just copy the data off, re-run mke2fs, and start again...

Chris T

It might be interesting to run rpm -Va to compare checksums of your installed packages. (From the rescue disk, use --root as necessary.)
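The output of rpm -Va can be long, so it may help to filter it. Below is a minimal Python sketch, assuming the usual verify output layout: a string of attribute flags (where 5 marks a changed file digest), an optional file-type marker such as c for config files, and then the path. The function name suspicious_changes and the sample text are made up for illustration; in practice you would feed it the saved output of the rpm command. It lists non-config files whose contents differ from what the package shipped, which is where a tampered binary would show up.

```python
import re

# One verify line: 8 or 9 attribute flags, optional file-type marker, path.
# Example: "S.5....T  c /etc/ssh/sshd_config"
LINE_RE = re.compile(
    r"^(?P<flags>[SM5DLUGTP.?]{8,9})\s+(?P<kind>[cdglr])?\s*(?P<path>/.*)$"
)


def suspicious_changes(verify_output):
    """Return paths of non-config files whose digest flag ('5') is set."""
    hits = []
    for line in verify_output.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            # Skips "missing ..." lines and anything not recognised.
            continue
        digest_changed = "5" in match.group("flags")
        is_config = match.group("kind") == "c"
        if digest_changed and not is_config:
            hits.append(match.group("path"))
    return hits


# Made-up sample text, standing in for saved `rpm -Va` output.
SAMPLE = """\
S.5....T  c /etc/ssh/sshd_config
..5....T    /usr/bin/ssh
.......T    /usr/share/doc/foo/README
missing     /usr/bin/example
"""

if __name__ == "__main__":
    for path in suspicious_changes(SAMPLE):
        print(path)
```

Changed config files are expected on any administered server, which is why the sketch skips them; a changed digest on something under /bin or /usr/bin deserves a closer look. Lines reported as missing are ignored here and are worth reviewing separately.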

mattdm