
One of our servers recently experienced some file system corruption and our root file system was automatically remounted as read only. The steps I took to recover were:

  1. Attempted to remount the root file system with "mount -n -o remount /"; this failed.
  2. Rebooted the server.
  3. Was prompted to perform a manual fsck, which found 5 orphaned inodes that required fixing.

After performing these steps I was able to gain access and the file system was writable again. Unfortunately I don't have any informative logs to include, as none were written.

One cause that has been suggested is that our database was too busy to write its data to disk properly, and that this caused the issue. The high level of cached memory was given as an indication that this might be the case. However, I'm not sure about this: although the cache is high, we aren't using swap at all (output of free below).

$ free -m
             total       used       free     shared    buffers     cached
Mem:          2041       1879        162          0         62       1599
-/+ buffers/cache:        216       1825
Swap:          471          0        471

Is there any way I can diagnose the fault after it has happened? Does MySQL look like a likely candidate?

If not, are there any steps I should take if this happens again?

BenM

2 Answers


Orphaned inodes are benign and perfectly normal after an unclean unmount. They are simply files that had been deleted but were still open when the file system was remounted read-only. They are a symptom, not the cause. Check your kernel logs to see what actually triggered the read-only remount. You might also want to run some SMART diagnostics to make sure the drive isn't failing.
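
As a rough sketch of those two checks, the script below filters the kernel log for lines that hint at file system or disk trouble and then asks the drive for its SMART health. The helper name filter_fs_errors, the grep patterns, and the device /dev/sda are assumptions for illustration; smartctl comes from the smartmontools package, and both dmesg and smartctl usually need root. Since dmesg only covers the current boot, this is most useful before rebooting.

```shell
#!/bin/sh
# Hypothetical helper: keep only kernel log lines that hint at
# file system or disk trouble. The pattern list is an assumption;
# adjust it to your file system and disk controller.
filter_fs_errors() {
    grep -i -E 'ext[234]|remount|i/o error|ata[0-9]+|journal'
}

# Assumed device name; replace with the disk holding the root file system.
DISK="${DISK:-/dev/sda}"

# Only run the checks when asked, so the file can be sourced for the helper alone.
if [ "${1:-}" = "--run" ]; then
    dmesg | filter_fs_errors      # kernel ring buffer (current boot only)
    smartctl -H "$DISK"           # overall SMART health verdict
    smartctl -l error "$DISK"     # the drive's own error log
fi
```

If the root file system is already read-only, redirect the output somewhere that is still writable (a tmpfs mount or another host), since nothing can be saved under /.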

psusi

First, sanity-check your server:

  • Are you using ECC memory?
  • Are you running RAID? Did you see any RAID card errors? (dmesg would have shown these at the time, but now that you've rebooted they're probably lost.)

A high level of cache is desirable and shouldn't corrupt your file system in any way.
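
To see how little memory the applications are really holding, you can subtract buffers and cache from the "used" figure yourself. The sketch below does that for the Mem: line from the question; the helper name app_used_mb is made up, and the field positions assume the older free layout shown in the question (total, used, free, shared, buffers, cached). For the question's figures it gives 1879 - 62 - 1599 = 218 MB, close to the 216 MB on the -/+ buffers/cache line (the small gap is presumably rounding).

```shell
#!/bin/sh
# Hypothetical helper: read `free -m` output in the older layout
# (columns: total used free shared buffers cached) and print the
# memory used by applications: used - buffers - cached, in MB.
app_used_mb() {
    awk '/^Mem:/ { print $3 - $6 - $7 }'
}

# Sample line taken from the question's output.
printf '%s\n' 'Mem:          2041       1879        162          0         62       1599' | app_used_mb
```

On a live system this would be "free -m | app_used_mb". Newer versions of free use a different column layout (a combined buff/cache column plus "available"), so the field numbers would need adjusting there.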

SamB
  • Yes, we are using ECC. We aren't running RAID, so we didn't see any errors. In retrospect I should have taken some screenshots of what I could see, but I was busy trying to get the server back up and running and hoped I'd be able to look at the logs afterwards. Unfortunately none were written. – BenM Feb 16 '10 at 15:36