3

This server has RAID-1 configured. Its file system goes read-only every day. If I reboot it, it comes back with a read-write file system, but after a short while it goes read-only again.

Any ideas, please? Thanks.

# dmesg |grep error
VFS: cannot write quota structure on device cciss/c0d0p8 (error -30). Quota may get out of sync!
VFS: cannot write quota structure on device cciss/c0d0p8 (error -30). Quota may get out of sync!
VFS: cannot write quota structure on device cciss/c0d0p8 (error -30). Quota may get out of sync!
VFS: cannot write quota structure on device cciss/c0d0p8 (error -30). Quota may get out of sync!
VFS: cannot write quota structure on device cciss/c0d0p8 (error -30). Quota may get out of sync!
VFS: cannot write quota structure on device cciss/c0d0p8 (error -30). Quota may get out of sync!
VFS: cannot write quota structure on device cciss/c0d0p8 (error -30). Quota may get out of sync!
VFS: cannot write quota structure on device cciss/c0d0p8 (error -30). Quota may get out of sync!
EXT3-fs error (device cciss/c0d0p8): ext3_lookup: unlinked inode 71927230 in dir #71927229
EXT3-fs error (device cciss/c0d0p8): ext3_lookup: unlinked inode 71927273 in dir #71927229
EXT3-fs error (device cciss/c0d0p8): ext3_lookup: unlinked inode 71927333 in dir #71927229
EXT3-fs error (device cciss/c0d0p8): ext3_lookup: unlinked inode 71927712 in dir #71927229
EXT3-fs error (device cciss/c0d0p8): ext3_lookup: unlinked inode 71929238 in dir #71927229
EXT3-fs error (device cciss/c0d0p8): ext3_lookup: unlinked inode 71929464 in dir #71927229
EXT3-fs error (device cciss/c0d0p8): ext3_lookup: unlinked inode 71929704 in dir #71927229
EXT3-fs error (device cciss/c0d0p8): ext3_lookup: unlinked inode 71929805 in dir #71927229
EXT3-fs error (device cciss/c0d0p8): ext3_lookup: unlinked inode 71930367 in dir #71927229
EXT3-fs error (device cciss/c0d0p8): ext3_lookup: unlinked inode 71931281 in dir #71927229
EXT3-fs error (device cciss/c0d0p8): ext3_lookup: unlinked inode 71927230 in dir #71927229
EXT3-fs error (device cciss/c0d0p8): ext3_lookup: unlinked inode 71927273 in dir #71927229
EXT3-fs error (device cciss/c0d0p8): ext3_lookup: unlinked inode 71927333 in dir #71927229
EXT3-fs error (device cciss/c0d0p8): ext3_lookup: unlinked inode 71927712 in dir #71927229
EXT3-fs error (device cciss/c0d0p8): ext3_lookup: unlinked inode 71929238 in dir #71927229
EXT3-fs error (device cciss/c0d0p8): ext3_lookup: unlinked inode 71929464 in dir #71927229
EXT3-fs error (device cciss/c0d0p8): ext3_lookup: unlinked inode 71929704 in dir #71927229
EXT3-fs error (device cciss/c0d0p8): ext3_lookup: unlinked inode 71929805 in dir #71927229
EXT3-fs error (device cciss/c0d0p8): ext3_lookup: unlinked inode 71930367 in dir #71927229
EXT3-fs error (device cciss/c0d0p8): ext3_lookup: unlinked inode 71931281 in dir #71927229
EXT3-fs error (device cciss/c0d0p8): ext3_lookup: unlinked inode 71927230 in dir #71927229
ewwhite
Jasper
  • This is likely a hardware problem since it occurs frequently. The advice to run a `fsck` makes sense, but is only treating the symptom, not the underlying problem. – ewwhite Sep 29 '11 at 10:47
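
For context on the log above: error -30 is `EROFS` ("Read-only file system"), so the quota messages are a consequence of the volume already being read-only, and ext3 can be set up to remount a volume read-only when it detects errors such as the `ext3_lookup` ones. A minimal sketch for confirming the current state of a mount point follows. The function name `mount_is_readonly` and its optional second argument are illustrative assumptions, not part of any standard tool.

```shell
#!/bin/sh
# Report whether a mount point is currently mounted read-only.
# Usage: mount_is_readonly MOUNTPOINT [MOUNTS_FILE]
# MOUNTS_FILE defaults to /proc/mounts; the parameter exists only for testing.
# Exit status: 0 = read-only, 1 = mounted but not read-only, 2 = not mounted.
mount_is_readonly() {
    mnt=$1
    table=${2:-/proc/mounts}
    # Fields in /proc/mounts: device mountpoint fstype options dump pass
    awk -v m="$mnt" '
        $2 == m { opts = $4; found = 1 }   # keep the last matching entry
        END {
            if (!found) exit 2
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++) if (o[i] == "ro") exit 0
            exit 1
        }' "$table"
}
```

Running `mount_is_readonly /home1 && echo "/home1 is read-only"` from cron would show roughly when the switch happens, which can then be matched against the timestamps in the system log.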

2 Answers

5

This is a cciss controller, so the server is probably an HP ProLiant system. I would suspect an issue with the drive array in the form of a failed or failing disk. In addition to the normal Linux-level disk check (fsck), try to see if you can get any information on the drive array's health.

Do you have physical access to the server? Can you see any error lights on the drives?

Which Linux distribution is this?

If you have root access give us the output of `cat /proc/driver/cciss/cciss0`. Check to see if the HP management agents are installed. Try `hplog -v` to print the system's IML log to check for error messages. If you have the `hpacucli` utility installed, you may be able to get the specifics of the array's health with `hpacucli ctrl all show config detail`.
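
The checks above could be gathered in one pass with something like the sketch below. It only wraps the three commands already named in this answer; the helper names `run_if_present` and `collect_array_health` are made up for illustration, and whether `hplog` or `hpacucli` exists on the box depends on which HP packages are installed.

```shell
#!/bin/sh
# Run a diagnostic command only if its tool is installed; otherwise say so.
run_if_present() {
    tool=$1
    shift
    if command -v "$tool" >/dev/null 2>&1; then
        echo "== $tool $* =="
        "$tool" "$@"
    else
        echo "== $tool: not installed, skipping =="
    fi
}

# Collect the array information requested above (run as root).
collect_array_health() {
    # Driver-level summary exposed by the cciss kernel module.
    if [ -r /proc/driver/cciss/cciss0 ]; then
        cat /proc/driver/cciss/cciss0
    else
        echo "/proc/driver/cciss/cciss0 not readable"
    fi
    run_if_present hplog -v                               # IML log
    run_if_present hpacucli ctrl all show config detail   # array details
}
```

Calling `collect_array_health > array-health.txt 2>&1` gives a single file that can be pasted into the question.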

If none of those HP utilities are installed, there are other ways to get basic array info. You could install the HP Management Agents for your distribution or check this utility to get quick array status.

ewwhite
  • Yes, it's an HP RAID card. I asked the DC tech to check whether there is any problem with the disks or the RAID card, and they said the disks and card were all OK. – Jasper Sep 29 '11 at 10:51
  • cciss0: HP Smart Array P410 Controller Board ID: 0x3243103c Firmware Version: 3.00 IRQ: 50 Logical drives: 1 Sector size: 8192 Current Q depth: 0 Current # commands on controller: 4 Max Q depth since init: 28 Max # commands on controller since init: 28 Max SG entries since init: 31 Sequential access devices: 0 cciss/c0d0: 500.07GB RAID 1(1+0) – Jasper Sep 29 '11 at 10:52
  • cciss_vol_status -q /dev/cciss/c0d0p8 /dev/cciss/c0d0p8: (Smart Array P410) RAID 1 Volume 0 status: OK. – Jasper Sep 29 '11 at 10:58
  • The drive size indicates that these may be SATA disk drives, so there's a slight chance a drive is starting to fail. You could have the tech run the HP SmartStart diagnostic CD. But if it keeps happening, I'd look at hardware. – ewwhite Sep 29 '11 at 11:04
4

Looks like your disk needs a cleanup. You should force a fsck on it to clean up all these errors before it craps out on you totally.

There are a lot of switches available with fsck, but to get you started you can do one of the following:

This will check all mounts in your /etc/fstab file:

fsck -A

This will check the particular disk that is throwing those warnings:

fsck -t ext3 /dev/<device name>

You should be aware that an fsck can take a LONG time so this is not something you want to do in the middle of the day on a production server.

jdw
  • When I reboot the server, fsck runs automatically. /home is the partition that goes read-only; the other partitions work just fine. I'm not quite sure whether fsck will solve this problem. Thanks for your suggestion. :-) – Jasper Sep 29 '11 at 10:46
  • fsck 1.39 (29-May-2006) e2fsck 1.39 (29-May-2006) /dev/cciss/c0d0p8 is mounted. WARNING!!! Running e2fsck on a mounted filesystem may cause SEVERE filesystem damage. Do you really want to continue (y/n)? yes /home1: recovering journal /home1: clean, 3450103/80642048 files, 19270485/80620186 blocks – Jasper Sep 29 '11 at 10:53
  • As the message states, you should really run this on an unmounted file system. This allows fsck to work better because there are no open files on the volume. – jdw Sep 29 '11 at 11:47
  • I cannot overstress what jdw said: you're asking for trouble if you run e2fsck on a mounted file system in any kind of write mode. Moreover, what you've done above is just to replay the journal, not actually to check the file system. Take the box to single-user mode, unmount /home1, and use `e2fsck -f` to force a full check of the file system. – MadHatter Sep 29 '11 at 12:47
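
The advice in the comments above (never check a mounted file system, and use `e2fsck -f` to force a full check) can be sketched as a small guard. This is an illustration under assumptions rather than a tested procedure: the function names are invented, and the mount check compares the device name literally against `/proc/mounts`, so a device listed there under a different name would not be caught.

```shell
#!/bin/sh
# Return 0 if DEVICE appears in the mounts table, 1 otherwise.
# MOUNTS_FILE defaults to /proc/mounts; it is a parameter only for testing.
device_is_mounted() {
    dev=$1
    table=${2:-/proc/mounts}
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$table"
}

# Force a full check, but refuse to touch a mounted file system.
# Usage: offline_fsck DEVICE [MOUNTS_FILE]
offline_fsck() {
    dev=$1
    if device_is_mounted "$dev" "${2:-/proc/mounts}"; then
        echo "$dev is still mounted; unmount it first" >&2
        return 1
    fi
    e2fsck -f "$dev"
}
```

From single-user mode, the sequence for the volume in this question would be `umount /home1` followed by `offline_fsck /dev/cciss/c0d0p8`.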