Ubuntu Forced fsck on boot fails

fsck reports a lot of errors reading block 24251xx (attempt to read block from filesystem resulted in short read) while getting next inode from scan. It then asks: Ignore error (y)?

Force rewrite (y)?

Sometimes there is other output mixed in:

[8222.00061] ata1.00: exception Emask 0x0 .bunch of hex. frozen
[8222.00124] ata1.00 cmd ..bunch of hex.. in
[8222.00264] res ..bunch of hex.. (timeout)
[8222.00124] ata1.00: status: {DRDY}

What is going on and what should I do?

Update: I found out about the second part. It is a libata error message from the kernel indicating that the drive wasn't responding to a command in time. This piece of advice will supposedly help with that [I'll let you know how it goes once I get past fsck]:

In particular, timeouts may be solved by booting with acpi=off, noapic, pci=nomsi or pci=biosirq.

srboisvert

Posted 2009-08-22T20:20:30.657

Reputation: 139

Did you change anything in your hardware (add any devices, even USB or FireWire)? If not, and your system installed fine without pci boot parameters, I don't think you need them now... – mihi – 2009-08-22T21:06:08.257

And don't try to "get past fsck" if you have important data on the disk; try to copy that off first (mount the drive read-only). If the disk is really failing, it might not survive long enough to do a full fsck... – mihi – 2009-08-22T21:07:14.777

Answers

Most likely your hard disk is dying...

Get a live CD (Ubuntu Install/Live CD is fine).

If you have any important data on that disk without backup, mount your disk read-only and copy all you can off the disk.

Then, try to make an image of the partition with dd or dd_rescue, either onto another partition or as a file somewhere else (if you don't have the space, write the image to /dev/null). Reading the whole partition this way shows whether there is any physical damage to your disk.
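As a rough sketch of the imaging step, something like the following should work with plain dd. The helper name image_partition and the device and path names in the comments are placeholders, not anything from the question. The 4096-byte block size is an assumption: small blocks limit how much data is lost around each bad spot, at the cost of speed.

```shell
# Sketch: image a partition (or any file) while continuing past read errors.
# image_partition is a hypothetical helper name; both arguments are placeholders.
image_partition() {
    src=$1
    dest=$2
    # conv=noerror keeps going after a read error; conv=sync pads each short
    # or failed block with zeros so offsets in the image stay aligned.
    dd if="$src" of="$dest" bs=4096 conv=noerror,sync
}

# Example (needs root; the destination must be on a different, healthy disk):
#   image_partition /dev/sdXN /mnt/backup/sdXN.img
# To only test readability without keeping the data:
#   image_partition /dev/sdXN /dev/null
```

Any read errors show up in dd's output on stderr and usually in the kernel log as well. dd_rescue is generally the better tool for a badly damaged disk, since it is designed to keep going and retry around bad areas.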

If there are media errors while copying with dd, fsck your new copy (either via a loopback device or on a real partition; if you used /dev/null you have to start over with a real disk) and copy off all the data you still can. Then find out who manufactured your disk and whether it is still under warranty. If yes, proceed with the manufacturer's test tools. If not, check with SMART tools whether there are any spare sectors left for reallocation and, if so, try to write zeroes into the broken sectors with dd (which will reallocate them). If you don't have any luck, you will have to partition around the broken area, or use the -c option of mkfs.ext[23].
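For the SMART check, a small filter over the output of smartctl -A (from smartmontools) can pull out the sector counters. This is only a sketch: smart_sector_counts is a made-up helper name, and it assumes the usual ATA attribute table, where the second column is the attribute name and the tenth is the raw value. Drives differ in which attributes they report.

```shell
# Sketch: pull the sector-health counters out of `smartctl -A` output.
# smart_sector_counts is a hypothetical helper name. It assumes the usual
# ATA attribute table: column 2 is the name, column 10 the raw value.
smart_sector_counts() {
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { print $2 "=" $10 }'
}

# Example (needs root and smartmontools; /dev/sdX is a placeholder):
#   smartctl -A /dev/sdX | smart_sector_counts
```

A non-zero Current_Pending_Sector count means there are sectors the drive could not read and will try to reallocate on the next write. A reallocated count that keeps growing is a strong hint that the disk should be replaced.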

If there are no media errors, you will have to reformat the disk and copy the data back again. Usually ext3 (which I guess is what you are using...) is a lot more robust than other filesystems, so I don't really think this can be a filesystem error alone...

mihi

Posted 2009-08-22T20:20:30.657

Reputation: 3 217

ext3 is perfectly fine with bad blocks. See mkfs.ext3's -c option, or e2fsck's -c option. Further, the disk can only reallocate sectors on write, so it's perfectly normal to hit unreallocated bad blocks, and this doesn't mean the disk is dying. – derobert – 2009-08-23T09:17:42.000

thanks, updated the part about badblocks, and rephrased the text about reallocation (I know they are only reallocated on write but maybe I did not make this clear enough in my previous writing). – mihi – 2009-08-23T19:38:13.757

Yeah. Dying disk. Saved my data and getting a replacement drive. Never could even finish a badblocks scan it was so bad. – srboisvert – 2009-09-04T13:43:20.640

Sounds like you have a bad sector if the numbers are consistent across attempts to fsck. Unfortunately, you're going to lose whichever file was stored in that inode.

Check the SMART status; it'll usually tell you how many bad blocks the disk knows about. Hopefully, it's only a few. If it tells you the disk is failing, I hope you have a backup.

Running fsck -c /dev/WHATEVER should run a bad-block scan, and then tell you what you've lost (or need to restore from a backup).

derobert

Posted 2009-08-22T20:20:30.657

Reputation: 3 366