
After working for a while, my filesystem (ext4) becomes read-only. I then boot from my live USB and run fsck on the corrupted partition (and on the others too, to be safe). Running fsck -y fixes all errors on the problematic partition, and when I run fsck again, all partitions are reported as clean.

Then I reboot normally (not from the live USB) into my system and run a few touch abc commands at different locations as a test; it is able to write to disk. After a while, however, it becomes read-only again.

I've repeated this entire process 4-5 times (fsck-from-live-usb --> boot-normally --> becomes-read-only --> fsck-from-live-usb), and I don't know the cause of this problem.

dmesg shows the following kind of errors:

blk_update_request: I/O error, dev sdb, sector 2521582056

tag#28 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK

Is there a way to fix this? I'm unable to work on my system. It doesn't look like a hardware problem, since fsck fixes everything and smartctl also reports the drive to be okay, no errors.

Thanks.

sanjeev mk
  • Can you post the output of smartctl -a /dev/xxx to a pastebin site and link it here? – Miuku Jun 17 '18 at 15:29
  • @Miuku https://pastebin.com/LKjEvWQ9 – sanjeev mk Jun 17 '18 at 16:17
  • Your SMART output looks fine. I would replace the SATA cable first, then move the drive to another SATA port to make sure neither the cable nor the port is causing the issue. – Miuku Jun 17 '18 at 16:23
  • [As we have said before](https://serverfault.com/questions/222985/looking-for-hard-drive-health-monitoring-software/223043#223043), one of the classic large-scale investigations into smartctl results concluded that if smartctl says your drive is failing, it probably is; but if it says it's fine, you can conclude nothing. That is, it's a good predictor of failure, but a poor predictor of longevity. If you suspect this drive - and I would if I were you - replace it immediately. Your data are too valuable. – MadHatter Jun 18 '18 at 06:38

2 Answers


Even though SMART reports that all is OK, the disk may be bad anyway. You should try to:

  • run a long SMART self-test with smartctl -t long /dev/sdb (see, for example, the Arch wiki)
  • check the disk for bad blocks with badblocks -s; for other ways to do it (some destructive), see (again) the Arch wiki
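
The two checks in the list above can be sketched as shell functions. This is only a rough outline: the function names are made up for illustration, /dev/sdb is taken from the dmesg output in the question, and it assumes you run it as root from the live USB with the disk's partitions unmounted.

```shell
#!/bin/sh
# Hypothetical helper names; nothing runs until you call a function.

# Start the drive's extended (long) self-test. The test runs inside the
# drive; smartctl returns right away and prints an estimated finish time.
start_long_test() {
    smartctl -t long "$1"
}

# After the estimated time has passed, print the self-test log.
show_test_log() {
    smartctl -l selftest "$1"
}

# Surface scan. With no mode flag badblocks only reads, so data is kept.
# -s shows progress, -v is verbose. Do NOT add -w here: write mode
# overwrites the whole device.
scan_surface() {
    badblocks -sv "$1"
}
```

For example, call start_long_test /dev/sdb, wait for the time it reports, then call show_test_log /dev/sdb, and run scan_surface /dev/sdb separately once the self-test has finished.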

It may also be a problem with the SATA controller or with the bus, but first you should check the disk (maybe from another machine if you are not sure about the controller).
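
One rough way to get a hint about disk versus controller or bus is to tally which sectors the kernel complains about. The sketch below assumes log lines shaped like the one quoted in the question ("I/O error, dev sdb, sector N"); failing_sectors is a hypothetical helper name. As a rule of thumb only (my assumption, not a guarantee), errors that keep hitting the same few sectors point more towards the disk surface, while errors scattered across many unrelated sectors point more towards the cable, port or controller.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print, for one
# device, how many I/O errors were logged per sector (most frequent first).
failing_sectors() {
    dev="$1"   # device name as it appears in the log, e.g. sdb
    grep "I/O error, dev $dev, sector" \
        | sed -n 's/.*sector \([0-9][0-9]*\).*/\1/p' \
        | sort -n \
        | uniq -c \
        | sort -rn
}
```

Typical use would be dmesg | failing_sectors sdb; each output line is a count followed by a sector number.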

Enrico Polesel

This is most likely a bad block. While my company has a rule that any such hard drive should be discarded immediately, we home users often try to salvage as much as we can. The tool of choice is HDD Regenerator (non-destructive), but it's paid software. If you want to do it for free, you can use HDD Low Level Format; old versions are free. This will require a complete backup and restore. The programs I mentioned work independently of the filesystem. HDD LLF runs on WinXP or 2003 directly, whereas HDD Regenerator creates a bootable USB drive, but is also available online as a Linux initrd floppy image used with memdisk.

Zdenek
  • Can I ask why the downvote? I have used the solution I proposed successfully in the past. – Zdenek Jun 18 '18 at 16:55