How do you determine if a disk has problems using smartctl?
I have an Ubuntu 12.04 server using software RAID1 that became completely unresponsive. I rebooted, and it hung at boot with the message "/tmp is not ready or not present", so I skipped that and started a manual recovery terminal. Everything seemed fine, except that my RAID resync was horribly slow. However, cat /proc/mdstat
didn't show any actual RAID failure.
I bumped up my /proc/sys/dev/raid/speed_limit_min
following the instructions here, but that didn't help much. My 1 TB array has been resyncing for 30 minutes now, and it's only 0.3% complete.
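For reference, raising the minimum resync speed can be sketched roughly as below. This is only an illustration: the function names, the optional directory argument, and the value 50000 in the usage note are assumptions for the example, not the exact steps or numbers from the linked instructions.

```shell
#!/bin/sh
# Sketch: inspect and raise the md resync speed limits (values are KB/s).
# The optional directory argument is a hypothetical convenience so the
# functions can be tried against a scratch directory instead of /proc.

# Print the current minimum and maximum resync speed limits.
show_raid_limits() {
    dir="${1:-/proc/sys/dev/raid}"
    for name in speed_limit_min speed_limit_max; do
        printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
    done
}

# Write a new minimum resync speed. Needs root on the real /proc path.
set_raid_speed_min() {
    value="$1"
    dir="${2:-/proc/sys/dev/raid}"
    # Reject anything that is not a plain non-negative integer.
    case "$value" in
        ''|*[!0-9]*) echo "value must be a non-negative integer" >&2; return 1 ;;
    esac
    printf '%s\n' "$value" > "$dir/speed_limit_min"
}
```

As root, calling set_raid_speed_min 50000 followed by show_raid_limits would apply and then display the new setting; the change does not survive a reboot.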
So I installed smartmontools
and checked the disks using:
sudo smartctl --all /dev/sda
sudo smartctl --all /dev/sdb
Both report an overall health status of "PASSED", but sdb also shows several lines like:
Error 83 occurred at disk power-on lifetime: 15147 hours
Error 82 occurred at disk power-on lifetime: 15147 hours
Error 81 occurred at disk power-on lifetime: 15147 hours
Error 80 occurred at disk power-on lifetime: 15147 hours
along with some sort of hex-dump for each.
What does this mean? Should I interpret these errors to mean my sdb disk is dying? How do I confirm this?
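To make the question more concrete, here is a rough sketch of how the smartctl output could be narrowed down. The two helper functions are hypothetical names made up for this example; they only filter text that smartctl has already printed. The three attribute names are the ones commonly associated with bad sectors, and I'm assuming the drive reports them under these names.

```shell
#!/bin/sh
# Sketch: filter smartctl output read from stdin.

# Count the detailed entries in the ATA error log section.
# smartctl normally prints only the most recent few entries in detail,
# so this can be lower than the drive's total error count.
count_smart_errors() {
    grep -c '^Error [0-9][0-9]* occurred at disk power-on lifetime' || true
}

# Print the raw values of attributes that usually indicate bad sectors.
# Assumes the standard 10-column "smartctl -A" table layout.
show_sector_attrs() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ { print $2, $10 }'
}

# Possible usage against the suspect disk:
#   sudo smartctl -l error /dev/sdb | count_smart_errors
#   sudo smartctl -A /dev/sdb | show_sector_attrs
#   sudo smartctl -t long /dev/sdb       # start an extended self-test
#   sudo smartctl -l selftest /dev/sdb   # read the result once it finishes
```

Non-zero raw values for those attributes, or a failed extended self-test, would presumably be stronger evidence than the "PASSED" summary alone.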
Edit: Possibly related: ever since the crash, I'm unable to SSH into the server. I can access it just fine from a physical terminal, and there doesn't seem to be any excessive load. I made sure the firewall was disabled, and I can still ping the server, but ssh myuser@myserver
results in "Connection timed out".