
Our backup "solution" consists of hooking up a USB drive to the backup server and running a custom script that rsyncs the data onto the drive. However, after a while, the drive becomes read-only. Here's the output of dmesg:

[2502923.708171]  sdb: sdb1
[2502923.742767] sd 36:0:0:0: [sdb] Attached SCSI disk
[2502980.368020] kjournald starting.  Commit interval 5 seconds
[2502980.482705] EXT3 FS on sdb1, internal journal
[2502980.482705] EXT3-fs: recovery complete.
[2502980.488709] EXT3-fs: mounted filesystem with ordered data mode.
[2590744.432168] usb 1-2: USB disconnect, address 36
[2590744.432655] sd 36:0:0:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
[2590744.432784] end_request: I/O error, dev sdb, sector 795108447
[2590744.432857] Buffer I/O error on device sdb1, logical block 99388548
[2590744.432925] lost page write due to I/O error on sdb1
[2590744.433002] Buffer I/O error on device sdb1, logical block 99388549
[2590744.433070] lost page write due to I/O error on sdb1
[2590744.433139] Buffer I/O error on device sdb1, logical block 99388550
[2590744.433207] lost page write due to I/O error on sdb1
[2590744.433275] Buffer I/O error on device sdb1, logical block 99388551
[2590744.433343] lost page write due to I/O error on sdb1
[2590744.433410] Buffer I/O error on device sdb1, logical block 99388552
[2590744.433478] lost page write due to I/O error on sdb1
[2590744.433545] Buffer I/O error on device sdb1, logical block 99388553
[2590744.433613] lost page write due to I/O error on sdb1
[2590744.433681] Buffer I/O error on device sdb1, logical block 99388554
[2590744.433749] lost page write due to I/O error on sdb1
[2590744.433817] Buffer I/O error on device sdb1, logical block 99388555
[2590744.433884] lost page write due to I/O error on sdb1
[2590744.433953] Buffer I/O error on device sdb1, logical block 99388556
[2590744.434021] lost page write due to I/O error on sdb1
[2590744.434089] Buffer I/O error on device sdb1, logical block 99388557
[2590744.434157] lost page write due to I/O error on sdb1
[2590744.443942] sd 36:0:0:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
[2590744.447945] end_request: I/O error, dev sdb, sector 795108687
[2590744.452065] Aborting journal on device sdb1.
[2590744.452065] __journal_remove_journal_head: freeing b_committed_data
[2590744.452410] EXT3-fs error (device sdb1) in ext3_ordered_writepage: IO failure
[2590744.453795] __journal_remove_journal_head: freeing b_committed_data
[2590744.454481] ext3_abort called.
[2590744.454548] EXT3-fs error (device sdb1): ext3_journal_start_sb: Detected aborted journal
[2590744.454697] Remounting filesystem read-only
[2590744.457033] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #11968705 offset 0
[2590776.909451] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #122881 offset 0
[2590777.637030] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #30015490 offset 0
[2590949.026134] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591121.070802] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591211.109072] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591300.269439] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591357.322837] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591418.664452] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591572.792037] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591667.952082] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591669.639597] __ratelimit: 3981 messages suppressed
[2591669.639658] Buffer I/O error on device sdb1, logical block 61014530
[2591669.639698] lost page write due to I/O error on sdb1

I'm not unmounting the drive within my script; can anyone suggest what would be causing this, so I can fix it?
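
For reference, the script does nothing exotic; conceptually it just mounts the drive and runs rsync, roughly along the lines of the sketch below (the device name, mount point, and paths here are illustrative rather than the real ones):

#!/bin/sh
# Illustrative sketch only - mount the USB drive, then mirror the data onto it.
mount /dev/sdb1 /mnt/backup || exit 1
rsync -a --delete /srv/data/ /mnt/backup/data/
# Note: the script never unmounts the drive afterwards.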

Glen Solsberry
  • The problem isn't that it's going read-only. Switching to read-only is just a symptom. The problem is that the device is disappearing (`usb 1-2: USB disconnect, address 36`) – MikeyB Feb 11 '10 at 17:58

2 Answers

4

When that happens to me with a fixed disk, it means the disk is dying, and most likely that is what is happening here. If this is a backup drive that is repeatedly connected, disconnected, and transported between locations, it is very possible that a shock or repeated thermal changes have resulted in a flaw. Most of these USB drives are not specially protected against drops, shocks, or thermal changes; they are just a standard SATA drive in a USB-to-SATA plastic housing.

My rule of thumb for disks, especially when it comes to backups, is: if there's a doubt, throw it out.

To rule out the USB infrastructure, you could exercise the disk extensively on another computer (a couple of commands for doing so are sketched below), though that doesn't actually solve your problem, since you still have to back up this computer.
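
For example, a non-destructive read-only surface scan plus a SMART extended self-test will exercise the whole drive (the device name is illustrative; double-check it before running anything):

badblocks -sv /dev/sdX      # read-only surface scan of the whole device
smartctl -t long /dev/sdX   # start a SMART extended self-test
smartctl -a /dev/sdX        # review the results once the self-test has finished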

David Mackintosh
  • +1 you should always have confidence in your backup solution and test it from time to time to make sure it works. – Frenchie Feb 12 '10 at 02:51
0

Further to David Mackintosh's answer above (which is very good): the filesystem itself has an option that tells the kernel to remount it read-only when it encounters an error.

From the mount(8) man page:

errors=continue / errors=remount-ro / errors=panic

Define the behaviour when an error is encountered. (Either ignore errors and just mark the file system erroneous and continue, or remount the file system read-only, or panic and halt the system.) The default is set in the filesystem superblock, and can be changed using tune2fs(8).
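
For instance, you can request that behaviour explicitly when mounting (the mount point here is illustrative):

mount -o errors=remount-ro /dev/sdb1 /mnt/backup

or put errors=remount-ro in the options column of the relevant /etc/fstab entry.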

I would warrant that if you're not explicitly mounting with errors=remount-ro, then the filesystem has that behaviour set as the default in its superblock (sample from my dumpe2fs below):

# dumpe2fs /dev/md0 | grep Error
dumpe2fs 1.41.3 (12-Oct-2008)
Errors behavior:          Continue
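
If you want to change the default stored in the superblock rather than pass a mount option every time, tune2fs can set it (device name is illustrative):

tune2fs -e remount-ro /dev/sdb1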

You might be able to find out what SMART thinks is wrong with the drive by running smartctl:

smartctl -a /dev/<your drive>
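
Note that some USB-to-SATA bridges don't pass SMART commands through cleanly; if the plain invocation fails, you may need to specify a pass-through type such as -d sat (which one applies depends on your enclosure, so check the smartctl documentation):

smartctl -a -d sat /dev/<your drive>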

I would agree with David: give serious consideration to replacing the drive. There is nothing worse than having to recover all your data only to find that the backup is unreadable.

Frenchie