
I've always been somewhat paranoid about verifying data backed up to removable media, so after copying stuff to a USB flash drive or portable HDD, I invariably unmount the drive, remount it and diff -q the stored files with the originals.
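
(Roughly speaking, the check looks something like the following; the paths are just examples, and the remount assumes an fstab entry for the mount point:)

umount /media/usb
mount /media/usb
diff -qr /home/me/backup /media/usb/backup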

Years ago I discovered that (at least with the equipment I've got) I was seeing bit errors at something on the order of 1 bit/GByte. Somehow (I forget the details) I discovered that the cure is, before writing any data, to do

echo 64 > /sys/block/sda/device/max_sectors

(assuming the media appears as sda, of course). As long as I remember to do that, I've never had any problems. (I believe the default max_sectors value is 128.)
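
(For what it's worth, you can check the value currently in effect with:)

cat /sys/block/sda/device/max_sectors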

My questions are:

  • Is this just me? I've seen the issue with a variety of flash drives, portable HDDs, motherboards and laptops (but I've never done an exhaustive test of all combinations to see whether I have any which are actually reliable). The media which has been used with Windows, and the machines which dual-boot Windows, seem to have no similar problems there, so it does appear to be Linux specific.

  • What actually causes the issue? Is it non-standards-compliant media, chipsets or cables?

  • Is there anything I can configure on my systems (Debian Lenny) which will automatically set max_sectors? (Some HAL scriptage or sysctl tweak? A more global /sys parameter?) Presumably the default of 128 is set in the kernel somewhere, but building a custom kernel seems a bit drastic.

Thanks for any advice

timday

4 Answers


First of all, when you get a new device I can recommend writing data to it and verifying the data afterwards using md5sum/sha1sum/... Especially cheap USB devices tend to be broken. :( It's not that unusual that USB pens work fine on the first few (hundred) MBs but lose data on the last MBs. Sadly many users aren't aware of that and notice problems too late: when the USB pen is getting full.
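
A rough sketch of what such a check could look like (the mount point /media/usb and device node /dev/sda1 are only placeholders for your setup):

dd if=/dev/urandom of=/tmp/testfile bs=1M count=512    # create 512 MB of random test data
md5sum /tmp/testfile                                   # note this hash
cp /tmp/testfile /media/usb/
umount /media/usb && mount /dev/sda1 /media/usb        # make sure the data really hits the device
md5sum /media/usb/testfile                             # must match the hash noted above

If the two hashes differ, the device (or the path to it) is already suspect.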

The problem you're speaking of is usually located in the USB chipset (though sometimes it's visible only with particular combinations of hardware). If it works with Windows but fails with Linux on the same hardware, that sounds like there's a workaround in the Windows driver that does not exist in the Linux kernel (yet). And whereas Linux uses max_sectors=240 by default (i.e. 120 kB transfers), Windows seems to use 64 kB transfers (max_sectors=128), according to http://www.linux-usb.org/FAQ.html#i5 (where some problems with the adapters made by Genesys Logic are mentioned as well).

To automatically set max_sectors, use udev for this task. It allows you to configure max_sectors for just the devices you want to adjust.

You can retrieve the necessary information about your USB device by running:

# udevadm info -a -p /sys/class/block/sda/

or for older udev versions:

# udevinfo -a -p /sys/class/block/sda/

Then grab the attributes you'd like to use for your rules, create a new file such as 42-fix-max_sectors.rules, and place it under /lib/udev/rules.d (when using recent udev versions) or /etc/udev/rules.d (for older udev versions). To give you an idea of what this configuration file could look like:

SUBSYSTEM=="block", ATTRS{model}=="Voyager Mini    ", RUN+="/bin/sh -c '/bin/echo 64 > /sys/block/%k/device/max_sectors'"

Make sure to reload udev after writing your rules file (by running /etc/init.d/udev reload). For testing your configuration file I can recommend using:

# udevadm test /sys/class/block/sda/

PS: I prefer to replace hardware if I notice any problems, because usually it's not worth the effort to work on workarounds (and I'm sure you're aware of Murphy, who will catch you anyway as soon as you think you've got it working ;)).

Michael Prokop

Depending on the number or size of the files, I have seen similar things in the past.

During a backup/copy of over 4 million .PDF files we found that the MD5 hashes on some files were NOT the same!

What worked for us was rsync.

One thing I would suggest you try is to rsync the files over and see if you still experience the data loss.
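
Something along these lines (the paths are just placeholders; the second pass uses -c/--checksum, so rsync re-reads and compares every file instead of trusting size and mtime):

rsync -av /source/dir/ /media/usb/backup/     # initial copy
rsync -avc /source/dir/ /media/usb/backup/    # verification pass: re-copies anything whose checksum differs

If the second pass transfers nothing, the two trees match.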

Hope this helps.

KPWINC
  • Interesting. Any idea why it works better, though? Does rsync include extra integrity checking? Or does it just use smaller transfers and avoid the problem? – timday Jun 10 '09 at 20:41
  • rsync does do integrity checking. But that being said, what shocked us was why something as common as the cp command (we also tried scp) would actually copy a file (with data loss) and not throw some sort of error. You can imagine our surprise after copying millions of files only to have our "spot checking" of MD5 hashes fail. We then tried rsync and it worked flawlessly. Perhaps someone on here could shed more light as to why that is. – KPWINC Jun 12 '09 at 18:03

On my Ubuntu 9.04 system, both /etc/udev/rules.d and /lib/udev/rules.d are used. The README recommends using /etc/udev/rules.d for local rules or to override package-supplied rules which are contained in /lib/udev/rules.d. Also, it says:

The udev daemon watches this directory with inotify so that changes to these files are automatically picked up, for this reason they must be files and not symlinks to another location as is the case in Debian.

I'm posting this information here since Ubuntu is Debian-based, and these differences might be important to someone who assumes the behavior is the same.

Dennis Williamson

This is excellent! I expect random data corruption on all hardware, but I have noticed that USB devices are so poor that it's an ongoing nightmare. Obviously the root cause is crappy USB chipsets combined with protocol and filesystem developers who don't care about data integrity, but a workaround that makes USB devices usable is much appreciated.

You should still always expect data corruption, because you don't know what new problems you might encounter that cause it. Keep hashes of all your files before copying a large data set, and verify them afterwards.

cd /oldfs
find . -type f -exec md5sum {} \; > /oldfs.md5    # record a checksum for every file
rsync -a . /newfs/                                # copy the tree
cd /newfs
md5sum -c /oldfs.md5 | grep -v 'OK$'              # show only files that fail verification

Now that it has been found that the default max_sectors setting causes corruption on many devices, it would be good to lean on distribution vendors and kernel developers to ship defaults that focus on data integrity with common devices, rather than on performance with a few. The devices being non-compliant is hardly a reason not to use the settings they expect when data corruption is the result.

carlito