Seagate Momentus XT corrupting files (Linux and Mac)

My Seagate Momentus XT hybrid hard drive is corrupting files on Linux. I would appreciate help from anyone, but I'd particularly like to know if other Momentus XT users are able to reproduce this problem; I have provided step-by-step instructions for reproducing this issue on the Seagate Community Forums.

So far, four users have reproduced this problem on the following laptops and OS/distributions:

  • Five laptops: Lenovo ThinkPad T60, T61, T510, MSI MS-1656-ID1, and MacBook Pro (15" late 2009).
  • Four OS/distributions: Ubuntu 11.04, Fedora 15, openSUSE, and Mac OS X.

The instructions for reproducing the problem are simple. Here is a brief summary:

  1. Create a large test file, save it to another storage device (not the Momentus XT), and compute the SHA-1 checksum.
  2. Write the test file to the Momentus XT.
  3. Read the test file from the Momentus XT, calculate its SHA-1, and compare this checksum with the checksum of the original. They should match; if they don't, we have probably reproduced the problem. (Only 'probably', because other issues can also cause a mismatch. See the Seagate thread about identifying this specific problem by comparing the files with cmp -l.)
  4. Repeat from step (2).
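For readers who prefer a script to the shell commands on the Seagate thread, the loop in steps 1 to 4 can be sketched in Python. This is a minimal sketch, not the procedure from the thread: `run_trials`, `sha1_of`, and the `copy_N.bin` names are hypothetical, and `target_dir` is assumed to be a directory on the Momentus XT. It also does not bypass the Linux page cache, so unless the cache is dropped between the write and the read (the thread uses dd with iflag=direct for this), the read-back will mostly come from RAM rather than from the drive.

```python
import hashlib
import shutil
from pathlib import Path

CHUNK = 1 << 20  # hash in 1 MiB pieces so multi-GB test files need not fit in RAM


def sha1_of(path: Path) -> str:
    """Return the hex SHA-1 digest of a file."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()


def run_trials(original: Path, target_dir: Path, trials: int) -> list[int]:
    """Repeat steps 2 and 3 `trials` times; return the trial numbers that mismatched.

    `original` should live on a different device than `target_dir`
    (assumed to be a mount point on the drive under test).
    NOTE: this does not bypass the Linux page cache.
    """
    reference = sha1_of(original)            # step 1: checksum of the known-good file
    failures = []
    for i in range(1, trials + 1):           # step 4: repeat
        copy = target_dir / f"copy_{i}.bin"  # hypothetical file name
        shutil.copyfile(original, copy)      # step 2: write to the drive under test
        if sha1_of(copy) != reference:       # step 3: read back and compare
            failures.append(i)               # keep the bad copy for cmp -l
        else:
            copy.unlink()
    return failures


# Example (paths are placeholders):
# print(run_trials(Path("/media/other/test.bin"), Path("/mnt/momentus"), 25))
```

Later in the thread, looping 20 to 30 times is suggested, since one drive typically failed only after about 15 copies.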

The Seagate thread has more details. Here are some notes from my testing (I have been able to reproduce this problem on three consecutive Momentus XT drives; I RMA'd twice and am now on the third one):

  • What seems to be happening is that the Momentus XT sometimes neglects to commit data to the disk, so that when I read the file back, I get whatever was previously stored in those sectors instead of the correct data. The affected regions come in blocks of various sizes; typical sizes are 1 MiB and 512 KiB.
  • Problem occurs on ext2, ext4, Btrfs, NTFS, and FAT32. Strangely, I was not able to reproduce this problem on ext3.
  • Writing with the oflag=direct output flag in dd avoids this problem. Rapidly committing data to disk with while true; do sync; sleep 0.01; done also prevents the problem.
  • I have only been able to reproduce this problem through SATA and eSATA interfaces. A USB connection seems to prevent the problem. (I'm not sure whether this is due to transfer speed.)
  • Problems occur more often with large files (>2 GB). I was not able to reproduce the problem with files smaller than about 85 MB.
  • I was not able to reproduce the problem on Windows XP with NTFS.
  • Gazoi at the Seagate forums was unable to reproduce the problem on FreeBSD 8.2 with UFS2.
  • The Momentus XT passes both the extended SMART test and badblocks -w with no issues.
  • My laptop (MS-1656-ID1) has passed 24 hours each of Memtest86+, Memtest86, memtester, and MPrime.
  • I have tested two other storage devices (a Seagate Momentus 7200.4 and an Intel 320 series SSD) with the same procedure, and they both pass with no issues.
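To check whether a mismatch has the signature described in the first bullet (stale data in 512 KiB or 1 MiB chunks), one could map the corrupted regions at block granularity rather than byte by byte: a stale byte matches the correct byte by coincidence about once in 256, which would split byte-level runs into small pieces. The 4 KiB block size and the `corrupted_regions` name are assumptions for this sketch, and reported lengths are rounded up to whole blocks.

```python
import os
from pathlib import Path

BLOCK = 4096  # assumed comparison granularity (one 4 KiB page)


def corrupted_regions(good: Path, bad: Path, block: int = BLOCK) -> list[tuple[int, int]]:
    """Return (offset, length) in bytes for each run of consecutive differing blocks."""
    if os.path.getsize(good) != os.path.getsize(bad):
        raise ValueError("files differ in size")
    regions: list[tuple[int, int]] = []
    start = None  # byte offset where the current bad region began, if any
    pos = 0
    with open(good, "rb") as fg, open(bad, "rb") as fb:
        while True:
            a, b = fg.read(block), fb.read(block)
            if not a:
                break
            if a != b and start is None:
                start = pos                           # a bad region begins here
            elif a == b and start is not None:
                regions.append((start, pos - start))  # the bad region just ended
                start = None
            pos += len(a)
    if start is not None:                             # region runs to end of file
        regions.append((start, pos - start))
    return regions
```

Regions of 524288 or 1048576 bytes would fit the pattern above; a scattering of single-block regions would suggest a different problem.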

If you have a Momentus XT, please try reproducing this problem and let me know what happens.

What else can I do to diagnose the problem?

Vincent Yu

Posted 2011-07-19T17:05:52.953

Reputation: 473

Are you rebooting between the write and the read? Are you flushing the cache to make sure that the data is actually read back from the disk instead of the cache? If not, that may be why you can't reproduce it with smaller files, since they are more likely to still be in the cache – psusi – 2011-07-19T19:29:48.717

I am bypassing the page cache by reading with the iflag=direct input flag with dd. When I am not using dd, I flush the cache with sudo sh -c "sync && echo 1 > /proc/sys/vm/drop_caches" – None – 2011-07-19T20:22:38.287
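As a per-file alternative to dropping the whole page cache via /proc/sys/vm/drop_caches, a Linux reader can ask the kernel to evict one file's cached pages with posix_fadvise before re-reading it. This is a hedged sketch: the function name is hypothetical, POSIX_FADV_DONTNEED is only advice that the kernel may partly ignore, and, like drop_caches, it does nothing about the drive's own onboard cache.

```python
import hashlib
import os


def sha1_uncached(path: str) -> str:
    """Hash a file after asking the kernel to drop its cached pages (Linux assumed)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # DONTNEED only evicts clean pages, so flush dirty ones first
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)  # length 0 means "to end of file"
        h = hashlib.sha1()
        with os.fdopen(fd, "rb", closefd=False) as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()
    finally:
        os.close(fd)
```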

I just realized that you might be talking about the disk buffer that is physically on the Momentus XT, instead of the Linux page cache. You have a point in this case: I am mostly not taking any precautions against reading from the disk buffer, and I am also reading the file immediately after writing it. Perhaps I should write multiple files and then read them back on a FIFO basis. That said, the Momentus XT disk cache is only 32 MB, and in some earlier tests I wrote >10 GB of smaller files (~64 MB each) and read them back afterwards without finding any corruption. – None – 2011-07-19T20:40:33.413
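The FIFO idea in the comment above could look roughly like this: write several copies first, then verify them oldest-first, so that by the time a copy is read back, the later copies have pushed it out of the drive's 32 MB buffer. A minimal sketch under assumptions: `fifo_trial` and the `fifo_N.bin` names are hypothetical, the total amount written should comfortably exceed 32 MB, the copies are left on disk for inspection, and, as with any immediate read-back on Linux, the page cache should be dropped (or direct I/O used) before the read phase.

```python
import hashlib
import shutil
from collections import deque
from pathlib import Path


def sha1_of(path: Path) -> str:
    """Return the hex SHA-1 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def fifo_trial(original: Path, target_dir: Path, copies: int) -> list[Path]:
    """Write `copies` copies first, then verify them oldest-first; return the bad ones."""
    reference = sha1_of(original)
    pending: deque[Path] = deque()
    for i in range(copies):                  # write phase
        dst = target_dir / f"fifo_{i}.bin"   # hypothetical naming scheme
        shutil.copyfile(original, dst)
        pending.append(dst)
    bad = []
    while pending:                           # read phase, first in, first out
        copy = pending.popleft()
        if sha1_of(copy) != reference:
            bad.append(copy)
    return bad
```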

Maybe there's a bug in the drive's firmware (integer overflow?) that causes it to lose track of unwritten blocks under high-speed writes. Or there are some bad blocks on the flash that the firmware failed to detect. NTFS verifies writes (and ntfs-3g is very slow), so the problem is unlikely to appear there. If you can't find a fix, you may want to use ZFS and enable checksums. – billc.cn – 2011-07-21T21:08:56.623

@billc.cn, I have reproduced the problem with NTFS. Btrfs and ZFS with data checksums will conspicuously fail to read corrupted files, but the corrupted data will be unrecoverable. More redundancy (e.g., FS-level mirroring or RAID mirroring) will be needed to recover these files. In any case, I am of course not using the Momentus XT until I figure out what's happening. – Vincent Yu – 2011-07-21T21:24:08.737
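The point about checksums versus redundancy can be illustrated with a toy model: a checksum lets a reader detect a bad copy, but it can only return good data if some other replica is intact. This is not how Btrfs or ZFS are implemented (they checksum individual blocks, by default with CRC32C in Btrfs and Fletcher-4 in ZFS, and repair bad blocks from a mirror); `read_with_mirrors` is a hypothetical helper that uses SHA-1 only to match the rest of this thread.

```python
import hashlib


def read_with_mirrors(replicas: list[bytes], expected_sha1: str) -> bytes:
    """Return the first replica whose SHA-1 matches `expected_sha1`.

    With a single replica this degenerates to what a checksumming
    filesystem can do on one disk: detect corruption, not repair it.
    """
    for data in replicas:
        if hashlib.sha1(data).hexdigest() == expected_sha1:
            return data
    raise OSError("checksum mismatch on every replica; data is unrecoverable")
```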

I've got a Momentus XT, but it's running Windows 7. I'll test it on Win7. I could boot into Linux using a USB device, but I don't know a whole lot about the Linux command line, so that'd probably end in disaster, heh – William Lawn Stewart – 2011-07-22T00:27:30.133

@William: Great! I would still recommend booting Linux from a live USB, simply because all my instructions on the Seagate thread are for Linux. There are only three user-specified parameters required; everything else is just copy and paste. However, a Windows 7 test could be interesting. – Vincent Yu – 2011-07-22T00:52:46.173

My Momentus XT (SD22, NTFS) failed the test with a 4GB file running from an Ubuntu 11.04 LiveCD on an HP dv7-6014tx. I'll try it on Windows 7 once I've figured out how to make sure it's not caching anything. – William Lawn Stewart – 2011-07-22T02:37:20.397

I just compared the two files on Windows 7, and they are exactly the same. These are the same two files that Ubuntu reported as different. – William Lawn Stewart – 2011-07-22T02:54:23.963

@William: Thank you very much for the information. Strange. How did you compare the files on Windows? – Vincent Yu – 2011-07-22T03:09:19.383

@Ironclaw I wrote a C# program that compared the files byte by byte, and double-checked by running WinDiff. According to both, the files are identical. Your data isn't at risk, it seems. Maybe Linux makes some assumption about buffering that holds for other drives, but not this one? – William Lawn Stewart – 2011-07-22T03:18:23.287

@William: Argh. I've just discovered that there seems to be an occasional problem with my instructions when used with NTFS partitions: dd refuses to read files, and I made the stupid decision of redirecting standard error to /dev/null, so the messages stating this are suppressed. Sorry about this. It's because of the iflag=direct input flag I'm using with dd; I need to figure out how to fix this. I'll comment here again when I've got it fixed. – Vincent Yu – 2011-07-22T03:31:23.880

@William: I have updated the instructions on the Seagate post with different commands for NTFS. I have tested the new commands with an NTFS partition and I was able to reproduce file corruption. Would you mind going through the test on Linux again? The previous problem was due to direct I/O not being supported on the open-source NTFS-3G driver. – Vincent Yu – 2011-07-22T04:18:54.687

@Ironclaw: Sure =) The Seagate forums won't let me register, so I'll have to post the results here again. – William Lawn Stewart – 2011-07-22T04:44:59.390

Turns out that flushing buffers while using a Live CD install is a great way to make it lock up; I'll go find a spare drive to install to instead, heh – William Lawn Stewart – 2011-07-22T06:17:48.233

@William: Whoops... Looks like I just keep making things worse. I'll add this to the Seagate post. Thanks for taking the time to do this. – Vincent Yu – 2011-07-22T06:23:48.183

@William: An installation on a spare drive would work. Alternatively, if you have space, you can resize your NTFS partition and create an ext4 partition on the Momentus XT. Then you can write the test files to that instead, without having to flush the page cache. – Vincent Yu – 2011-07-22T06:33:14.513

Installed Ubuntu on my USB drive, but it had some error message about a radeon-switcheroo and refused to boot, so I tried the Live CD again. Didn't lock up this time (possibly because I didn't try listening to music at the same time? =P). The results are weird - It copied the 4GB file five times, and they were all fine. No errors at all. I guess I could make an ext4 partition for testing and see if that changes anything? Or perhaps 5 times wasn't enough. – William Lawn Stewart – 2011-07-22T08:24:12.723

@William: Well, it's good that you're not having any problems. I would suggest looping through the test more than 5 times - the second Momentus XT that I had would normally fail only after about 15 copies. Looping the test 20-30 times is probably sufficient. Are you saving the test file on an external hard drive? – Vincent Yu – 2011-07-22T08:36:15.780

I've been using my laptop's secondary internal hard drive for the input directory. – William Lawn Stewart – 2011-07-22T08:41:34.143

I'll be able to run an extended test tomorrow; today my laptop is doing some number crunching. It might be useful if someone with a Mac could run the tests; perhaps this explains why Mac folks have freezing issues with the XT. – William Lawn Stewart – 2011-07-23T22:58:07.447

@William: Okay, thanks! I agree about testing on a Mac, especially since Mac OS X is Unix-based. If you are able to reproduce the problem, I think I'll submit something to Slashdot and Digg, so that this can be tested on more hardware and OSes. Otherwise, if you can't reproduce the problem, I'll wait for more information from other users - I don't want to pollute news sites with a problem that affects only a small fraction of users. – Vincent Yu – 2011-07-23T23:19:03.427

Answers

Updating the firmware to SD26 fixes this problem with file corruption on Linux. Unfortunately, SD26 has not been publicly released.

The best way to obtain the SD26 firmware is to ask Seagate for a copy of the bootable update CD or Windows update utility.

I emailed Seagate Technical Support on August 20 to ask about SD26, after I discovered that it solved my problem with file corruption. Here are the two replies that I received from Seagate Global Customer Support on August 23:

Thank you for contacting Seagate today. Give me some time to get with the developers so I can see what is going on with the SD26 firmware. Because of your discoveries we need to do a little investigating. We do very little testing with Linux. Let me see what I can find out and if further testing and a public release of the firmware is needed. I am attaching the ISO file. I will be getting back with you as soon as I hear back. Seagate is very customer oriented and we appreciate you bringing this to our attention.


We usually do not make firmware available publicly. It can do more damage then good in the wrong hands. We receive a lot of bricked drives from improper updates. We like for consumers to contact us, so that we can verify that a firmware upgrade is needed and beneficial. Alan M. is our moderator for the forums and he will be making an announcement on your thread. Again, thank you for bringing this to our attention. Our customers are the best, and a great source of information and usually the first to let us know when things are not working as they should. Allow us the chance to fix the problem.

As I have stated already, I think it is best to get SD26 directly from Seagate. However, there are also leaked copies of both the bootable ISO and the Windows utility that are easily found by searching on Google. The SHA-1 checksum of the SD26 bootable update CD (*.iso) that I received from Seagate is b7b0c7e1b9529925b0364b2cf19a62d608b58082.
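To check a downloaded SD26 ISO against the checksum above, something like the following would work (sha1sum on the command line does the same job). The file name in the example is a placeholder. A match only shows that the image is identical to the copy received from Seagate; it says nothing about whether flashing it is appropriate for a given drive.

```python
import hashlib

# SHA-1 of the SD26 bootable update CD as received from Seagate (quoted above).
SD26_ISO_SHA1 = "b7b0c7e1b9529925b0364b2cf19a62d608b58082"


def matches_sha1(path: str, expected_hex: str) -> bool:
    """Hash `path` in 1 MiB chunks and compare with a published hex digest."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected_hex.strip().lower()


# Example ("SD26.iso" is a placeholder file name):
# print(matches_sha1("SD26.iso", SD26_ISO_SHA1))
```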

I have posted the information in this answer, and other miscellaneous details, in the Seagate thread.

Vincent Yu

Firmware SD28 is now available: http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=215451 – madh – 2011-09-26T08:36:28.627

All mentions of this problem have been removed from the Seagate forums. They probably figure that the discussion is unnecessary now that the problem can be fixed by updating to SD28.

The only problem with that is that people no longer have a means of getting authoritative information describing the problem or how to fix it, since all references linking the problem to its solution have been removed from the Seagate site. The firmware update download is still available, but there is no mention of what the firmware does, so affected users may have difficulty finding a solution.

So to help googlers: The SD28 firmware download at http://knowledge.seagate.com/articles/en_US/FAQ/215451en?language=en_US IS a solution to the data corruption bug that occurs with these Seagate Momentus XT drive models:

  • ST92505610AS

  • ST93205620AS

  • ST95005620AS

Michael

Reputation: 41

I have just verified that this is happening in OS X as well. :o(

I had suspected data corruption for two reasons:

1) The Momentus XT is designed to cache frequently used files in its 4 GB of flash memory, and these are most often small files needed to load programs during bootup (settings files, etc.). With increasing frequency, my programs that are configured to load at login would suddenly come up with default settings, or show me the 'welcome tutorial' with no settings configured. These included Mail.app (no account info stored), Little Snitch (port monitor with no rules), Quicksilver (welcome screen), and others. This led me to believe that small files in the cache are corrupted.

2) The drive caused 'spinning pinwheels': the drive seemed to have spun down, and when the OS needed to access it, I had to wait for the drive to spin up again. The waits became more frequent and lasted longer. On several occasions I simply wiped the drive and used Carbon Copy Cloner to copy back everything that was previously backed up. However, after copying the files to the Momentus XT, many of the larger files (movies, ISO images, zip files) were corrupt and would not load or open. I thought the problem with the beach balls and spin-downs would be resolved when I upgraded from Snow Leopard to Lion, as a few users have recently posted about, but the 3.46 GB Developer Preview ISO file I copied to the Momentus XT from a USB drive was corrupted, so I couldn't even install Lion.

I had just stumbled across your thread on the Seagate forums and came here to post:

I used a program called SMART Utility under OS X, which said that the drive was failing. I think the error was for code 184, which covers "end-to-end" errors. I was alarmed that the drive was "failing", but I read that others were receiving this error after upgrading their firmware (as I did from SD23 to SD25), and that Seagate would only deem a drive as failing if it failed under their SeaTools.

I put my Momentus XT in a USB enclosure and ran SeaTools on the drive through Windows on another computer. SMART Check was not an allowed test. I ran the 'short drive test' and the 'long generic test', and no errors were reported. Now I've tried to duplicate your methods under OS X, and I too found that the files are getting corrupted. I'm using a late 2009 MacBook Pro 15", and I have a 500GB Momentus XT on SD25.

Fred

Reputation: 21

Thanks for reporting this! Can you run the following command to compare an uncorrupted file with its corrupted copy? cmp -l <uncorrupted file> <corrupted copy> | head If we are having the same problem, you should see three columns of numbers, and the leftmost column should have ten consecutive numbers. (Details: Leftmost column shows the byte numbers of the differing bytes, and the other two columns show the actual values of the differing bytes.) – Vincent Yu – 2011-07-26T20:48:55.633
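For anyone without cmp (for example on Windows), the check described in this comment can be approximated in Python: list differing bytes as cmp -l does (1-based byte number, then the two byte values in octal) and test whether the first ten byte numbers are consecutive. This is a sketch with hypothetical function names. Unlike cmp, it silently stops at the end of the shorter file, and since stale and correct bytes coincide about once in 256, an occasional gap in an otherwise contiguous run is not conclusive.

```python
from itertools import islice

CHUNK = 1 << 20  # compare 1 MiB at a time


def cmp_l(path_a: str, path_b: str):
    """Yield (byte_number, octal_a, octal_b) for each differing byte, like `cmp -l`.

    Byte numbers are 1-based as in cmp. Unlike cmp, this stops silently
    at the end of the shorter file instead of reporting EOF.
    """
    pos = 0  # number of bytes already compared
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a, b = fa.read(CHUNK), fb.read(CHUNK)
            if not a or not b:
                return
            if a == b:  # fast path for identical chunks
                pos += len(a)
                continue
            for x, y in zip(a, b):
                pos += 1
                if x != y:
                    yield pos, format(x, "o"), format(y, "o")


def first_ten_consecutive(path_a: str, path_b: str) -> bool:
    """Mimic `cmp -l a b | head`: are the first ten differing byte numbers consecutive?"""
    first = [n for n, _, _ in islice(cmp_l(path_a, path_b), 10)]
    return len(first) == 10 and all(q - p == 1 for p, q in zip(first, first[1:]))
```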