Diagnosing Windows / HDD / RAID0 failure

0

My machine is as follows:

  • Pair of hard disks in RAID0, ATA Hitachi HDT72505
  • nVidia motherboard, "M51"?

Windows XP refused to boot:

  • Booting normally results in hanging during the Windows loading screen
  • Booting in safe mode, it is able to reach the login screen. Logging in results in hanging.

Using the Windows XP installation CD:

  • Does not detect any of the hard disks
  • If an external HD is plugged in, then the CD will mount that HD while using the Recovery Console

Luckily, I have a DVD of Kubuntu 9.10.

fdisk -l shows this output:

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xc0cfc0cf

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1       60801   488384001    7  HPFS/NTFS

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xfff7fff7

   Device Boot      Start         End      Blocks   Id  System
Note: sector size is 4096 (not 512)

Note that no partition is listed under /dev/sdb. Kubuntu did not automatically mount the hard disk.

Running mount -t ntfs-3g /dev/sda1 /mnt/windows fails with this complaint: ntfs-3g: Failed to access volume '/dev/sda1': No such file or directory

The Kubuntu installer cannot determine how much of the hard disk is used by the NTFS partition, either.

I am currently running smartctl --test=long /dev/sda, but I am not sure how to interpret the output.

Update:

This is the output of dmraid -ay -vvvv -dddd

WARN: locking /var/lock/dmraid/.lock 
ERROR: unsupported sector size 4096 on /dev/sdc.
NOTICE: skipping removable device /dev/sdd      
NOTICE: skipping removable device /dev/sde      
NOTICE: skipping removable device /dev/sdf      
NOTICE: skipping removable device /dev/sdg      
NOTICE: /dev/sdh: asr     discovering           
NOTICE: /dev/sdh: ddf1    discovering           
NOTICE: /dev/sdh: hpt37x  discovering           
NOTICE: /dev/sdh: hpt45x  discovering           
NOTICE: /dev/sdh: isw     discovering           
DEBUG: not isw at -522494976                    
DEBUG: isw trying hard coded -2115 offset.
DEBUG: not isw at -523576832
NOTICE: /dev/sdh: jmicron discovering
NOTICE: /dev/sdh: lsi     discovering
NOTICE: /dev/sdh: nvidia  discovering
NOTICE: /dev/sdh: pdc     discovering
NOTICE: /dev/sdh: sil     discovering
NOTICE: /dev/sdh: via     discovering
NOTICE: /dev/sdb: asr     discovering
NOTICE: /dev/sdb: ddf1    discovering
NOTICE: /dev/sdb: hpt37x  discovering
NOTICE: /dev/sdb: hpt45x  discovering
NOTICE: /dev/sdb: isw     discovering
DEBUG: not isw at 1891654656
DEBUG: isw trying hard coded -2115 offset.
DEBUG: not isw at 1890572800
NOTICE: /dev/sdb: jmicron discovering
NOTICE: /dev/sdb: lsi     discovering
NOTICE: /dev/sdb: nvidia  discovering
NOTICE: /dev/sdb: nvidia metadata discovered
NOTICE: /dev/sdb: pdc     discovering
NOTICE: /dev/sdb: sil     discovering
NOTICE: /dev/sdb: via     discovering
NOTICE: /dev/sda: asr     discovering
NOTICE: /dev/sda: ddf1    discovering
NOTICE: /dev/sda: hpt37x  discovering
NOTICE: /dev/sda: hpt45x  discovering
NOTICE: /dev/sda: isw     discovering
DEBUG: not isw at 1891654656
DEBUG: isw trying hard coded -2115 offset.
DEBUG: not isw at 1890572800
NOTICE: /dev/sda: jmicron discovering
NOTICE: /dev/sda: lsi     discovering
NOTICE: /dev/sda: nvidia  discovering
NOTICE: /dev/sda: nvidia metadata discovered
NOTICE: /dev/sda: pdc     discovering
NOTICE: /dev/sda: sil     discovering
NOTICE: /dev/sda: via     discovering
DEBUG: _find_set: searching nvidia_ijdbffag
DEBUG: _find_set: not found nvidia_ijdbffag
DEBUG: _find_set: searching nvidia_ijdbffag
DEBUG: _find_set: not found nvidia_ijdbffag
NOTICE: added /dev/sdb to RAID set "nvidia_ijdbffag"
DEBUG: _find_set: searching nvidia_dacifgcg
DEBUG: _find_set: searching nvidia_dacifgcg
DEBUG: _find_set: not found nvidia_dacifgcg
DEBUG: _find_set: not found nvidia_dacifgcg
DEBUG: _find_set: searching nvidia_dacifgcg
DEBUG: _find_set: not found nvidia_dacifgcg
NOTICE: added /dev/sda to RAID set "nvidia_dacifgcg"
DEBUG: checking nvidia device "/dev/sdb"
DEBUG: set status of set "nvidia_ijdbffag" to 16
DEBUG: checking nvidia device "/dev/sda"
DEBUG: set status of set "nvidia_dacifgcg" to 16
RAID set "nvidia_ijdbffag" already active
INFO: Activating linear raid set "nvidia_ijdbffag"
RAID set "nvidia_dacifgcg" already active
INFO: Activating linear raid set "nvidia_dacifgcg"
NOTICE: discovering partitions on "nvidia_ijdbffag"
NOTICE: /dev/mapper/nvidia_ijdbffag: dos     discovering
NOTICE: /dev/mapper/nvidia_ijdbffag: dos metadata discovered
NOTICE: created partitioned RAID set(s) for /dev/mapper/nvidia_ijdbffag
NOTICE: discovering partitions on "nvidia_dacifgcg"
NOTICE: /dev/mapper/nvidia_dacifgcg: dos     discovering
NOTICE: /dev/mapper/nvidia_dacifgcg: dos metadata discovered
DEBUG: _find_set: searching nvidia_dacifgcg1
DEBUG: _find_set: not found nvidia_dacifgcg1
NOTICE: created partitioned RAID set(s) for /dev/mapper/nvidia_dacifgcg
RAID set "nvidia_dacifgcg1" already active
INFO: Activating partition raid set "nvidia_dacifgcg1"
WARN: unlocking /var/lock/dmraid/.lock
DEBUG: freeing devices of RAID set "nvidia_ijdbffag"
DEBUG: freeing device "nvidia_ijdbffag", path "/dev/sdb"
DEBUG: freeing devices of RAID set "nvidia_dacifgcg"
DEBUG: freeing device "nvidia_dacifgcg", path "/dev/sda"
DEBUG: freeing devices of RAID set "nvidia_dacifgcg1"
DEBUG: freeing device "nvidia_dacifgcg1", path "/dev/mapper/nvidia_dacifgcg"

This is the output of dmraid -r

/dev/sdb: nvidia, "nvidia_ijdbffag", linear, ok, 976773166 sectors, data@ 0
/dev/sda: nvidia, "nvidia_dacifgcg", linear, ok, 976773166 sectors, data@ 0
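
The two dmraid -r lines above can be compared field by field. The following is a minimal Python sketch, not part of the original post: it assumes the line layout shown above and 512-byte sectors, and the names parse_dmraid_r and SECTOR_BYTES are made up for illustration.

```python
import re

SECTOR_BYTES = 512  # assumption: dmraid counts 512-byte sectors

# Sample text copied from the dmraid -r output above.
DMRAID_R_OUTPUT = '''\
/dev/sdb: nvidia, "nvidia_ijdbffag", linear, ok, 976773166 sectors, data@ 0
/dev/sda: nvidia, "nvidia_dacifgcg", linear, ok, 976773166 sectors, data@ 0
'''

LINE_RE = re.compile(
    r'^(?P<device>\S+): (?P<format>\w+), "(?P<set_name>[^"]+)", '
    r'(?P<layout>\w+), (?P<status>\w+), '
    r'(?P<sectors>\d+) sectors, data@ (?P<data_offset>\d+)$'
)


def parse_dmraid_r(text):
    """Turn each recognised line into a dict; lines that do not match are skipped."""
    members = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        entry = match.groupdict()
        entry["sectors"] = int(entry["sectors"])
        entry["data_offset"] = int(entry["data_offset"])
        entry["bytes"] = entry["sectors"] * SECTOR_BYTES
        members.append(entry)
    return members


if __name__ == "__main__":
    for member in parse_dmraid_r(DMRAID_R_OUTPUT):
        print(member["device"], member["set_name"],
              member["layout"], member["status"], member["bytes"])
```

For the output above, each disk is reported in its own set (nvidia_ijdbffag and nvidia_dacifgcg), both with layout linear and status ok.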

This is the output of smartctl -a /dev/sda

smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/                        

Smartctl open device: /dev/sda1 failed: No such file or directory
root@ubuntu:~# smartctl --all /dev/sda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/                        

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar T7K500
Device Model:     Hitachi HDT725050VLA360
Serial Number:    VFK401R424LAJK         
Firmware Version: V56OA7EA               
User Capacity:    500,107,862,016 bytes  
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7                                              
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 1               
Local Time is:    Tue Nov  2 02:39:57 2010 UTC                   
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 117) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                 (8389) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 140) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   097   097   016    Pre-fail  Always       -       196612
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   137   137   024    Pre-fail  Always       -       287 (Average 441)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       1086
  5 Reallocated_Sector_Ct   0x0033   001   001   005    Pre-fail  Always   FAILING_NOW 1885
  7 Seek_Error_Rate         0x000b   099   099   067    Pre-fail  Always       -       1
  8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       12937
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1075
192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       1797
193 Load_Cycle_Count        0x0012   099   099   000    Old_age   Always       -       1797
194 Temperature_Celsius     0x0002   187   187   000    Old_age   Always       -       32 (Lifetime Min/Max 18/43)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       2120
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       50%     12937         621063894

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

This is the output of smartctl -a /dev/sdb

smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/                        

=== START OF INFORMATION SECTION ===
Device Model:     Hitachi HDP725050GLA360
Serial Number:    GEB531RE00M21B         
Firmware Version: GM4OA50E               
User Capacity:    500,107,862,016 bytes  
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8                                                     
ATA Standard is:  ATA-8-ACS revision 4                                  
Local Time is:    Tue Nov  2 02:41:11 2010 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (7854) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 131) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   131   131   054    Pre-fail  Offline      -       147
  3 Spin_Up_Time            0x0007   161   161   024    Pre-fail  Always       -       222 (Average 256)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       1011
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   131   131   020    Pre-fail  Offline      -       29
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       12930
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1008
192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       1747
193 Load_Cycle_Count        0x0012   099   099   000    Old_age   Always       -       1747
194 Temperature_Celsius     0x0002   214   214   000    Old_age   Always       -       28 (Lifetime Min/Max 17/39)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
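
One way to read the attribute tables above: smartctl treats an attribute as failed when its normalized VALUE is at or below THRESH. The sketch below illustrates that rule in Python. It is not part of the original post, the function name failed_attributes is made up, and the sample rows are copied from the /dev/sda table.

```python
# Rows copied from the /dev/sda attribute table above.
SDA_ROWS = """\
  1 Raw_Read_Error_Rate     0x000b   097   097   016    Pre-fail  Always       -       196612
  5 Reallocated_Sector_Ct   0x0033   001   001   005    Pre-fail  Always   FAILING_NOW 1885
194 Temperature_Celsius     0x0002   187   187   000    Old_age   Always       -       32 (Lifetime Min/Max 18/43)
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       2
"""


def failed_attributes(table_text):
    """Return the names of attributes whose normalized VALUE is <= THRESH."""
    failed = []
    for line in table_text.splitlines():
        # Columns: ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED,
        # WHEN_FAILED, RAW_VALUE (the raw value may itself contain spaces).
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip headers and anything that is not an attribute row
        name = fields[1]
        value, thresh = int(fields[3]), int(fields[5])
        if value <= thresh:
            failed.append(name)
    return failed


if __name__ == "__main__":
    print(failed_attributes(SDA_ROWS))
```

On these rows only Reallocated_Sector_Ct qualifies (VALUE 001 against THRESH 005), which matches the FAILING_NOW flag in the /dev/sda table.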

ssfsx17

Posted 2010-11-02T01:19:52.537

Reputation: 1

RAID0 Failure? Never! Honestly, what are you trying to diagnose? That's the title of your question, but you never specify. If I were you, I'd just reinstall (this time on RAID1), restore from backup, and be done with it. – EEAA – 2010-11-02T01:33:27.300

I'm trying to figure out exactly what's wrong, so I know how to proceed from here. Also, I'm not sure how to back up the data from the drive since I can't even mount it. – None – 2010-11-02T01:38:08.293

If you're unable to mount it, it's fairly likely that it's too late. You may not know this, but RAID0 actually doubles your chance of losing data due to hardware failure. Data is striped across both disks, and all it takes is a single drive failure to take the whole thing down irrecoverably. Next time, try RAID1, 5, 6, or 10. – EEAA – 2010-11-02T02:05:21.157
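
The "doubles" figure in the comment above can be checked with a little arithmetic. This is a minimal Python sketch, assuming the two disks fail independently. The 5% per-disk failure probability is a made-up number for illustration, not a measured one.

```python
def raid0_loss_probability(per_disk_failure, n_disks=2):
    """Probability that at least one member fails, which loses the whole RAID0.

    Assumes independent failures with the same probability on every disk.
    """
    return 1.0 - (1.0 - per_disk_failure) ** n_disks


if __name__ == "__main__":
    p = 0.05  # hypothetical per-disk failure probability
    print(raid0_loss_probability(p, n_disks=1))
    print(raid0_loss_probability(p, n_disks=2))
```

With p = 0.05 the two-disk result is roughly 0.0975, a little under 2p. For small p the risk is close to double that of a single disk.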

It looks like one of the HDDs may have failed - how do I mount it and grab data off of it? – None – 2010-11-02T02:46:53.913

That's the thing. In a RAID0, if you have one drive fail, all the data is lost. When you write files to a RAID0, files are striped across both disks. This means that when you lose one drive, you're (roughly) losing half of each file, which even if you could recover, you'd not be able to do anything with. This is the reason that I never recommend RAID0 for any application. – EEAA – 2010-11-02T02:53:22.660
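
The striping described in the comment above can be sketched in Python as follows. This is an illustration only: the 64 KiB chunk size is an assumption (the real array's chunk size is not stated in the thread), files are treated as contiguous, and the function names are made up.

```python
STRIPE_SIZE = 64 * 1024  # assumption: 64 KiB chunks; not taken from the thread
N_DISKS = 2


def locate(logical_offset, stripe_size=STRIPE_SIZE, n_disks=N_DISKS):
    """Map a byte offset in the array to (disk index, byte offset on that disk)."""
    stripe_index, within = divmod(logical_offset, stripe_size)
    disk = stripe_index % n_disks
    disk_offset = (stripe_index // n_disks) * stripe_size + within
    return disk, disk_offset


def bytes_per_disk(start, length, stripe_size=STRIPE_SIZE, n_disks=N_DISKS):
    """Count how many bytes of a contiguous region land on each disk."""
    counts = [0] * n_disks
    pos, end = start, start + length
    while pos < end:
        disk, _ = locate(pos, stripe_size, n_disks)
        # Advance to the end of the current chunk or the end of the region.
        chunk_end = min(end, (pos // stripe_size + 1) * stripe_size)
        counts[disk] += chunk_end - pos
        pos = chunk_end
    return counts


if __name__ == "__main__":
    print(locate(0))
    print(locate(STRIPE_SIZE))
    print(bytes_per_disk(0, 1024 * 1024))
```

Under these assumptions a contiguous 1 MiB region is split evenly, 512 KiB per disk, while a region smaller than one chunk can sit entirely on a single disk.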

I suspect that unless the whole thing can't spin anymore (which it still can) a few bad sectors, probably where the OS used to be, cannot cause that much data to be permanently lost! – None – 2010-11-02T03:10:36.110

That drive (/dev/sda) is clearly bad, as smartctl said. Not only are files split between drives, but also the filesystem's structure and metadata are striped across drives. This is why you're unable to mount it. If you really need data off of it, you had better contact data recovery professionals and be prepared to fork out a lot of money. – EEAA – 2010-11-02T03:21:50.310

I disabled RAID in the BIOS, thus allowing access to the individual HDDs. Then I was able to do some quick scans with 3rd-party tools. Although I still ended up having to contact professionals, I saved at least $100 by getting this far on my own. – ssfsx17 – 2010-11-03T00:13:12.237

Answers

0

Being able to almost log in to Windows XP does not sound like a failed drive in a RAID0. When I have seen failed RAID0 arrays, there is no booting (hardware).

For the Kubuntu live CD, the proper driver (module) is not being loaded for the RAID0, so Kubuntu is trying to read sda and sdb as separate drives. sda is "working" since it has an MBR, and sdb is failing since it does not. You're not able to mount /dev/sda1 since half of the data for your NTFS volume is on sdb, which is not being read.
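
Whether a disk "has an MBR" comes down to the contents of its first sector. The sketch below is a Python illustration, not part of the original answer. It checks the standard 0x55 0xAA boot signature and lists the partition type bytes. The function names are made up, and the demo sector is synthetic data rather than anything read from these disks.

```python
SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446  # start of the four 16-byte primary entries
ENTRY_SIZE = 16


def has_mbr_signature(sector0):
    """True if the sector ends with the 0x55 0xAA boot signature."""
    return len(sector0) >= SECTOR_SIZE and sector0[510:512] == b"\x55\xaa"


def partition_types(sector0):
    """Return the type byte of each non-empty primary partition entry."""
    if not has_mbr_signature(sector0):
        return []
    types = []
    for index in range(4):
        start = PARTITION_TABLE_OFFSET + index * ENTRY_SIZE
        entry = sector0[start:start + ENTRY_SIZE]
        part_type = entry[4]  # byte 4 of an entry is the partition type
        if part_type != 0:
            types.append(part_type)
    return types


def make_demo_sector(part_type=0x07):
    """Build a synthetic sector with one bootable entry (demo data only)."""
    sector = bytearray(SECTOR_SIZE)
    sector[PARTITION_TABLE_OFFSET] = 0x80           # boot flag
    sector[PARTITION_TABLE_OFFSET + 4] = part_type  # 0x07 is HPFS/NTFS
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


if __name__ == "__main__":
    demo = make_demo_sector()
    print(has_mbr_signature(demo), partition_types(demo))
    print(has_mbr_signature(bytes(SECTOR_SIZE)))
```

Type 0x07 corresponds to the "Id 7, HPFS/NTFS" entry in the fdisk listing for /dev/sda1. To inspect a real disk, the first 512 bytes of the device would be read (read-only, as root) and passed to these functions.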

How is the RAID0 being done (software or hardware)?

wrmine

Posted 2010-11-02T01:19:52.537

Reputation:

As far as I know, it's being done with hardware – None – 2010-11-02T02:14:46.953

0

To run chkdsk on the array, slipstream the SATA driver into an XP install CD, then boot into the Recovery Console and run chkdsk /r from the command prompt.

I use this to slipstream the "Mass storage controllers" pack into an XP CD; in your case there is no need for the other driver packs they offer.

http://driverpacks.net/about

How to use DPsBASE tool http://users.telenet.be/jtdoom/basetute/Eng_tut6b.htm


Moab

Posted 2010-11-02T01:19:52.537

Reputation: 54 203

0

Clearly one of the drives is failing. You can't mount just one of the drives since it only contains part of the data; you have to mount the whole array, which is /dev/mapper/nvidia_ijdbffag1.

psusi

Posted 2010-11-02T01:19:52.537

Reputation: 7 195