8

We are working with a hosting company for managing our dedicated servers. We asked for a replacement disk on one of the servers by providing the serial number. They took out 2 other disks instead and put them back in. When we asked for an explanation, they told us that it is very hard to read the serial number and they mistakenly took other disks out.

They also provided us a photo to show us that it is indeed very hard to read the serial number. It is here.

enter image description here

Is the only way to read the serial number peeking through the holes? This is the chassis.

EDIT: I provided them with the port number of the disk along with the serial number. They told me that they "think the controller cabling is mixed up or upside-down".

UPDATE: This is the final perspective of the DC.

Normally, when our customer provides us with a controller port number and a serial, we have all the required information to replace the disk. However, in this case it appears that the wiring between the controller and the backplane in your server is incorrect. This is an unforeseen event that should not happen.

refik
  • What operating system is this? What RAID controller is installed? You can map the drives by illuminating their LED beacon lights from the OS. – ewwhite Nov 03 '14 at 14:18

3 Answers

13

There is no way at all to (optically) read a disk serial number once the disk is inserted into the system. The holes in the front of the disk caddy are not meant to facilitate reading a serial number; they are there for thermal and/or design reasons. This is generally true for every server/disk array manufacturer.

Usually, you replace disks by slot numbers (or the "error LED", which also can be activated by command in many RAID controllers), but for that you have to be sure of the mapping between controller port/disk numbers and physical slots.

Nevertheless, what the DC did was beyond irresponsible. They should have gotten back to you before pulling random disks to work out the correct slot. Doing this twice is beyond stupid, as you can kill even a RAID6 this way.

Sven
9

Supermicro is rough... But so is your hosting company!!

There's no excuse for that. Ask for a credit on your monthly bill for the mistake if it caused downtime.

So think of this:

  • The disk serial number should be irrelevant. Replacements should be based on model number. (One of the downsides of DIY Supermicro hardware is that you aren't dealing with a single manufacturer/warranty and unified part numbers)

  • The model/types of the drive should be known to the owner of the hardware or hosting company. Their inventory system should have this information readily available.

  • Pulling the wrong disks is just dangerous. I'm hoping this didn't happen while the server was running.

  • It is possible to illuminate a disk locator LED under most RAID situations. For instance, I will light a disk beacon for an hour to make sure a datacenter technician pulls the proper drive.

  • Any form of hardware RAID and most software RAID solutions under every OS I'm familiar with are capable of providing disk information. Hardware controllers will give you the raw disk manufacturer, model and firmware:

  physicaldrive 3C:1:6
     Port: 3C
     Box: 1
     Bay: 6
     Status: OK
     Drive Type: Data Drive
     Interface Type: SATA
     Size: 1 TB
     Firmware Revision: 80.00A80
     Serial Number:      WD-WCAV5F799696
     Model: ATA     WDC WD10EARS-22Y
     SATA NCQ Capable: True
     SATA NCQ Enabled: True
     Current Temperature (C): 39
     Maximum Temperature (C): 53
     PHY Count: 1
     PHY Transfer Rate: 3.0Gbps
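
If you keep that kind of controller output around, it can be turned into a serial-to-bay lookup so that a replacement request names the port/box/bay and not just the serial. The following is only a minimal sketch: it assumes the text has the `physicaldrive` / `Key: Value` layout shown above, and the function names `parse_physical_drives` and `find_by_serial` are made up for illustration, not part of any vendor tool.

```python
import re

def parse_physical_drives(report: str) -> list[dict[str, str]]:
    """Parse 'physicaldrive <id>' stanzas followed by 'Key: Value' lines."""
    drives: list[dict[str, str]] = []
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = re.match(r"physicaldrive\s+(\S+)$", line)
        if header:
            # Start a new drive record, keyed by the controller's own id.
            current = {"id": header.group(1)}
            drives.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only; values may contain colons.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return drives

def find_by_serial(drives: list[dict[str, str]], serial: str):
    """Return the drive record with the given serial, or None."""
    for drive in drives:
        if drive.get("Serial Number") == serial:
            return drive
    return None

# Abbreviated copy of the output above, used as sample input.
SAMPLE = """\
physicaldrive 3C:1:6
   Port: 3C
   Box: 1
   Bay: 6
   Status: OK
   Serial Number:      WD-WCAV5F799696
   Model: ATA     WDC WD10EARS-22Y
"""

if __name__ == "__main__":
    drive = find_by_serial(parse_physical_drives(SAMPLE), "WD-WCAV5F799696")
    print(drive["Port"], drive["Box"], drive["Bay"])
```

Note that this only tells you what the controller believes. If the backplane is mis-cabled, as in this case, the reported bay can still point at the wrong physical slot.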
ewwhite
  • 3
    Not sure SuperMicro is so terrible. Using them on most rigs and being happy. The rest of the answer is good ;) On top, it is not really the DC's fault if the OP does not say "slot 1" or "slot marked with the red LED" but gives the serial number. They did the best they could given the information explicitly provided by the OP. – TomTom Nov 03 '14 at 13:32
  • @TomTom You're right. I didn't realize the OP only specified the serial... Yeah, that's the wrong way to do this. – ewwhite Nov 03 '14 at 13:39
  • 1
    The only fault I see with the DC in this instance is that they didn't do a `"What? Are you sure that's how you want us to identify the drive?"` when asked to pull a drive by reading the serial number. They should have asked the OP for the slot number and that's what the OP should have given in the first place. At the end of the day, the DC did what they were asked. – joeqwerty Nov 03 '14 at 13:45
  • I have provided them the port number with the serial number when I first asked for a replacement. They told me they "think the controller cabling is mixed up or upside-down" – refik Nov 03 '14 at 13:49
  • 3
    At my old cloud firm and my current datacenter, they go through great pains to avoid this. Lots of "are you *sure*" and "here's a cell-phone photo", etc. – ewwhite Nov 03 '14 at 13:50
  • @refik So does this go back to being a Supermicro issue? Mis-cabled backplane? No LED illumination? – ewwhite Nov 03 '14 at 13:51
  • 1
    @ewwhite I guess all of them contribute. – refik Nov 03 '14 at 15:52
  • @TomTom "They did the best they could given the information explicitly provided by the OP." by far not. Better ask twice instead of doing something wrong once... – glglgl Nov 04 '14 at 06:32
  • @glglgl Not really. Highly unprofessional. If you get a ticket "replace the disc with the following serial number" in a data center you assume the customer is not a full idiot and knows what he does - as well as the consequences. If the server runs I would assume they ask (turn server off please) but otherwise - no, totally good. – TomTom Nov 04 '14 at 06:43
  • 2
    But obviously they *did* the wrong thing: "they told us that it is very hard to read the serial number and they mistakenly took other disks out". If they can't do the right thing, they have to ask instead of guessing. Guessing is much more unprofessional than asking, IMO. – glglgl Nov 04 '14 at 06:46
  • @TomTom if you read the question you will see that I did provide them the controller port number as well. LED identification does not work in our controller. There was nothing else I could have provided them. The server was running and they didn't ask to turn it off. As a result, an irrelevant array got degraded. – refik Nov 04 '14 at 09:37
  • @refik Was this THEIR server? Is it owned by the datacenter? If so, you still need to demand a credit. The backplane wiring issue should never happen. – ewwhite Nov 04 '14 at 12:12
  • @ewwhite yes the server is owned by the datacenter. Thank you for the info and the answer. – refik Nov 04 '14 at 13:29
1

The only sensible way to ask for a replacement is by knowing the slot number of the disk. It should be possible to map from the OS to the slot number, but it requires calibration: you first need to learn what identifies a slot and where it is physically.

The disk serial number or part number is not readable from the outside.
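
To illustrate what I mean by calibration, here is a rough sketch of keeping a verified port-to-slot table and refusing to write a replacement request for a port nobody has physically checked. Everything in it (the `SlotRecord` class, the `build_ticket` function, the slot labels) is hypothetical and only meant to show the idea.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SlotRecord:
    controller_port: str  # identifier as reported by the OS/controller
    physical_slot: str    # label on the chassis, as seen by someone on site
    verified: bool        # True only once the mapping was physically confirmed

def build_ticket(calibration: dict[str, SlotRecord], port: str, serial: str) -> str:
    """Build a replacement request that names the physical slot.

    Raises LookupError if the port has no verified mapping, so nobody
    ends up guessing which disk to pull.
    """
    record = calibration.get(port)
    if record is None or not record.verified:
        raise LookupError(f"no verified slot mapping for controller port {port}")
    return (
        f"Replace the disk in physical slot {record.physical_slot} "
        f"(controller port {port}). After removal, confirm the serial "
        f"number on the label is {serial} before inserting the new disk."
    )

if __name__ == "__main__":
    # Hypothetical calibration table with one verified entry.
    table = {"3C:1:6": SlotRecord("3C:1:6", "6", True)}
    print(build_ticket(table, "3C:1:6", "WD-WCAV5F799696"))
```

The serial number is still useful here, but only as a check once the disk is out of the chassis, since it cannot be read while the disk is inserted.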

Baruch Even