I purchased a used PowerEdge T610 and upgraded it to 2x hex-core Xeon X5675 processors and 96 GB RAM. Initially I used 3x WD Green 2 TB drives in a RAID-5 array (PERC 6/i controller) and installed Ubuntu Server on the virtual disk. This setup served me well for about a year, and then the problems started:
I bought some new drives to expand with a second array: 4x WD Red 3 TB. In the meantime I had learned that WD Green, at least, is not a good choice, so I wanted to back up some data onto the new VD. It turns out the PERC 6/i does not like drives larger than 2 TB; it only recognised the first 2 TB of each 3 TB drive. I had not even started setting up a VD with the new drives when, 3 weeks later, my WD Green array started corrupting (first only strange glyphs in some software, then more severe issues up to a corrupted boot sequence). I ended up at a professional data recovery service, which luckily could help me. I exchanged the PERC 6/i for an H700 and set up a RAID-6 array of the 4x WD Red 3 TB drives (each of which I tested with the Dell hardware diagnostics extended test before setting up - no errors on any of them). Installed Ubuntu, all the software I need, X2Go etc. ... up and running again.
Now I am getting the same problem as before: in X2Go it starts with the same software (the bioinformatics package Artemis) spitting out glyphs on the command line, and it seems I am going back to square one. All status LEDs on the caddies are solid green, i.e. online - so no predicted failure that the system recognises, at least.
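When I get back to the machine, my first step will be to sweep the kernel/system logs for disk-error signatures, since silent read errors usually show up there long before any LED changes. A minimal sketch - the log path and the exact message patterns are assumptions on my part, to be adjusted to whatever `dmesg`/syslog actually prints on that box:

```python
# Crude scan of a kernel/syslog dump for disk-related error signatures.
# The patterns below are assumptions (typical Linux block-layer, libata
# and ext4 messages) -- extend them to match what the box really logs.
import re

ERROR_PATTERNS = [
    r"I/O error",
    r"Medium Error",
    r"ata\d+.*(failed|error)",
    r"EXT4-fs error",
]

def find_disk_errors(log_text):
    """Return the log lines matching any known disk-error signature."""
    combined = re.compile("|".join(ERROR_PATTERNS), re.IGNORECASE)
    return [line for line in log_text.splitlines() if combined.search(line)]

# Intended use on the server (path is an assumption):
#   with open("/var/log/syslog") as f:
#       for hit in find_disk_errors(f.read()):
#           print(hit)
```

If that turns up ata/SCSI errors clustered on particular ports, that would point at the cable/backplane side rather than the drives themselves.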
I am starting to wonder what the problem could be:
What I don't think is likely:
- Primary disk failure (again!), since the drives were new, had no bad sectors on extended testing, and have hardly any power-on time at all.
- The PERC 6/i controller: it was exchanged for the H700 after the first disaster, so it should not be the problem.
What I need help evaluating:
- Backplane/cable issues? (The H700 controller came with cables for another server type that did not fit my case, so I simply used another SATA 6 Gb/s cable to connect the controller to the backplane.) The drives are, by the way, sitting in the same bays as the previous, failing ones, with an original Dell SATA cable going there.
- Motherboard issues?
- CPU or RAM issues?
- Power supply (voltage spikes??)
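On the RAM suspicion: the real answer is booting memtest86+ overnight once I'm physically there, but as a quick remote sanity check over SSH, a crude userspace pattern sweep like the following could at least catch gross faults. (It only exercises whatever memory the OS happens to hand the process, so a clean result proves very little - this is a sketch, not a substitute for a proper memory test.)

```python
# Very crude userspace memory sweep: fill a large buffer with a known
# byte pattern, then read it back and count mismatches. A nonzero count
# on an otherwise idle box would point at RAM/CPU/board; zero means
# only that this particular allocation held its contents.
def pattern_sweep(size_mib=256, pattern=0xA5):
    """Fill size_mib MiB with `pattern`, re-read it, return mismatch count."""
    buf = bytearray([pattern]) * (size_mib * 1024 * 1024)
    return len(buf) - buf.count(pattern)

# e.g. run pattern_sweep(size_mib=4096) a few times and check for nonzero
```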
Has anyone had a similar problem before? Any help is much appreciated. Unfortunately I am away for another two weeks before I can get access to the server (both physically and over the network); the issue has been "reported" by my wife, who works with the server on our local network (but unfortunately won't be able to help with troubleshooting).
Yes, I did run the complete Dell hardware diagnostics procedure without any issues this time. During the first disaster, only one of the drives was detected with defective blocks, but I was unable to rebuild the RAID-5 array, hence the data recovery specialist. All other hardware was OK.
I just wonder whether there could be intermittent problems - glitchy contacts somewhere, for instance - that pass the tests at one point and fail at any other time. Or whether the tests simply don't cover all scenarios...
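That is exactly the kind of fault I'd like to hunt for with a repeated read-back test: re-read the same large file several times and compare checksums; if any two passes disagree, something between the platters and the CPU is flipping bits intermittently. A rough sketch (the file path would be a placeholder; note that unless the file is larger than RAM, or the page cache is dropped between passes, later reads come from memory rather than the disks):

```python
# Hash the same file repeatedly; differing digests across passes would
# indicate intermittent corruption somewhere in the drive/cable/
# backplane/controller/RAM path. Caveat: the Linux page cache can
# satisfy repeat reads from RAM, so use a file larger than RAM or drop
# caches between passes (as root: echo 3 > /proc/sys/vm/drop_caches).
import hashlib

def read_digest(path, chunk=1 << 20):
    """SHA-256 of the file at `path`, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            h.update(block)
    return h.hexdigest()

def stable_over(path, passes=5):
    """True if every read pass produced the identical digest."""
    return len({read_digest(path) for _ in range(passes)}) == 1
```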