I'm a relatively new (and the only) system admin at my organization. We have an HP ProLiant DL380p Gen8 server that is no longer under any support contract with HP. We're using it as a Hyper-V host for four virtual servers. The host itself isn't being backed up, but the virtual servers running on it are backed up to Azure. (We only need the physical server to last a few more months, until I move the last remaining app server to the cloud and switch all our users and machines from on-premises AD to Azure AD.) The server's RAID controller is a Smart Array P420i.
Yesterday, one of the 300 GB drives in the server's RAID 5 array (there are three drives in the array in total) started to flash alternately green and amber. According to page 102 of the manual and the server's iLO interface, this drive is in a "Degraded (Predictive failure)" state.
This is literally my first time ever replacing a RAID drive on a production server, and I want to make sure I don't screw it up. As the only admin, I don't have anyone that I can ask for help.
Do I have to wait for the drive to actually fail before swapping it out? Or can I swap it out now, pre-emptively?
Can the drive simply be hot swapped out (as in push the eject button, pull it out, and pop the new drive in)? Will the RAID array begin to rebuild automatically, or do I need to tell the controller/Windows about the existence of the new drive?
Is there any risk or benefit to cold swapping the drive instead? The server technically doesn't need to stay up during off hours, so I could stay behind to cold swap it. But this answer says that there's a danger to cold swapping and "that this must be done while the system is running"... It's an older server model, but I don't understand why there would be a problem with cold swapping.
I've read about additional drives failing during a RAID 5 rebuild. Since this drive hasn't technically failed, but is only "predicted to fail", does that in any way lessen the likelihood of another drive failing (since if the others were about to fail, they would presumably be in the same state as this one, and not in a healthy state)? This is more for my own peace of mind lol...
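To put a rough number on the rebuild risk I'm asking about, here is a back-of-the-envelope sketch. It only models one failure mode: an unrecoverable read error (URE) on one of the two surviving drives while the controller reads them in full to rebuild the third. The URE rates used (1 in 10^15 and 1 in 10^16 bits) are assumptions typical of enterprise drive datasheets, not values I've confirmed for these specific drives, and the function name is made up for illustration. It says nothing about a whole drive dying mid-rebuild.

```python
import math

def ure_probability(drives_read, drive_bytes, ure_per_bit):
    """Probability of at least one unrecoverable read error while
    reading `drives_read` drives of `drive_bytes` bytes each in full.

    Assumes every bit read fails independently with probability
    `ure_per_bit` (a simplification of real drive behaviour).
    """
    bits_read = drives_read * drive_bytes * 8
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

DRIVE_BYTES = 300 * 10**9   # 300 GB drives (decimal gigabytes)
SURVIVING_DRIVES = 2        # 3-drive RAID 5 with one drive being replaced

# Assumed URE rates per bit read; check the datasheet for the real figure.
for rate in (1e-15, 1e-16):
    p = ure_probability(SURVIVING_DRIVES, DRIVE_BYTES, rate)
    print(f"assumed URE rate {rate:g} per bit: {p:.4%} chance of a URE during rebuild")
```

With drives this small, the assumed rates put the URE risk well under one percent, so if this model is anywhere near right, my worry is more about an outright second drive failure than about read errors.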
Thanks for all your help!