Unexpected shutdown and data loss

Question

We have several rack servers, all with SATA3 Transcend SSDs. Most of the SSDs are 250 GB and usually have about 50% free space.

When there is an unexpected shutdown, not all the data is properly written to disk, so we can't retrieve it anymore.

We write roughly 8000 rows every 10 seconds. Obviously, the lower the number of rows, the higher the chance that all of them are written properly. We moved from HDDs to SSDs for this reason: with an HDD, the maximum number of rows that would all be written properly was around 700, and above that the data would be unrecoverable. The SSDs raised this limit to around 5000-6000, but that is not enough.

Is there any solution for this? All the data arrives over the Ethernet port; no data is generated locally.

What's causing the unexpected shutdowns? **That** is the question you should be seeking resolution for. – EEAA, May 02 '17 at 14:02
@EEAA the racks are in our client's environment, and we can't say that to the client. – u1236645, May 02 '17 at 14:59
So provide decent infrastructure for your product, then. Purchase a high-quality UPS to install and proper battery-backed storage controllers. – EEAA, May 02 '17 at 15:00
@EEAA they do have a UPS, but most of the time it runs out of battery due to the conditions. Sometimes the servers keep restarting over and over for minutes, and that's really harsh on the application. We also can't provide "decent infrastructure" if the client is not willing to pay for it. – u1236645, May 02 '17 at 15:09
@u1236645 I don't think this issue is yours to fix. You cannot ensure data is written without power going to the devices; it's simply not possible. As others have said, it's the client's job to fix the unexpected shutdowns. – Stese, May 02 '17 at 16:06
Another way: try an HA cluster on two servers. Each server must be powered by its own UPS, and the UPSes must be fed from different power sources or different power phases. Use `nut` to control your UPS. – Mikhail Khirgiy, May 02 '17 at 19:35
Using a UPS within your setup is a good option; I agree with the others. You can also replicate the data between your servers to be sure it is safe. – Strepsils, May 03 '17 at 16:18

mzhaase · Answer 1 · 2017-05-03T09:01:11.377

5

Why are your servers crashing so often?

In case of power failure:

- Get a RAID controller with a backup battery or capacitor.
- Get SSDs with built-in capacitors, so they have time to write out their cache.
- Get a UPS to prevent power failures from shutting down the servers.
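
The hardware items above only protect data that has already reached a controller or drive cache. On the software side, the application can at least make sure each batch has left the OS page cache before it is treated as saved. Here is a minimal Python sketch of that idea. The function name `append_rows_durably` and the one-row-per-line file format are assumptions for illustration, not something from the question. Note that on a drive without power-loss protection, `fsync` returning does not strictly guarantee the data survived a power cut, which is why the capacitor-backed hardware still matters.

```python
import os


def append_rows_durably(path, rows):
    """Append rows to `path`; return only after the OS reports them flushed."""
    rows = list(rows)
    data = "".join(row + "\n" for row in rows).encode("utf-8")
    # O_APPEND makes every write land at the current end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(data)
        # os.write may write fewer bytes than requested, so loop until done.
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push the file's data to the storage device.
        os.fsync(fd)
    finally:
        os.close(fd)
    return len(rows)
```

With something like this, the sender over Ethernet would only be told a batch is stored after the call returns, so a batch interrupted by a power cut can be sent again after reboot instead of being silently lost.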

//EDIT// As EEAA pointed out below, also connect the UPS to the servers so it can trigger a graceful shutdown. Every major UPS provider has solutions for this.
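
A graceful shutdown only helps if the application reacts to it. Below is a rough Python sketch, assuming a Linux-style shutdown where the process receives SIGTERM before it is killed. All names here (`run`, `next_batch`, `write_batch`, `request_stop`) are hypothetical, and `next_batch` stands in for whatever code receives rows from the network.

```python
import os
import signal
import threading

stop_requested = threading.Event()


def request_stop(signum, frame):
    """Signal handler: only set a flag; the main loop does the real work."""
    stop_requested.set()


def install_sigterm_handler():
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, request_stop)


def write_batch(path, rows):
    """Append one batch and fsync it before returning."""
    with open(path, "a", encoding="utf-8") as f:
        f.writelines(row + "\n" for row in rows)
        f.flush()
        os.fsync(f.fileno())


def run(path, next_batch, interval_seconds=10.0):
    """Persist batches until a stop is requested; return total rows written."""
    total = 0
    while True:
        rows = next_batch()
        if rows:
            write_batch(path, rows)
            total += len(rows)
        # Checked after the write, so pending rows are saved before exiting.
        if stop_requested.is_set():
            return total
        # Sleeps up to the interval but wakes immediately on a stop request.
        stop_requested.wait(interval_seconds)
```

The application would call `install_sigterm_handler()` once at startup and then `run(...)`. The handler itself does no I/O; it just wakes the loop, which writes one last batch and returns, so the process can exit well before the UPS battery runs out.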

In case of hardware failure, get better and/or redundant hardware.

In case of an OS crash, fix the error that causes the crash.

edited May 03 '17 at 09:01

answered May 02 '17 at 14:06

1

"Get a UPS to prevent power failures from shutting down servers" It's worth mentioning that unless you have ops staff monitoring things 24x7 and available to shut down infrastructure, the UPS *should* be configured to do a graceful shutdown of the system. – EEAA May 02 '17 at 15:01
@EEAA that's a good point, I've added it to the answer. – mzhaase May 03 '17 at 08:59
