51

Update: EMC has dropped our warranty and support, so this is going to be an insurance case. Dell says that we can get a professional cleaning agency to refurbish the servers and keep our warranty. Cisco says "maybe". HP is still silent :(

Final update: EMC turned around and approved cleaning from a certified company. The VNX got shipped back to us today and works just fine. The rest of the server room is also getting cleaned, and our losses are limited to a couple of tape drives. The insurance company picks up the bill for just about everything else.

The original question:

Here's the story:

The owners of the building we lease office space from decided to renovate the exterior. This involved some pretty heavy work at the level where our server room is, including replacing windows which are fitted into a concrete wall.

My red alert went off when I heard that they were going to do the same thing with our server room (yes, our server room has a window. We're a small shop with 3 racks. The window is secured with steel bars.) I explicitly told the contractor that they needed to put up a temporary wall between our racks and the original wall - and to make sure that the temporary wall was 100% air- and water-tight. They promised to do so.

The temporary wall has a small door in it so that workers can go in and out during the day (through our server room, which was the only option). On several occasions while working evenings and nights I found the small door only half shut. I locked the door and hoped they would soon get the point and keep it shut. I even gave an electrician a mouthful when I saw that he didn't close the door properly.

By this point, I bet most of you get the picture of what happened. Yes, they probably left the door open while drilling in the concrete.

I present to you our 4-week-old EMC VNX: [image: VNX disks] [image: VNX rack]

I'll even put in a little bonus: here is the APC UPS one rack further away from the temporary wall. See the nice little landing strip from my finger? [image: UPS]

What should I do? The only thing that comes to mind is to either call all our suppliers (EMC, HP, Dell, Cisco) and get them to send technicians to check out all the gear in the server room, or get some kind of certified 3rd-party consultant to check all of it. Would you run production systems on this gear? How long?

I should also note that our air conditioning isn't exactly enterprise-grade, given the nature of our small room. It's just a single inverter, which failed once before I started working here (a failed inverter usually leads to water dripping out).

pauska
  • 9
    Sadly, I've known production-ish systems run in much worse conditions (think warehouses). But it's not ideal by any stretch, especially for new gear. I'd investigate getting some sort of compensation for the cleanup at least, and perhaps for the inevitable few fan failures. It largely depends on your air flow though, and how much air each unit of equipment is drawing through it. At a minimum, clean the room, then the air filters or air intakes on the equipment, and keep a good eye on temperatures and cooling. – Sirex Nov 22 '11 at 08:52
  • Yes - while not a good thing, I've seen much worse and on more critical and more expensive kit. So, do get it sorted but I wouldn't expect an impending failure. – Dan Nov 22 '11 at 09:05
  • Too bad to read that EMC has dropped the warranty; the other manufacturers will do the same, I bet... Don't be afraid to bring a lawyer into this... – HTDutchy Nov 22 '11 at 14:03
  • 1
    On the contrary: Be afraid to bring the lawyers into this. Get your evidence together. Yes, get professional advice. We had a serious problem once, in which a lawyer told us we had tangible damages. It turned out we hadn't, and after the lawsuit against the vendor, which we lost, we had a lawsuit against our lawyer, which was mitigated out of court. Be very, very careful about what damages really exist, and who is responsible. If you find a trustworthy lawyer (haw), they will tell you what effort the lawsuit requires and what your chances are. If they say it's 100% clear, they lie. – Posipiet Nov 22 '11 at 14:23
  • 1
    We actually have a lawyer on payroll here, so I think we're covered in that department. Thanks for the gotchas! – pauska Nov 22 '11 at 14:29
  • 1
    Those poor babies! – Ben Brocka Nov 22 '11 at 15:05

2 Answers

29

First: When the server room becomes a building site, you must remove all the servers, for several reasons.

  • power outages
  • cooling outages
  • access (you don't know who walks past your servers, or at what hour)
  • mechanical stress due to building machinery
  • direct damage by workers who go bump
  • residual dust from the building site, especially metallic or mineral dust

The only safe bet is to move your servers to a different room. This means additional shutdowns, but during the course of the building activities, you will probably have to shut down anyway.

(Mind you, the following is my subjective opinion.) Now this has happened. The fans are covered with mineral dust. This potentially reduces their lifetime by affecting the bearings.

I would not expect the servers to massively fail, but I would expect to have a higher percentage of failures. At a fairly large customer, fan outages rose after building activities.

But then again, I would not go to great lengths to clean the systems. If the cleaning leads to damage, that damage is likely bigger than a failed fan. Unless, of course, the dust contains metallic particles, which changes the whole game.

What to do? Clean the room, clean the outside of the servers, reserve some cash for earlier replacements in the future.

The damage very probably is not justiciable. The part of damage that you can prove is probably not worth going to court for. You might try to get some amount of compensation out of the landlord, perhaps a reduced payment or so.

Glorfindel
Posipiet
  • 1
    What we should have done isn't especially helpful, but I do agree with the points you list. The problem is that I don't know (yet) if there are metallic traces in the dust. The other potential danger here is failure of the air-con, which could lead to moist air - which is not good combined with concrete and/or mineral dust. – pauska Nov 22 '11 at 09:24
  • 13
    As the site is read by many other people, some of whom may have building work approaching, what you should have done is pretty useful info really, just a bit late for you personally. – Sirex Nov 22 '11 at 09:28
  • 7
    That's true. Sorry for being a bit trigger happy, my mood is below average right now. – pauska Nov 22 '11 at 10:41
  • 2
    I have to agree with this. We were installing new drop ceiling tiles, and during the process, the workers covered the server racks with plastic sheeting. Thankfully, I was there and the thermal warnings generated an email alert, but without that it would have been toast. Once servers become contaminated by particulate matter, it is very difficult to reverse, and the side-effects can be long lasting and result in very elusive issues. – Greg Askew Nov 22 '11 at 21:56
23
  1. Get all the technicians in.
  2. Make them check/clean all the equipment.
  3. Send the bill to the building planner.

Really, servers can withstand some level of dust but this is just too much. We clean our servers regularly during downtime with a PC vacuum by 3M. It's a nice thing to have around the office.

But for now, start cleaning. The faster you get the dust out of there, the better. Try to keep heatsinks and fans clear of dust. If a heatsink or fan is covered in dust, its ability to dissipate heat is much worse than that of a clean unit.
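
One way to tell whether dust is hurting heat dissipation is to compare current temperature readings against a baseline taken while the gear was still clean. Below is a minimal Python sketch, assuming you already collect per-sensor temperatures in Celsius by some means (IPMI, SNMP, vendor tools). The function name `flag_hot_sensors`, the sensor names, and the 5 °C margin are all hypothetical and should be adapted to your own environment.

```python
from statistics import mean


def flag_hot_sensors(baseline, recent, margin_c=5.0):
    """Return sorted sensor names whose recent mean temperature exceeds
    the clean-state baseline by more than margin_c degrees Celsius.

    baseline: dict of sensor name -> baseline temperature (Celsius)
    recent:   dict of sensor name -> list of recent readings (Celsius)
    margin_c: assumed tolerance; 5.0 is an illustrative value only
    """
    flagged = []
    for name, readings in recent.items():
        # Skip sensors with no baseline or no recent data.
        if name not in baseline or not readings:
            continue
        if mean(readings) - baseline[name] > margin_c:
            flagged.append(name)
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical sensor names and values, for illustration only.
    baseline = {"cpu0": 45.0, "inlet": 22.0}
    recent = {"cpu0": [52.0, 53.0, 51.0], "inlet": [23.0, 22.5]}
    print(flag_hot_sensors(baseline, recent))
```

A sensor that drifts upward after the cleaning is a hint that a heatsink or fan still needs attention, though ambient temperature and load changes can produce the same signal.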

Bart De Vos