
Over three consecutive days, three of our servers have lost their partition tables. Two of the machines were running Linux and the third was running Windows. All of these servers are on an internal network.

It seems unlikely to be the work of a virus or code being executed but I cannot think what else it could be. It's very odd and I cannot work out a connection.

Does anyone know what might be causing this? Could it have something to do with power surges?

Update

It seems that the problem does indeed stem from the use of machinery upstairs. After three more failures, we logged the time of each one, and they appear to coincide with the times the machinery upstairs was in use. Thanks for your answers.

xenon
  • I don't know the answer, but I suspect whoever might will need more info: hard drive makes, ages, motherboards in use maybe, software versions, anything really. I'm guessing this is either going to be something someone just knows or no one has a clue. – Sirex Feb 03 '12 at 10:05
  • Yeah, that's the trouble. The hardware varies, along with the ages of the machines, although they are all 2-3 years old. I wouldn't be sure what to list since the setup is different for each machine. It strikes me that it's an external factor and not the machines themselves. – xenon Feb 03 '12 at 10:15
  • ... 2-3 years is very little time for hardware to develop such malfunctions by itself. I'd propose that this is most likely a third-party problem, probably a human factor. Are you sure that there isn't some sort of rogue admin among the staff members operating on those servers? – Spirit Feb 03 '12 at 13:02
  • I can be fairly certain that it is not the administration team. Our team is very small and has spent a lot of time mopping up the damage. Two of these failures occurred overnight. The other occurred during the day: SSH stopped responding, and a restart revealed that the machine had suffered the same fate as the others. – xenon Feb 03 '12 at 15:12
  • Define "lost their partition tables". Partition tables are in the main fixed data structures with known addresses and cannot be lost. What is the observed behaviour? What is the observed content of the tables? Are any of the partition table entries wiped? How many, if so? – JdeBP Feb 03 '12 at 15:41

1 Answer

Mhmmmm...

You suggest power issues yourself. No UPS, I presume?

Barring more detailed information, some guesswork:

You are not stating how bad the damage is. Did it happen three times on each of the servers, or were three servers each affected once on different days?

If the damage is strictly limited to sector 0 (==partition tables) any sort of power-surge or other external random factor is extremely unlikely. The damage would most likely be more random than that: Corruption all over the disks.
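One way to check whether the damage really is limited to sector 0 is to look at what that sector actually contains. The following is only a minimal sketch, assuming the disks use the classic MBR layout (four 16-byte partition entries at byte offset 446 and the boot signature 0x55 0xAA at offset 510); a GPT disk is laid out differently. The function name `parse_mbr` and the dictionary keys are made up for illustration.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446       # classic MBR: partition table starts here
ENTRY_SIZE = 16          # four entries of 16 bytes each
SIGNATURE_OFFSET = 510   # boot signature bytes 0x55 0xAA


def parse_mbr(sector):
    """Return (signature_ok, entries) for the first 512 bytes of a disk.

    Assumes a classic MBR layout; this is an illustrative helper,
    not a replacement for fdisk or testdisk.
    """
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least 512 bytes")
    signature_ok = sector[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 2] == b"\x55\xaa"
    entries = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        raw = sector[start:start + ENTRY_SIZE]
        # status (1 byte), CHS start (3, skipped), type (1 byte),
        # CHS end (3, skipped), first LBA (4, LE), sector count (4, LE)
        status, ptype, lba_start, num_sectors = struct.unpack("<B3xB3xII", raw)
        entries.append({
            "status": status,
            "type": ptype,
            "lba_start": lba_start,
            "num_sectors": num_sectors,
            "empty": raw == bytes(ENTRY_SIZE),  # entry is all zero bytes
        })
    return signature_ok, entries
```

To use it, read the first 512 bytes of the affected disk (or, more safely, of an image taken from it) in binary mode and pass them to `parse_mbr`. If all four entries come back empty and the signature is missing, the sector was probably zeroed; if the entries contain implausible values, that looks more like random corruption, and the rest of the disk deserves a closer look too.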

A virus would appear more likely, but you say Windows and Linux are both affected. That is too odd for a virus.

Are you absolutely certain nobody can tamper with the systems, through malice or ignorance? The proverbial janitor plugging in a vacuum cleaner or floor-scrubbing machine on the same circuit?

Tonny
  • A very strange situation indeed... I also agree on the last one with @Tonny.. I would add a 60% probability of a human factor in this odd scenario :/ – Spirit Feb 03 '12 at 12:55
  • Are your machines secure? – The Unix Janitor Feb 03 '12 at 13:11
  • Apologies, there is no UPS, and it has happened to the three machines individually. The spanning across operating systems is why I ruled out a virus as well. As for someone tampering with the systems, I can't rule that out, but it seems very unlikely, as the office is locked outside working hours and the cleaners/janitors only come in at the end of the week. Thanks for answering, by the way. – xenon Feb 03 '12 at 15:05
  • A new company recently moved into the offices above ours, and it appears they use heavy machinery (polishing equipment, I believe) from time to time. I'm not certain that it would affect our power, though. I could be entirely wrong; my knowledge of electronics is not great. – xenon Feb 03 '12 at 15:26
  • Heavy machinery would do the trick. Without a UPS to act as a buffer (it does more than just provide backup power), that may well be the cause. I had a similar issue some years ago: a new soldering bath in the factory hall pulling 16 kW caused a massive spike every time it was switched on or off. The main server-room UPS handled it (barely), but some of the older UPS units in patch cabinets had to be replaced with ones that had better spike-filtering abilities. – Tonny Feb 03 '12 at 16:11
  • @Tonny thanks for this, at least we have something to look into. Whilst it still could be a human factor, it seems unlikely in this scenario. – xenon Feb 05 '12 at 17:17
  • The thing I don't understand is that even with dodgy power you wouldn't expect a machine to randomly wipe its partition table. I have a machine with a dodgy PSU (new one on order) that freezes and reboots every few hours, and its partition table is just fine. – Sirex Feb 06 '12 at 07:36
  • @Sirex: If it's power related I'm pretty sure there is more damage to the disks than just the partition tables. Still... It remains a very odd situation. I would almost say there ought to be some serious EMP issues as well. – Tonny Feb 06 '12 at 22:38
  • After attempting to repair the disk, it does indeed seem that the corruption was not localised to the partition tables (as you both said). It seems we have some power issues to look into. – xenon Feb 07 '12 at 10:13