An unknown tool is wiping our Virtual Machines and we can't ID it

Question

A console view of a Windows 2008 R2 VM, on vSphere is showing the following screen:

Screengrab of program

"Operation 2 of 2" "Wiping disk"

Can someone advise on what this program is?

Some information on this mystery:

A number of VMs are now effected. The symptom is after reboot "OS not found" message is appearing.

VM's are running on ESXi. VM's are running on a particular datastore
Netapp NFS Mounting the disk in a working box shows no partition table, have not yet been able to hex dump.
VM was not hard reset, would have to be an OS initiated soft reset
There is NO iso mounted There was no "non guest" access to VM, so would need to be RDP or similar
Backups are performed using netapp backup software over-night
NFS in question is thin-provisioned on the back end (array level), and ran out of space just after we saw these issues.

Have you confirmed that there's no PXE server configured anywhere that could be doing this? — Dan, Jul 24 '14 at 11:38
@DAN no PXE is picked up when VM restarts - hence the "no os found" unless it is a very targeted pxe setup. Also, NFS running out of storage /MAY/ be caused by a full disk write of this tool — Rqomey, Jul 24 '14 at 11:40
@JamesRyan Thanks, I believe mounting an iso would show as a VM re-configuration in the events log, and no such events are present. I'll check now however (just checked, mounting an iso shows as "reconfigure virtual machine" in tasks — Rqomey, Jul 24 '14 at 12:05
Could still be someone running the tool remotely through RDP. without you noticing. — MichelZ, Jul 24 '14 at 12:24
Is this limited to your Windows VMs, or all those the only VMs you have on this host? — MDMoore313, Jul 24 '14 at 12:25
@MichelZ Yes, everything points to an in-guest operation, be it RDP, remote execution. I have most visibility of the VMware layer, currently. — Rqomey, Jul 24 '14 at 12:44
Can you break out of the software or use any of the common Windows shortcut keys to get more info? — HopelessN00b, Jul 24 '14 at 12:51
@HopelessN00b Seemed to be unresponsive, although not much time was spent on it as a load of other VMs were reporting no os found. Looks like it was point in time, and has since finished. — Rqomey, Jul 24 '14 at 12:59
Purely based on the design of the window, the strings contained in it, as a handful of similar screenshots, it looks like the tool is something built by Acronis. Here is an example of a tool Acronis built [for Seagate](http://support.seagate.com/kbimg/flash/laptop/Laptop.swf) (click "Next" a few times to see it) that looks very similar. — Moshe Katz, Jul 24 '14 at 19:36
I've seen a similar ui layout in Acronis Disc Director. Apparently it has a "clean up disk" feature (googled it), which I've never used. It appears to be running on your guest. You configure it via GUI (maybe it also has a command line exe) and this stuff happens upon reboot. — Daniel F, Jul 28 '14 at 20:07
Could you Check if you have a older snapshot in time. Definitely a Windows based Disk backup software and if this system is administered you could check last softwares installed. — user3281051, Jul 31 '14 at 01:36
Have you looked into Group Policy yet? Do you have any other systems management systems in place like SCCM? Between those two, a simple mis-configuration could do a lot of damage. — Will, Aug 03 '14 at 17:32
Hi All, Thank you very much for the feedback. Unfortunately I received little further information on what happened other than it was a "user initiated disk wipe". No idea who what or why, but I think @MosheKatz found the closest match — Rqomey, Aug 04 '14 at 09:51

score 10 · Accepted Answer · answered Aug 04 '14 at 11:00

Unfortunately it looks like we may not get to the bottom of what the application was, but to get some value from this incident, I wanted to create a reference answer. This is VMware and virtual layer management centric. A lot of admins are in segregated, and cannot get guest or storage access quickly, and this is for them :)

http://support.seagate.com/kbimg/flash/laptop/Laptop.swf seems to be the closest match to an actual application, which @MosheKatz found.

If this happened in the future, the investigation should be follows like so:

You notice some but not all VMs have crashed. You suspect this is due to a storage issue (as it usually the most likely cause)
First try to isolate a common factor. Are all the crashed VMs sharing the same datastore? In this case they were, but some Machines were ok, so we ruled out obvious hardware issues.
Check all broken VMs to see if there was a common factor (time, function etc). In this case there wasn't.
Check for other unusual events. Something raised a flag here:
- The NFS storage was thin-backed (on the array level). This means that although eg. 200GB is presented to the ESXi hosts, in fact only 100GB is available. Only the array has this knowledge however. What we found was a number of VMs were paused as they had run out of disk space. We though this may have been the root cause, so our fist action was to allocate more storage on the back end, to remove this as a problem.
Once this was resolved (a simple UI change), and the paused VMs were restarting successfully, we returned to the original issue. We mounted the virtual disks from the broken VMs to a working VM, and saw that there was no partition table on the disks. We didn't have a hex viewer available, so had to assume disks were now empty.
The monitoring system alerted to a new VM which just went unresponsive. This was great, as a load of VM's had minutes before just turned un-responsive due to the disk space issue, so the fact this new VM was found quickly was a sign of good monitoring administration.
We opened a console and checked the guest, and saw the above screen-grab.
- At this stage I went to the server fault chat room to see if the program could be identified, while my storage colleague checked all virtual layer logs and events, to make sure there was no storage operation running from our area.
What we should have done was suspend the VM, allow the suspend file to get written out, and analyse the dump to see if the running program could be identified. Suspend VM to core PDF VMware KB

At the end of the day, we knew and Virtual infrastructure tools would not have reported within a guest like the above was doing. We could see there was no ISO mounted, and no events logged against the VM. We could see the VM wasn't "hard power cycled", only a soft restart (this is invisible to underlying infrastructure). We knew it wasn't storage side as we had ruled that out already. We suspected it wasn't automated as it was happening over the course of a few hours on specific VMs. We guessed it wasn't malicious as why would the console report Disk Wipe if it was :)

So, the conclusion was a user initiated disk wipe. That's as far as my investigation went, but I hope you found it useful.

Lessons Learnt:

Backup and test your restores
Make sure all users, particularity admin users, know they are working in a thin provisioned environment, and should avoid anything like write-out disk formatting (ie. write loads of 1's
Have a good monitoring system in place.
And a new one for me: In any large virtual environment, have a tools VM ready, even powered off, with diagnostics tools installed; performance, network storage. If this was available we could have mounted and performed a hex dump on the damaged disk to see if it was really empty, or just missing a mbr. We could have also seen if it was written out with 1's.

score -1 · Answer 2 · edited Aug 01 '14 at 16:55

-1

I think your issue is a standard VMware space reclamation feature.

This article may help you: Clearing up Space-Efficient Virtual Disk questions

edited Aug 01 '14 at 16:55

grekasius

2,046
11
15

answered Aug 01 '14 at 15:26

doc

1

Hi @Doc,Thanks for the feedback, but it isn't. This is an in-guest operation, un-maps etc should be non-destructive, and wont be reported through a console window in that manner – Rqomey Aug 04 '14 at 09:50

An unknown tool is wiping our Virtual Machines and we can't ID it

2 Answers2