We have a Microsoft Failover Cluster with dynamic disks managed by Veritas Storage Foundation. Today the sysadmins added a new disk for SQL Server, but the allocation unit (cluster) size on the volume was wrong, so I ran a quick format to change it.
The disk volume failed, the SQL Server group failed as well, and the cluster became unresponsive. After a few minutes I managed to fail over to a passive node.
The SAN admins say it's my fault because I should have used Veritas Enterprise Administrator to format the disk instead of the Windows format applet.
Can a format operation take a whole cluster group offline this way?
Relevant error messages:
From the eventlog:
The cluster resource host subsystem (RHS) stopped unexpectedly.
An attempt will be made to restart it. This is usually due to a
problem in a resource DLL. Please determine which resource DLL is
causing the issue and report the problem to the resource vendor.
From the cluster.log:
ERR [RCM] rcm::RcmResControl::DoResourceControl:
ERROR_RESOURCE_CALL_TIMED_OUT(5910)' because of 'Control(STORAGE_GET_DISK_INFO_EX)
to resource 'NameOfTheDiskGroup' timed out.'
Veritas Documentation:
Excerpt from Symantec's documentation:
Note: Before manually creating the resource, you must format the cluster-shared volume with NTFS using the VEA GUI and mount it on the node where you are trying to create the resource.
Does this mean the disk cannot be formatted from Windows? I don't read it that way.
For the record, I have formatted many disks with the Windows applet in the past and nothing bad happened.