
We have a shiny new multihomed Windows Server 2008 (64-bit) cluster that is exhibiting some strange behavior.

The problem:

  • Everything works perfectly until we fail over one of the cluster groups

  • Prior to a failover, both internal and external clients can connect, and all domain authentication works properly

  • Once we fail over a cluster group, internal clients in different subnets lose connectivity (as if the static routes had disappeared) and you can no longer log into the server using a domain account (the Domain Controller is in a different subnet)

  • All DNS lookups occur via the Public/Internet interface. It is as if the server(s) can no longer find/resolve the Internal/Domain DNS servers.

  • Rebooting fixes the problem until the next group failover

  • Setting the default gateway to the Internal network also works, but at the cost of having to create static routes for the entire Internet (I don't have the time)
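To illustrate why losing a static route looks exactly like the symptom above: the routing table picks the most specific (longest-prefix) match, and anything without a more specific route falls through to the default gateway on the Public NIC. The sketch below is a simplified model that ignores metrics; all addresses and subnets are hypothetical, not taken from our setup.

```python
import ipaddress

# Hypothetical routing table: (destination prefix, next hop, interface label).
# Addresses are illustrative only.
ROUTES = [
    ("0.0.0.0/0", "203.0.113.1", "Public"),      # default gateway on the Public NIC
    ("10.0.0.0/24", "on-link", "Internal"),      # directly attached Internal subnet
    ("10.20.0.0/16", "10.0.0.254", "Internal"),  # persistent static route to another internal subnet
]

def lookup(dest, routes):
    """Return (network, next hop, interface) for the longest matching prefix.

    Simplified model: route metrics are ignored. Returns None if nothing matches,
    which cannot happen here because the default route matches every address.
    """
    addr = ipaddress.ip_address(dest)
    best = None
    for prefix, hop, iface in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop, iface)
    return best

def route_for(dest, routes):
    net, hop, iface = lookup(dest, routes)
    return f"{dest} -> {iface} via {hop} (matched {net})"

# With the static route active, the remote internal host goes out the Internal NIC.
print(route_for("10.20.5.9", ROUTES))
# If the static route's active entry disappears, the same host falls through
# to the default route and is sent out the Public NIC instead.
print(route_for("10.20.5.9", [r for r in ROUTES if r[0] != "10.20.0.0/16"]))
```

That second case is what a client in another internal subnet experiences: its replies leave through the Internet-facing interface and never arrive.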

The network adapters are as follows:

  • Heartbeat Network (crossover cable between two servers)

  • Internal Network (Active Directory based Network w/ DNS no WINS)

  • Public Network (Internet Connection - Default Gateway - w/ DNS)

  • Microsoft Cluster Failover Virtual Adapter (hidden in most views, but visible in the output of "ipconfig /all")

Other information:

  • This system must provide services to both the Internal and Public networks

  • The Public/Internet connection is the default gateway

  • We have entered persistent static routes to several subnets off the Internal network

  • Each cluster group has a network name and associated IP address

  • The binding order of the network interfaces is:

    1. Internal

    2. Public

    3. Heartbeat
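For reference, persistent static routes like the ones mentioned above are normally added on Windows with `route -p add <network> mask <netmask> <gateway>`. This is a small sketch that generates those command lines from a list of subnets; the subnets and the internal gateway address are hypothetical placeholders, and the script only prints the commands rather than running them.

```python
import ipaddress

# Hypothetical values: replace with the real internal router and subnets.
INTERNAL_GATEWAY = "10.0.0.254"
INTERNAL_SUBNETS = ["10.20.0.0/16", "10.30.1.0/24"]

def persistent_route_commands(subnets, gateway):
    """Build one Windows 'route -p add' command line per subnet (not executed)."""
    cmds = []
    for s in subnets:
        net = ipaddress.ip_network(s)
        cmds.append(f"route -p add {net.network_address} mask {net.netmask} {gateway}")
    return cmds

for line in persistent_route_commands(INTERNAL_SUBNETS, INTERNAL_GATEWAY):
    print(line)
```

Keeping the subnet list in one place makes it easy to reapply the same routes identically on every cluster node.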

We're stumped. We have used this configuration on older Windows 2K clusters, and we have also used it on standalone Windows 2K3 servers. Any suggestions would be greatly appreciated.

Todd

  • Did you find a solution to this? We are experiencing a very similar issue on a Win2008/Exchange2007 CCR environment. – Trondh Sep 06 '09 at 01:07
  • We have found a solution as of last Friday. I will post the solution on Tuesday when I return to work. This was a real problem and I'm surprised more people haven't run into it. – Todd Sep 07 '09 at 19:18


I think I have this exact same problem on a new 2008 R2 cluster with EqualLogic storage. What is the solution? I have a case open with Microsoft and they're pointing me to the weak/strong host model, but it is not helping.

Here is the solution for anything with Broadcom NICs (and maybe others):

http://support.microsoft.com/default.aspx?scid=kb;EN-US;951037

You must disable RSS, TCP Chimney Offload, and NetDMA. This resolved my problems immediately, after calls to both Dell and Microsoft support!
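On Windows Server 2008 these three features are global TCP settings that can be turned off with `netsh int tcp set global <feature>=disabled`. The sketch below wraps those commands; it is a dry run by default (it only prints), it assumes an elevated prompt when actually executed, and the function names are hypothetical. Check the current state afterwards with `netsh int tcp show global`.

```python
import subprocess
import sys

# Global TCP features the answer says to disable (see KB 951037).
FEATURES = ("rss", "chimney", "netdma")

def netsh_disable_commands(features=FEATURES):
    """Return one 'netsh int tcp set global <feature>=disabled' argv list per feature."""
    return [["netsh", "int", "tcp", "set", "global", f"{f}=disabled"] for f in features]

def apply(dry_run=True):
    for cmd in netsh_disable_commands():
        if dry_run or sys.platform != "win32":
            print(" ".join(cmd))  # just show what would be run
        else:
            subprocess.run(cmd, check=True)  # assumed: elevated (administrator) prompt

apply()  # prints the three commands; call apply(dry_run=False) on the node to run them
```

Building argv lists instead of one shell string avoids quoting issues when the commands are run through `subprocess`.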


The following post on TechNet by John Marlin, Senior Support Escalation Engineer at Microsoft, describes exactly what was happening and provides the solution.

He described the problem as:

"The issue is that when you add a static persistent route to a network adapter that is on a Windows Server 2008 Failover Cluster and take a Clustered IP Address offline (or move it to another node), the “Active” route is removed and no connections can be made using this route even though it still shows as persistent. Once you bring the Clustered IP Address back online, the active route is returned."

We followed his advice and things started working! We did have some additional DNS problems, but those were easier to solve. Windows Server 2008, when clustered, is really a different beast from a network perspective than previous versions.
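A quick way to spot the behavior from the quoted post is to compare the "Persistent Routes" and "Active Routes" sections of `route print` after moving a group: any route that is persistent but no longer active has been dropped. This is a minimal sketch of that comparison only; it does not parse `route print` output, and the route tuples below are hypothetical snapshots.

```python
# Hypothetical snapshots as (network, netmask, gateway) tuples; in practice these
# would be read from the "Persistent Routes" and "Active Routes" sections of
# `route print`, which this sketch does not parse.
persistent = {
    ("10.20.0.0", "255.255.0.0", "10.0.0.254"),
    ("10.30.1.0", "255.255.255.0", "10.0.0.254"),
}
active_after_failover = {
    ("0.0.0.0", "0.0.0.0", "203.0.113.1"),
    ("10.30.1.0", "255.255.255.0", "10.0.0.254"),
}

def missing_routes(persistent, active):
    """Return persistent routes that are absent from the active table, sorted."""
    return sorted(persistent - active)

for net, mask, gw in missing_routes(persistent, active_after_failover):
    print(f"persistent but not active: {net} mask {mask} via {gw}")
```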

Note: We also had lots of problems with applications binding to the virtual cluster failover adapter/address, and other issues with multicast/UDP traffic and the Windows Firewall, but that is for another post.

Todd