
We recently switched out our Cisco 6500 core switch for a pair of stacked Dell PowerConnect 6248s. Since then, our Network Load Balanced SharePoint, which runs on two virtual machines on an ESX cluster, has been behaving very poorly. The symptoms are that opening and saving documents stored in SharePoint takes a very long time. There are no errors showing up on the SharePoint servers or the SQL server, just a lot of annoyed users. Initially I thought there was no way NLB could cause this, but as soon as we repointed the DNS records for our intranet to the IP address of one of the web front ends, the problems disappeared.

We suspect there is an issue related to multicast in the Dell configs - NLB is configured for multicast mode, but not IGMP multicast.
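For reference, in plain (non-IGMP) multicast mode NLB just uses 03-BF followed by the virtual IP's octets in hex as the cluster MAC, which is handy when checking what's in the switch's ARP and bridge tables. A quick sketch (the VIP here is a made-up example):

```python
# Sketch: derive the multicast MAC that Windows NLB uses for a given
# virtual IP in plain multicast mode, to compare against switch tables.
def nlb_multicast_mac(vip: str) -> str:
    octets = [int(o) for o in vip.split(".")]
    return "03:BF:" + ":".join(f"{o:02X}" for o in octets)

print(nlb_multicast_mac("10.0.0.50"))  # 03:BF:0A:00:00:32
```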

Has anyone got a similar setup to ours and fixed this sort of issue? SharePoint on VMware ESX, with Dell PowerConnect switches.

dunxd
  • I also noticed that under a very small load, the problems with NLB do not occur (e.g. I set my hosts file to point our intranet URLs at the NLB address and I don't see the problem - but when everyone goes to the SharePoint servers via the NLB address, things are very slow) – dunxd Apr 15 '10 at 17:29
  • Dell are getting back to me with a recommended config to get NLB working with their kit - it seems the documentation does exist, but it is buried on Dell's site. Will share the resolution if I get it. – dunxd Apr 26 '10 at 11:54
  • Still in touch with Dell on this - they've been working out what the issue is. Latest suggestion is to go to the latest firmware (isn't it always). Explanation is that the newest firmware shifts a lot of Layer 2 processing out of software and onto hardware on the PowerConnects. Seems reasonable, but these are our core switches, so not sure when I am going to get a window to do a firmware upgrade. If only we were properly redundant like we planned :-) – dunxd Jul 08 '10 at 22:50
  • I was wondering if anyone had heard anything additional about NLB with Dell PowerConnect switches? We've had the same issues discussed here with ISA servers in an NLB array. We have them connected to separate layer 2 switches that then connect to our layer 3 switches, as mentioned above. When we start NLB, everything slows to a crawl. I'm at a loss to explain what is going on... –  Nov 03 '10 at 19:57

3 Answers


We have seen almost the same issue. We are using NLB with multicast (but not IGMP) to load balance 14 web servers across two ESX 4 servers plugged into a pair of stacked Dell PowerConnect 6248s. The NLB was working, but the performance was terrible. We tried changing everything on NLB (unicast, multicast, IGMP) and on the VMware switch (promiscuous mode, notify switches, etc.) and could not make it work. We added the multicast MAC to the Dell bridge and ARP tables, all to no effect. We eventually solved it by turning off routing on the VLAN on the PowerConnect (i.e. using a simple layer 2 VLAN) and using an external router to route the traffic. Would love to know how to use routing on the Dell to make this work, as it should be supported.
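For anyone trying the same workaround, the change amounted to something like the following. This is a sketch only - the VLAN number and addresses are placeholders, and the exact command names vary between PowerConnect firmware revisions:

```text
! Before: the PowerConnect itself routed the VLAN via an SVI
!   interface vlan 10
!     routing
!     ip address 10.0.10.1 255.255.255.0
!
! After: strip routing so VLAN 10 is plain layer 2, and point the
! hosts' default gateway at an external router instead
interface vlan 10
no routing
exit
```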

  • The problem, IMHO, is not the switch but NLB. We had similar problems with our TS NLB. There was so much NLB heartbeat traffic that it was smothering the rest of the network. We solved it by using 2 NICs on each TS (which is MS best practice, BTW), connecting the NLB-bound NIC to an isolated layer 3 switch and routing traffic for the backend through the second NIC, which was connected to the production network. – joeqwerty Apr 23 '10 at 12:23

All sounds very familiar - I've got exactly the same problem. NLB for Exchange and SharePoint on a set of ESX VMs; any time there is traffic to the NLB, it grinds to a halt. We have worked closely with Dell, and the problem is multicast. Supposedly there is a Dell white paper on this, which says you must use unicast and not multicast.

Now we are waiting to move our NLBs over to unicast. We have 30-odd of these switches, all running 3.2.0.7 now. The v3 firmware was a big improvement, but be careful if you are upgrading from v2 and make sure you read their instructions - it's not a simple install and reboot. Also, some things are configured in different ways, such as DHCP relay. And it massively broke our NLB to start with.
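For what it's worth, on Windows Server 2008 R2 and later the move to unicast can be scripted with the NLB PowerShell module. A sketch only - the host name is a placeholder, and changing the operation mode briefly disrupts the cluster, so do it in a window:

```powershell
# Requires the NetworkLoadBalancingClusters module (Server 2008 R2+)
Import-Module NetworkLoadBalancingClusters

# Switch the cluster from multicast to unicast operation.
# "sharepoint-nlb" is a placeholder for one of your cluster hosts.
Get-NlbCluster -HostName "sharepoint-nlb" | Set-NlbCluster -OperationMode Unicast
```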

If you're not convinced, try pinging the management interface (with something graphical like PingPlotter) whilst monitoring the traffic to the NLB. You will see the ping time is linked to the amount of traffic. We go from pings of 1ms to over 200ms, and even drop packets. The management interface locks up because the switch processor is handling the multicast in software, rather than it being done in hardware.

Hope that helps, I'll post back when we eventually move over.

delluser
  • Dell also mentioned a white paper, but never sent it through. I'm not convinced it exists. Can't understand why they haven't got this working and documented it, since all the competition has it clearly and publicly documented. – dunxd Nov 21 '10 at 23:55

Some Dell switches do not support multicast NLB, which is why you are having performance issues. You will also see excessive CPU usage. You can read more about this at this link.

http://www.dell.com/us/business/p/powerconnect-6200-series/pd

Another case, management network ping loss, is related to the firmware revision. Newer firmware solves that problem; I suggest you update your firmware.
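As a rough sketch of what that looks like from the PowerConnect CLI - the TFTP server address and image filename are placeholders, and command names may differ by model and firmware:

```text
! Check the currently running firmware image
show version
! Pull the new image from a TFTP server, then restart on it
copy tftp://10.0.0.5/powerconnect_62xx.stk image
reload
```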

isso