11

We currently have five 24-port 3Com unmanaged switches. Four of them are 100Mb switches with no gigabit port (the MDI uplink port is 100Mb). The fifth is a full-gigabit switch, which we use for all of our servers. They are daisy-chained together at the moment, roughly like this:

[diagram: the five switches daisy-chained in a line]

At certain times of the day, network performance becomes atrocious. We've verified that this has nothing to do with server capacity, so we're left considering network I/O bottlenecks. We were hoping to connect each 100Mb switch directly to the gigabit switch so that each user would only be one switch away from the server ... like this:

[diagram: each 100Mb switch connected directly to the gigabit switch]

The moment this was connected, all network traffic stopped. Is this the wrong way to do it? We verified that no loops exist, and verified that we didn't have any crossover cables (all switches have Auto-MDI). Power-cycling the five switches didn't help either.

Beep beep
  • 1,843
  • 2
  • 18
  • 33
  • Buying, borrowing, begging, or stealing a single inexpensive managed switch for the "full gigabit" role is really not an option? A few hundred buys you e.g. an HP JE006A (which is 3Com-designed, btw) nowadays, and you will have much better ways to debug when that problem repeats (and since it appeared without any effort to make it appear, it probably will). – rackandboneman May 18 '12 at 15:15

7 Answers

17

Maybe the switches were just in awe of your amazing topology?

Seriously though, if you can take the time, do it again, but only connect one switch at a time to the Gb switch, and verify connectivity and function at each step.

When the network finally halts, disconnect all of the switches, and try connecting that switch first. If it does it again, disconnect everything on that switch and add them back, one by one to find the unhappy link.

If it doesn't die, add switches until it does. If it fails after adding more switches, maybe one of the switches can't handle the size of the MAC address table, and you need a bigger switch?
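That add-one-at-a-time isolation can be sketched in code. A minimal Python sketch, where `find_bad_element` and `network_ok` are hypothetical names and `network_ok` stands in for however you actually verify connectivity at each step (e.g. pinging the server from a client on each switch):

```python
def find_bad_element(elements, network_ok):
    """Add elements (switches, then hosts) back one at a time;
    return the first one whose addition breaks the network."""
    connected = []
    for elem in elements:
        connected.append(elem)           # plug it in
        if not network_ok(connected):    # verify connectivity at each step
            return elem                  # the unhappy switch or link
    return None                          # everything works together

# Example with a stubbed check: pretend "switch-3" kills the network.
ok = lambda connected: "switch-3" not in connected
switches = ["switch-1", "switch-2", "switch-3", "switch-4"]
print(find_bad_element(switches, ok))    # -> switch-3
```

If the culprit turns out to be a switch, run the same loop again over that switch's hosts to find the unhappy link.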

Matt Simmons
  • 20,218
  • 10
  • 67
  • 114
  • 3
    Also if it does fail again, watch the activity lights on the switches - are they going nuts? – Zypher Dec 17 '09 at 04:24
  • @Zypher - good point, we'll take a look off hours (during the day they're always going nuts, we have a lot of network traffic). – Beep beep Dec 17 '09 at 04:55
  • 2
    Not having managed switches makes these kind of issues difficult to track down as you lack visibility of the network. But Matt's approach should help you narrow it down quickly. – 3dinfluence Dec 17 '09 at 05:51
  • 1
    @Matt: minor point, switches do not have ARP tables. Unless they are L3 switches, in which case they should be called routers. :) – Murali Suriar Dec 17 '09 at 10:02
  • D'oh! Thanks! I fixed it. Pshaw, who needs that seven layer model ;-) – Matt Simmons Dec 18 '09 at 01:51
  • 2
    Matt's approach is a good one, as is your change in topology (daisy-chain worst possible). There's various other obscure things that could be causing your problem, which are difficult to track down on unmanaged network gear. I've seen similar behavior with "one-way" servers - like a locked-down syslog server, which only receives logs but never speaks. That causes its MAC address to time out on the switch's table, and all traffic to that server gets flooded through all ports. Similar issue with clustering or VRRP... – Geoff Dec 22 '09 at 21:48
  • @Murali Suriar: Switches maintain a MAC Table that maps MAC Addresses to switch ports. This is the mechanism by which they forward packets to only the required ports and is the somewhat defining difference between switches and hubs. Not repeating all traffic on all ports significantly improves performance. You are correct that it is not an ARP table as ARP resolves IPs (which L2 switches don't care about). Switches learn MACs as traffic passes through them. – Kevin Colby May 18 '12 at 15:15
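The learn-or-flood behaviour described in the comments above can be illustrated with a toy model. This is a sketch of generic L2 learning and flooding, not any particular vendor's implementation; the `Switch` class and port numbering are invented for illustration:

```python
class Switch:
    """Toy L2 switch: learns source MACs per port, floods unknowns."""
    def __init__(self, num_ports):
        self.mac_table = {}      # MAC address -> port it was last seen on
        self.num_ports = num_ports

    def forward(self, src_mac, dst_mac, in_port):
        self.mac_table[src_mac] = in_port     # learn the sender's port
        if dst_mac in self.mac_table:         # known destination:
            return [self.mac_table[dst_mac]]  # forward out one port only
        # Unknown destination (or its entry timed out, as with the quiet
        # syslog server Geoff mentions): flood out every port except the
        # one the frame arrived on.
        return [p for p in range(self.num_ports) if p != in_port]

sw = Switch(4)
print(sw.forward("aa", "bb", 0))  # "bb" unknown -> flooded to ports 1, 2, 3
print(sw.forward("bb", "aa", 2))  # "aa" was learned -> sent only to port 0
```

This is also why a host that never transmits gets all of its traffic flooded: nothing refreshes its entry, so it ages out of the table.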
3

I'd definitely try to get the latter topology working (with Matt Simmons' diagnostic hints as a guide), as it will provide better latency (always only one hop away). Also, given your very limited interconnect capacity, try to:

  • put machines which often talk to each other on the same switch (so that the interconnects don't have to carry as much traffic); and
  • hook machines that have gigabit NICs and talk to a lot of other machines on the gigabit switch directly (so they can get their data into the network as efficiently as possible).

Ultimately, though, you can't stick a ten pound turkey in a five pound bag, and I'd be writing up a plan for purchasing a set of managed gigabit switches, based on your description of the network being heavily utilised already.

womble
  • 95,029
  • 29
  • 173
  • 228
  • "you can't stick a ten pound turkey in a five pound bag" - nice. But what if we duct tape two five pound bags together? That's worked so far =) – Beep beep Dec 17 '09 at 06:54
  • 1
    I guess that etherchannel would be the equivalent of duct tape... requires managed switches, though. – womble Dec 17 '09 at 07:27
0

The fact that you are daisy-chaining switches together is the source of your problem right there. I would advise, at the very least, purchasing managed switches. A good and cost-effective approach for a small business is to implement a collapsed-core design with VLANs. This way you can segment broadcast domains and control the performance of your network.

Lee
  • 1
0

This looks like a broadcast (ARP) storm, which can occur when multiple unmanaged switches are connected in a ring-like topology.

REASON: ARP requests are broadcast out all ports. In a loop, a switch's own broadcasts bounce back and are flooded again, and since Ethernet frames have no TTL they circulate indefinitely. In that situation link utilisation saturates to 100% on all ports and traffic halts. Use managed switches (with spanning tree enabled) to solve this issue.
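A toy model of why a loop is fatal on unmanaged gear. This is a sketch under the simplifying assumption that every broadcast entering the loop just keeps circulating; real storms can grow even faster, since a frame flooded out more than one looped port is duplicated on each pass:

```python
# Ethernet frames have no TTL, so a broadcast that enters a loop never
# dies. Every new ARP broadcast adds to the frames already circling,
# so utilisation only ever goes up until the links saturate.
def circulating_frames(arp_broadcasts_per_tick, ticks):
    in_flight = 0
    for _ in range(ticks):
        in_flight += arp_broadcasts_per_tick  # new broadcasts join the loop
        # nothing removes frames: no TTL, no spanning tree on unmanaged gear
    return in_flight

print(circulating_frames(5, 100))  # 500 frames and still climbing
```

Spanning tree prevents this by blocking one of the redundant links, which is exactly what unmanaged switches can't do.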

0

Unmanaged switches have a limit to how much traffic they can switch effectively - not nearly what you would expect. You are probably getting a broadcast storm or bus contention causing things to go bad, because too many packets or clients are trying to use the network at one time. Switching to managed switches (we use the Netgear 24-port ones that run about $250-300 a pop) will give you more switching capacity and the ability to apply QoS to ports if you need to manage priority better.

Empirically, we tried the bottom layout one time and it just went bad. I am sure a real CNE could tell us the real reason, but I have always stuck with the top config simply to keep the wiring simple, clean, and working. Mind you, we could simply have had a loop and not noticed it at the time.

MikeJ
  • 1,381
  • 4
  • 13
  • 24
  • 4
    Managed switches have the same limits as unmanaged switches. Managed switches just give you more control and visibility of what's happening on the network. – 3dinfluence Dec 17 '09 at 05:38
  • 2
    @3d: While you're correct that there's no intrinsic difference between the capacity of managed versus unmanaged switches, in practice managed switches tend towards the higher quality end of the market, and hence tend to have better switching fabrics, allowing them to fling more traffic around. – womble Dec 17 '09 at 06:26
  • @womble...no doubt but pretty much all unmanaged switches are "non blocking" at least as far as the backplane is concerned. Whether or not the cpu can keep up with fragmenting packets and other tcp issues is another question. Managed switches also tend to have more cache on each port than their unmanaged brothers. – 3dinfluence Dec 17 '09 at 14:49
0

A couple more ideas on what could be going on:

  1. Autonegotiation may be picking inappropriate speed/duplex settings. Make sure to set the ports manually to 100/Full.
  2. A spanning-tree reconvergence is no doubt occurring when you make the change. This can take quite a while to finish, depending on your various timers. If you didn't wait at least two minutes, that may have impacted your tests.
Wade Williams
  • 178
  • 1
  • 5
  • 1
    Oddly enough, if we set clients to 100/Full or 100/Half, many of them slow down significantly. The only way it continues to send faster than 10Mb is by setting it to auto-negotiate. Makes no sense whatsoever, but many of our clients are Win98 (legacy app that we cannot replace for another year), so we don't mess with it. – Beep beep Dec 17 '09 at 06:52
  • I was talking about the connections between the switches being manually configured for 100/Full. If two switches from the same manufacturer won't work when a link is manually configured, something is very wrong and I'd be on the phone with their tech support. – Wade Williams Dec 17 '09 at 07:09
  • Ah. These are unmanaged switches, and don't allow changing the port speed. – Beep beep Dec 17 '09 at 17:23
  • Actually, that behavior makes perfect sense, and is in spec. If one side (the switch) is set to auto, and the other is hardcoded to Full/100, the switch will try to negotiate and fail, since the client doesn't negotiate back. To get Full/100 working, either auto on both sides, or hardcode on both sides, don't mix auto with hardcoded. – Geoff Dec 22 '09 at 21:40
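The mismatch behaviour Geoff describes can be sketched as a decision table. This is a deliberately simplified model of IEEE 802.3 autonegotiation; the `link_result` helper is invented for illustration, and assuming 100/full as the best common mode is only reasonable for these particular switches:

```python
def link_result(side_a, side_b):
    """side_* is either 'auto' or a forced 'speed/duplex' string.
    Returns the effective setting each side ends up using.
    Simplified model of IEEE 802.3 autonegotiation."""
    if side_a == "auto" and side_b == "auto":
        return ("100/full", "100/full")   # both negotiate: best common mode
    if side_a != "auto" and side_b != "auto":
        return (side_a, side_b)           # both forced: whatever you set
    forced = side_b if side_a == "auto" else side_a
    speed = forced.split("/")[0]
    # Parallel detection: the auto side learns the speed but NOT the
    # duplex, so it falls back to half duplex -> duplex mismatch.
    return ((speed + "/half", forced) if side_a == "auto"
            else (forced, speed + "/half"))

print(link_result("auto", "100/full"))  # ('100/half', '100/full') - mismatch
print(link_result("auto", "auto"))      # ('100/full', '100/full') - clean
```

This is why "auto against auto, or forced against forced, never mixed" is the usual rule of thumb.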
-2

Something to consider: are you using crossover or straight-through cables? You should always use crossover cables when connecting like devices. Using a straight-through cable in a normal port will slow your connected switch to a crawl unless the switch is designed to auto-correct for the wiring difference (some do).

specba
  • 1
  • 3
    Just plain wrong. If the port is not auto-sensing, using a straight cable will *not work at all*. And if it instead *is* auto-sensing, it will not make any difference. – Massimo Jul 30 '12 at 21:49
  • I was going to disagree with the -2 since the question is 2009, but the answer coming in 2012? I agree with the votes. :P Now-a-days, it doesn't matter if cross-wired or not. – Krista K Jan 06 '14 at 10:23