We have a flat office network, a tree built from a number of different ProCurve L2 and L3 GigE switches, spanning some 300 ports. Today I found that one device on the network briefly generates excessive broadcast traffic, which saturates most 100Mbps links and affects services such as VoIP. The device is connected to a ProCurve 3500yl, which is the root switch of the network, so the storm spills through the root switch to the rest of the network.
Q: Is there a way to localize the problem and avoid the storm spilling through the root switch?
Here are some more specifics of my case that might be relevant, since I may be asking the wrong question and the best solution might lie elsewhere.
The device that causes the storm is itself a ProCurve 3400cl (J4905A) PoE switch with outdated firmware, version M.10.76 from 2009. I know it is old and will flash the newest version over the weekend.
The 3400cl is connected to a power source that has intermittent extended outages. When power resumes after an outage, the device takes about 5 minutes to boot. During that time traffic already flows through the device, although the device and its links are not yet completely set up, and it spews all sorts of undesired traffic into the network. That traffic is hard to capture, but it leaves a peak in the statistics collected over SNMP.
During this time I see "High collision or drop rate. See help." messages on many 100Mbps ports on the network.
The 3400cl is connected to the 3500yl by two physical GigE links. The 3400cl is running RSTP, while the 3500yl is configured with MSTP. During normal operation one of the links is blocked by RSTP on the 3400cl while the other is forwarding.
When the 3400cl reboots, I see the following messages in the logs of the 3500yl:
14:05:03 ... port 37 is now off-line
14:05:04 ... port 38 is now off-line
14:05:51 ... port 37 is blocked by STP
14:05:51 ... port 38 is blocked by STP
14:05:54 ... port 37 is now on-line
14:05:54 ... port 38 is now on-line
and then I see "High collision or drop rate" on 100Mbps ports connected to the 3500yl and to the lower-level switches connected to it:
14:07:11 ... port NN-High collision or drop rate. See help.
Also the VoIP users are experiencing interruptions.
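To put numbers on the sequence above, here is a small Python sketch that turns the logged timestamps into offsets from the first event. The event labels are my own shorthand for the log lines, and I am assuming all entries are from the same day and the same switch clock.

```python
from datetime import datetime

# Timestamps copied from the log excerpts in this post.
# Labels are shorthand; "port NN" is left as it appears in the log.
EVENTS = [
    ("14:05:03", "port 37 off-line"),
    ("14:05:04", "port 38 off-line"),
    ("14:05:51", "ports 37 and 38 blocked by STP"),
    ("14:05:54", "ports 37 and 38 on-line"),
    ("14:07:11", "high collision or drop rate on port NN"),
]


def offsets_from_first(events):
    """Return (seconds since the first event, label) for each event."""
    times = [datetime.strptime(ts, "%H:%M:%S") for ts, _ in events]
    start = times[0]
    return [
        (int((t - start).total_seconds()), label)
        for t, (_, label) in zip(times, events)
    ]


for seconds, label in offsets_from_first(EVENTS):
    print(f"+{seconds:4d}s  {label}")
```

By this reckoning the collision warning is logged 77 seconds after the two uplink ports come back on-line, well inside the roughly 5 minute boot window of the 3400cl. The warning time is only when the 3500yl reported the problem, so the storm itself may have started earlier.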
The only immediate measure I could attempt was to set "broadcast-limit 5" on the pair of 3500yl ports. I am not sure it will help and could not test it. It also feels very much like an ad-hoc solution.
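Part of my doubt is simple arithmetic, sketched below in Python. This assumes the broadcast-limit value is a percentage of the port's line rate applied to incoming broadcast traffic, which is my reading of the setting and not something I have verified on this firmware. The function names are made up for illustration.

```python
def broadcast_ceiling_mbps(limit_percent, port_speed_mbps):
    """Broadcast rate a port would still admit, assuming the limit
    is a percentage of the port's line rate (an assumption)."""
    return port_speed_mbps * limit_percent / 100.0


def share_of_link(traffic_mbps, link_speed_mbps):
    """Fraction of a downstream link that the given traffic would occupy."""
    return traffic_mbps / link_speed_mbps


# One GigE uplink from the 3400cl with "broadcast-limit 5".
ceiling = broadcast_ceiling_mbps(5, 1000)
# Broadcasts are flooded to every port, including the 100Mbps ones.
share = share_of_link(ceiling, 100)

print(f"admitted broadcast: {ceiling:.0f} Mbps")
print(f"share of a 100Mbps link: {share:.0%}")
```

If that assumption holds, a 5% limit on a GigE port still admits up to 50 Mbps of broadcast, which is half of every 100Mbps link it is flooded to. A lower value might be needed for the limit to protect the slower ports.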