0

I have 2 servers on the same switch. I'm losing 5% of packets on ~16k pings between the two.

Below is my nasty ASCII diagram of the network configuration. All machines have a single interface.

 a       b
 |       |
 -- S1 --
      |
     S2
      |
     S3
      |
      c

a = Sun Netra 240
b = Dell 2950
c = my machine
S1 - S3 = 3 x Cisco Catalyst 2960G


pings from a -> b lose 5% of packets
pings from b -> a lose 5% of packets
pings from c -> a lose 0% of packets
pings from c -> b lose 0% of packets
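
To put those figures in scale, here is a rough Python sketch of the loss arithmetic. It assumes exactly 16,000 pings were sent (the post only says ~16k), and the function names are made up for illustration.

```python
def loss_percent(sent: int, received: int) -> float:
    """Percentage of echo requests that never got a reply."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100 * (sent - received) / sent


def expected_lost(sent: int, percent: float) -> int:
    """Approximate number of lost packets for a given loss percentage."""
    return round(sent * percent / 100)


# Assumed round number; the real count was only given as "~16k".
SENT = 16_000

print(expected_lost(SENT, 5))       # about 800 packets lost at 5%
print(loss_percent(SENT, 15_200))   # 5.0
```

At that rate, roughly one ping in twenty goes unanswered, which is far too many to write off as noise on a single switch.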

I can't think of a reason why I'd lose packets going between two ports on the same switch when I lose nothing coming from a different switch through those same ports.

Can anyone throw any ideas my way please?
Thanks

Dennis Williamson
chewy_fruit_loop

6 Answers

1

Do you get any loss if you ping using the default packet size? How about if you ping using ping -l 1472? How about when pinging using ping -l 1473?

Try pinging from C to A, C to B, A to B, and B to A using ping -l 1473 -f and post the results of each of them here.
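
For context, 1472 and 1473 presumably straddle the largest ICMP payload that fits in a standard 1500-byte Ethernet MTU without fragmentation. A rough sketch of that arithmetic in Python, assuming IPv4 with no options (20-byte header) and an 8-byte ICMP echo header. The function names are made up for illustration.

```python
IP_HEADER_BYTES = 20    # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_icmp_payload(mtu: int = 1500) -> int:
    """Largest ping payload that fits in one packet of the given MTU."""
    return mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES


def needs_fragmentation(payload: int, mtu: int = 1500) -> bool:
    """True if a ping with this payload size exceeds the MTU."""
    return payload > max_icmp_payload(mtu)


print(max_icmp_payload())          # 1472
print(needs_fragmentation(1472))   # False
print(needs_fragmentation(1473))   # True
```

Note that -l (payload size) and -f (Don't Fragment) have these meanings in the Windows ping; other platforms use different flags for the same thing.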

joeqwerty
1

Another troubleshooting step would be to plug both machines into a different switch to see if the problem moves with the devices. My guess would be that you either have an interference problem as entens suggests, or one of those boxes is load bound and dropping packets.

Greeblesnort
  • I've already switched the ports on the current switch, which sorted the problem out for a few hours. The boxes are being patched up to date and left to run until next week. If there's still a problem then, I'm moving one to a different switch. – chewy_fruit_loop Sep 17 '09 at 20:30
1

NIC driver? Duplex settings? Any errors showing up on the switches? What are you using to measure the loss? Ping?

Also, try disabling any offloading (checksum offloading etc.) on the NIC if it is enabled, so you can use Wireshark to find out what kind of traffic you lose.

Hope that gives you some ideas.

Dennis Williamson
MrTimpi
  • The ping loss is being measured on the Sun box using ping -s to gather statistics. The other pings have statistics on by default. Once you stop the ping, it tells you how many packets were sent and what percentage was lost. – chewy_fruit_loop Sep 17 '09 at 20:35
0

We have encountered cases where having the switch port and/or the NIC set to auto speed and/or auto duplex results in loss. Changing from auto to a fixed speed and duplex resolved the issue.
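
A minimal sketch of the kind of check this implies, in Python: compare what each end of the link is actually running at and flag any disagreement. LinkSettings, find_mismatch and the sample values are hypothetical. The real values would have to be read from the switch's interface status and from the host's NIC configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkSettings:
    speed_mbps: int   # e.g. 100 or 1000
    duplex: str       # "full" or "half"


def find_mismatch(switch_port: LinkSettings, nic: LinkSettings) -> list[str]:
    """Return a description of every setting the two ends disagree on."""
    problems = []
    if switch_port.speed_mbps != nic.speed_mbps:
        problems.append(
            f"speed: switch {switch_port.speed_mbps} vs NIC {nic.speed_mbps}"
        )
    if switch_port.duplex != nic.duplex:
        problems.append(
            f"duplex: switch {switch_port.duplex} vs NIC {nic.duplex}"
        )
    return problems


# Hypothetical example values, not taken from the machines in the question.
switch_side = LinkSettings(speed_mbps=1000, duplex="full")
nic_side = LinkSettings(speed_mbps=1000, duplex="half")

for problem in find_mismatch(switch_side, nic_side):
    print(problem)
```

An empty result only means the two ends agree with each other. It says nothing about cabling or hardware faults.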

Dave M
0

Check the NIC and the Cat cables. Also, is there any other network traffic in the background?

Anicho
  • I've already stripped out all the Cat5 that was plugged into them and replaced it with new Cat6. Yes, there is other network traffic going on. The reason we're investigating the problem is that the server is timing out when trying to communicate with the NIS master on the other side of the Atlantic. – chewy_fruit_loop Sep 17 '09 at 20:28
  • Is it possible to test it in a sandbox (test environment), with just the two servers communicating with each other and connected to nothing else? Test that if you can and see what it returns. – Anicho Sep 18 '09 at 08:56
0

It "looks" like the problem was port 0 on the NIC in the Sun box. We've transferred all the traffic to port 1 and the problem has vanished.

I'm not holding my breath though, as this is the second time this year that this has happened. I had a bad feeling about the box when I found out that it had been end-of-lifed 3 months after we bought it and had a memory failure 2 weeks before the end of the first year, and that the boss won't pay for a service contract on it but prefers case-by-case payment.

Thanks to everyone who suggested courses of action.

chewy_fruit_loop