I am running a cluster of 22 nodes, all under the same 1Gbps switch.
I noticed that some nodes in the cluster have a higher "frame" value in ifconfig, as shown below.

some nodes (higher frame):
eth0      Link encap:Ethernet  HWaddr 90:B1:1C:09:D2:F8
          inet addr:  Bcast:  Mask:
          inet6 addr: fe80::92b1:1cff:fe09:d2f8/64 Scope:Link
          RX packets:643150667 errors:0 dropped:790 overruns:0 frame:280072
          TX packets:908361364 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:377424658828 (351.5 GiB)  TX bytes:864099883266 (804.7 GiB)
          Interrupt:170 Memory:d91a0000-d91b0000

other nodes (lower frame):
eth0      Link encap:Ethernet  HWaddr 24:B6:FD:F6:DF:34
          inet addr:  Bcast:  Mask:
          inet6 addr: fe80::26b6:fdff:fef6:df34/64 Scope:Link
          RX packets:1126524649 errors:0 dropped:118 overruns:0 frame:43775
          TX packets:847071691 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:992080311726 (923.9 GiB)  TX bytes:385366462299 (358.9 GiB)
          Interrupt:170 Memory:d91a0000-d91b0000

What might be wrong?

I also ran ethtool: "rxbds_empty" matches "frame" in ifconfig, and "rx_discards" matches "dropped" in ifconfig.
What are rxbds_empty and rx_discards?
I have searched for them, but found almost no information.
Are they caused by a bad configuration or setting?

The weird thing is that the 6 newly added nodes are the ones with the higher value.
Also, I noticed that a program runs slower than it did before we added those 6 nodes.
In that program, every node issues a huge number of short message requests to other random nodes in parallel.
Ideally, every node would have the same completion time, but the 6 added nodes run slower than the others.

Could anyone give me any advice? Any help will be appreciated.

Frame errors indicate some sort of CRC failure when the node's NIC receives data from the switch. You should check the physical layer first:

  1. Test the cable with a cable tester. It should at least comply with the Cat5e standard.
  2. Check the MTU on the switch (could jumbo frames be enabled?)
  3. Confirm that the port settings are identical on both the switch and the node: port speed, duplex and flow control.
  4. Check the port statistics on the switch (e.g. show interface Gi0/4)
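
While working through the checks above, it may help to compare nodes by frame errors per received packet rather than by the raw counter, since the nodes have handled different amounts of traffic. The following is only a rough sketch: it parses the two RX lines quoted in the question, and the helper names (parse_rx_line, frame_errors_per_million) are made up for illustration, not part of any tool.

```python
import re

# RX lines copied from the two ifconfig outputs in the question.
RX_LINES = {
    "higher frame": "RX packets:643150667 errors:0 dropped:790 overruns:0 frame:280072",
    "lower frame": "RX packets:1126524649 errors:0 dropped:118 overruns:0 frame:43775",
}


def parse_rx_line(line):
    """Return the name:value counters on an ifconfig RX line as a dict of ints."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", line)}


def frame_errors_per_million(line):
    """Frame errors per one million received packets (hypothetical helper)."""
    counters = parse_rx_line(line)
    if counters["packets"] == 0:
        return 0.0
    return counters["frame"] / counters["packets"] * 1_000_000


if __name__ == "__main__":
    for label, line in RX_LINES.items():
        rate = frame_errors_per_million(line)
        print(f"{label}: {rate:.1f} frame errors per million RX packets")
```

With the numbers from the question, the "higher frame" node comes out roughly ten times worse per packet than the other one, so the difference is not just an effect of traffic volume. The counters are cumulative since the interface came up, so sampling them twice and comparing the difference would show whether the errors are still occurring.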
