4

Are there any reasons why I should not rely on LACP when designing a network topology? Specifically, I mean the connection between an L2 switch and a hypervisor, which is where the aggregated traffic of the VMs accumulates. We are talking about a 5 x 1 GbE LACP bond.

I disagree with my colleague. He says: "Why should we add another layer of overhead to the entire setup? It is just another potential point of failure." He is sceptical about link aggregation in general. My opinion is that the Linux bonding driver in 802.3ad mode is a reliable and good choice.

He also thinks that we don't need it, because there will never be so much traffic in our environment that a single 1 GbE link is not enough. We are a high school with about 100 PC clients and about 10 servers in our LAN.

So we are in a situation where we don't really know whether we need LACP or not. Some additional data about network traffic would help, but I believe it is challenging to retrieve meaningful numbers. So in the end it is easier to rely on intuition and just say: "Yes, we want LACP, to be sure, because of traffic." or "No, because it is not reliable and we don't need it."

Any suggestions?

  • 5x GbE is probably overkill; that's a huge amount of traffic. 2x GbE might be ideal, and it provides availability as well. On the other hand, you need to be sure that anyone who comes to manage this setup will have the required skills. If you think that someone without the proper training and skills will manage it, I'd say drop the idea. – Florin Asăvoaie Sep 12 '16 at 11:04
  • Thank you for your answer, Florin! I agree with you that 5x GbE is probably overkill. But the question was about using (or not using) aggregation at all. There are just two of us managing the network. We are students, so we are looking forward to learning anything new. Or did you mean the administrators who will be our successors? In my opinion, knowledge of LACP, VLANs, STP and similar protocols is a prerequisite for doing this job. – Andy Coarse Sep 12 '16 at 11:57
  • You can either dream that everyone is going to know those, or learn the hard way something that you will learn later in your career anyway. Expect less: something will always fail and people are always stupid. – Florin Asăvoaie Sep 12 '16 at 12:25

2 Answers

6

To tell the truth, LACP was created precisely to solve a dangerous problem caused by LAG (Link Aggregation Group) itself.

When used between directly attached interfaces, LAG is not dangerous. In such a setup, basically any network problem can be traced back to a port with no link, which automatically instructs the switch to stop sending traffic to the disconnected port.

However, if some other device sits between the LAG-enabled switch and the aggregated Gbit ports, other logical issues can arise. These cause real problems because the forwarding switch has no information about such transient faults: the link still appears up, so it will continue to blindly send traffic to the disconnected or problematic ports.

LACP was defined to solve this problem: it uses a heartbeat-based system to constantly monitor the aggregated ports, and automatically disconnects them when too many heartbeats are lost.

In short: if it is correctly configured, I see no problem in using LACP. The only thing to consider is that you inevitably have a slightly more complex configuration to track and manage.

shodanshok
  • Thanks for the answer! Is there some way to reliably substantiate whether 1 GbE is enough or not? I mean some measurement with a short sampling period, which would register very short peaks of traffic. Is Cacti, for example, good for this? – Andy Coarse Sep 12 '16 at 17:43
  • Yeah, use monitoring on the hypervisor. Watch a sustained typical period of production traffic, say a day or a week. The total throughput on the bond should tell you your current usage. Now look at how much you expect usage to grow over the life of the system, which should correspond with the expectations in the requirements document that management sent you when capacity-planning the system. They did send one, right? ;) – suprjami Jan 03 '17 at 11:54
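
As a rough illustration of the measurement discussed in these comments, here is a minimal Python sketch that samples the kernel's byte counters once per second and keeps the peak rate. It assumes a Linux hypervisor that exposes `/sys/class/net/<iface>/statistics/`; the interface name `bond0` and the helper names are placeholders for illustration, not part of any tool mentioned above. Short sampling intervals matter here because a poller that averages over several minutes will hide brief bursts.

```python
import time
from pathlib import Path


def read_counters(iface, root="/sys/class/net"):
    """Return (rx_bytes, tx_bytes) for a Linux network interface."""
    stats = Path(root) / iface / "statistics"
    rx = int((stats / "rx_bytes").read_text())
    tx = int((stats / "tx_bytes").read_text())
    return rx, tx


def mbit_per_s(old, new, seconds):
    """Convert the difference of two byte counters into Mbit/s."""
    return (new - old) * 8 / seconds / 1_000_000


def peak_rates(iface, samples=60, interval=1.0):
    """Sample the interface and return the peak (rx, tx) rate in Mbit/s."""
    peak_rx = peak_tx = 0.0
    rx0, tx0 = read_counters(iface)
    t0 = time.monotonic()
    for _ in range(samples):
        time.sleep(interval)
        rx1, tx1 = read_counters(iface)
        t1 = time.monotonic()
        peak_rx = max(peak_rx, mbit_per_s(rx0, rx1, t1 - t0))
        peak_tx = max(peak_tx, mbit_per_s(tx0, tx1, t1 - t0))
        rx0, tx0, t0 = rx1, tx1, t1
    return peak_rx, peak_tx


# Example call ("bond0" is an assumed interface name, substitute your own):
# print(peak_rates("bond0", samples=3600))
```

If the peaks over a typical school day stay well below 1000 Mbit/s on a single link, that is some evidence that one 1 GbE link is enough for throughput; redundancy is a separate question.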
2

Yes, I trust LACP. I prefer LACP over all other link aggregation methods because it is reliable and flexible, and because it is an IEEE standard, so it is designed to interoperate across vendors.

If you think your virtual machines will do more than 1 gigabit per second of traffic (and that's very easy to do) then you want to load balance. The only load balancing modes (on Linux) which work for you are either Mode 2 (balance-xor) or Mode 4 (LACP). Mode 2 uses the same balancing as Mode 4, just without the constant heartbeat to the switch.
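
To make the balancing concrete, here is a simplified Python sketch of how a MAC-based transmit hash picks a link. It is modelled loosely on the bonding driver's `layer2` policy (XOR of source and destination MAC, modulo the number of links); the real driver differs in detail depending on kernel version and the configured `xmit_hash_policy`, and the function name and MAC addresses below are made up for illustration.

```python
def pick_link(src_mac, dst_mac, link_count):
    """Choose an egress link index from the last byte of each MAC address.

    Simplified model: (src XOR dst) modulo the number of links in the bond.
    """
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last) % link_count


# Hypothetical VM and client MAC addresses on a 5-link bond.
vm = "52:54:00:00:00:01"
clients = ["52:54:00:00:00:0a", "52:54:00:00:00:0b", "52:54:00:00:00:0c"]

for client in clients:
    print(client, "-> link", pick_link(vm, client, 5))
```

The point to take away is that a given pair of endpoints always hashes to the same link, so a single conversation never exceeds 1 Gbit/s; the aggregate only helps when many VMs and clients are talking at once.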

suprjami