I'm building a mesh network. All nodes have an "AP-level" router/AP [as we call it to distinguish it from the "MESH-level" router/AP] that handles client connections via a traditional wifi network, and they all have the same config. Each one creates its own 172.16.0.0 subnet [netmask 255.240.0.0], and its DHCP server assigns addresses to clients from 172.16.1.1 to 172.31.255.254 [that's acceptable, right?]. OpenWrt derives each IP from the client's MAC address, which spreads client IPs evenly across the subnet.
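
As a sanity check on the pool size, here is a minimal Python sketch. It assumes the values quoted in the comments below (netmask 255.240.0.0, DHCP range 172.16.1.1 to 172.31.255.254) are the real config; the variable names are only illustrative.

```python
import ipaddress

# Assumed config, taken from the values quoted in the comments:
# netmask 255.240.0.0, DHCP range 172.16.1.1 .. 172.31.255.254
net = ipaddress.ip_network("172.16.0.0/255.240.0.0")
first = ipaddress.ip_address("172.16.1.1")
last = ipaddress.ip_address("172.31.255.254")

# Number of leasable addresses, both ends of the range included
pool_size = int(last) - int(first) + 1

print("network:      ", net)
print("broadcast:    ", net.broadcast_address)
print("total in /12: ", net.num_addresses)
print("DHCP pool:    ", pool_size)
```

Under those assumptions the pool is a little over a million addresses, which is the figure the rest of the question relies on.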

Now - on nodes physically nearby - if a wifi client roams [meaning it doesn't fully lose its connection to one node before switching to the other one] to a new node, it does not request a new DHCP lease, but simply carries on with the old address it was using on the previous node, without checking whether the new node already has that IP assigned to another device. Centralized DHCP servers are not an option here. This is the question:

Since our DHCP pool includes more than a million IPs and no more than 30 devices will ever be connected simultaneously to a single node, how safe is this setup, in terms of avoiding IP collisions?

This is my logic:

If I understand DHCP dynamics correctly, sooner or later over the years one node will see an IP collision [as different MAC addresses will map to the same IP] [let's say another device gets my phone's IP while my phone is not connected to that node].

Case 1] my phone reconnects to the same node from scratch [not roaming from another AP] and gets another IP --> no problem

Case 2] my phone connects to another node on the network at the same time --> no problem

Case 3] my phone connects to another node on the network at the same time, one that happens to be near the first node [not that likely], gets its usual IP [since all nodes calculate IPs with the same algorithm, at least I assume that's the case], and then roams to that first node --> conflict

Odds of case 3 happening: in order for a collision to occur and cause an actual conflict, both devices must be connected at the same time on the same node. Any other case creates no problem, as time displacement or spatial [node] displacement takes care of that for us.

Let's say no more than 30 clients will be connected to the same node at the same time. Since the DHCP server will have taken care not to have any two devices share the same IP, all of their IPs will be different rather than independently random. So this is NOT a birthday paradox case, as those 30 IPs won't be unrelated events.

Therefore there will be 30 cases out of one million where the roaming phone's IP will collide with an existing one: roughly a 0.003% chance, or 1 in 33k+ roaming events.
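
The arithmetic above can be sketched in Python. This is only an illustration under stated assumptions: the roaming client's IP is treated as uniformly distributed over the pool and independent of the IPs already on the destination node, roaming events are treated as independent, and the pool size is the 172.16.1.1 to 172.31.255.254 range from the comments. The function names are made up for this example.

```python
POOL_SIZE = 1_048_318    # assumed: addresses in 172.16.1.1 .. 172.31.255.254
CLIENTS_PER_NODE = 30    # upper bound stated in the question


def p_collision_per_roam(pool=POOL_SIZE, clients=CLIENTS_PER_NODE):
    """Chance that one roaming client's IP matches one of the
    `clients` distinct IPs already in use on the destination node."""
    return clients / pool


def p_any_collision(roams, pool=POOL_SIZE, clients=CLIENTS_PER_NODE):
    """Chance of at least one collision over `roams` roaming events,
    assuming the events are independent."""
    p = p_collision_per_roam(pool, clients)
    return 1 - (1 - p) ** roams


def p_birthday(clients=CLIENTS_PER_NODE, pool=POOL_SIZE):
    """For contrast only: chance that at least two of `clients`
    independently random IPs coincide (the birthday-paradox case)."""
    p_distinct = 1.0
    for i in range(clients):
        p_distinct *= (pool - i) / pool
    return 1 - p_distinct


p = p_collision_per_roam()
print(f"per roaming event: {p:.6%} (about 1 in {1 / p:,.0f})")
for roams in (1_000, 10_000, 100_000):
    print(f"at least one collision in {roams:>7,} roams: {p_any_collision(roams):.2%}")
print(f"birthday-style figure, for contrast: {p_birthday():.4%}")
```

With the exact pool size the per-event odds come out slightly better than the 1 in 33k estimate, but the cumulative figure is the one that matters once the whole network produces many roaming events.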

Is this logic and this math correct or am I not considering some major factor at play here? 10.xxx.yyy.zzz subnets are not an option here as they are used by the meshing layer and we don't want clients sharing that subnet.

Thank you and sorry for the very specific question, but this is fundamentally important for us.

Nikksno
  • How many subnets do you have? It sounds like you only have one. – Ron Trunk Apr 15 '17 at 16:40
  • Thank you Ron. I'm not sure what you're asking. If you're asking how many 172.16 subnets we have, that's one for every node in the network [currently 40]. – Nikksno Apr 16 '17 at 13:59
  • Maybe I don't understand your question. If you have 40 subnets, each with its own DHCP server and scope, why would you have collisions? – Ron Trunk Apr 16 '17 at 14:34
  • Because each node assigns DHCP leases independently, and since client IP addresses are calculated based on their MAC address, and there are a ton of MACs that map to the same IP, if two devices on two nodes get the same IP [say, for instance, 172.24.123.456] and then one of the devices roams into the other node's wifi network, there will be a collision. The question is whether the probability of such event is as remote as I think it is, or if I'm not considering some other factor that would make it much more likely. – Nikksno Apr 16 '17 at 15:33
  • But you're saying two contradictory things. If you have two nodes on two separate subnets, then they will never collide. If you have one large subnet with multiple DHCP servers, then yes it's possible to have a collision. My question would be, why can't you break your network up into multiple subnets? – Ron Trunk Apr 16 '17 at 15:38
  • Actually we have a whole bunch of nodes, which will become very very many over time, all separate but with the same config, and there will be collisions because clients will be roaming between subnets without asking the DHCP server for a new lease nor verifying another device already has their same IP on the destination subnet. This is a mesh network, so every node is on their own basically. It's more of a math and logic question than one strictly about networking. Thank you. – Nikksno Apr 19 '17 at 13:29
  • Two subnets can't have the same IP address range. Maybe you mean a different term? – Ron Trunk Apr 19 '17 at 13:32
  • They are physically separated and only connected through a layer of mesh networking. All nodes are made of two routers: "mesh-level routers", which communicate with each other through a mesh layer over wifi on the 100.xxx.yyy.zzz subnet [I think that's the term] and create their own 10.yyy.zzz.0 LAN, and "ap-level routers", which have a cable going from that LAN to their WAN [getting the 10.yyy.zzz.2 IP as their WAN IP]. The ap-level routers all create their own "ap-level" "sub-LAN", which this time is always the same [172.16...] and is broadcast over wifi with the same SSID all over the place – Nikksno Apr 20 '17 at 11:42
  • The setup does work, but we're trying to understand at a mathematical and probabilistic level how often collisions will occur. This is a naive setup, but it's 100% distributed, and cannot rely on network-wide DHCP lease tables like other meshing protocols do [i.e. libremesh]. All of the setup, really, is just to give you a background, but really the question is about DHCP and statistics. Thank you for your time. – Nikksno Apr 20 '17 at 11:44
  • Let me ask the question in a different way: When the APs hand out their addresses to clients, what subnet mask do they use? – Ron Trunk Apr 20 '17 at 11:55
  • They all have this config: 172.16.0.1 for themselves, 255.240.0.0 subnet mask, 172.31.255.255 broadcast, **DHCP from 172.16.1.1 to 172.31.255.254**, therefore about a million addresses. Sorry for not being at all an expert on protocol stuff ;] – Nikksno Apr 20 '17 at 12:19
  • OK. So you have only ONE subnet. Sorry for going off on a tangent, but I needed to understand what you are doing. – Ron Trunk Apr 20 '17 at 18:29
  • So here is the next set of questions (you may not have the answers, but they determine the probabilities). Is the DHCP algorithm deterministic, or is there some seed value (like time of day)? What does the DHCP server do when it has a collision? Most hosts do some sort of check when they get a lease -- they ping or ARP. What do your hosts do when they detect a collision? How many different host manufacturers are there? – Ron Trunk Apr 20 '17 at 19:01
  • Thank you so much Ron, I absolutely understand, sorry for not being an expert at all here on many things. The only variable in the DHCP is the client's MAC address. I get the same lease with the same client on every node, every single time, and the leased IPs seem to equally span the entire pool's range, so that's perfect. All of the nodes run the same version of openwrt and have the exact same config, so that helps. I haven't been able to test this as I have no idea how to artificially create a genuine collision, so I don't know how either the server or the client will behave. Any ideas? – Nikksno Apr 22 '17 at 11:39
