Our network currently consists of 5 network access servers (NAS) in 5 different locations worldwide. It is expected to expand to 15 NAS in the coming days, and to more in the future. We currently use scripts for authentication, but we are planning to move to FreeRADIUS-based AAA for authentication and accounting with these NAS, because of the benefits we can gain from using accounting data. The user load is expected to grow to hundreds of thousands of users in the coming days. My question, for experts with practical experience of this kind of architecture, is about scalability: what is the best FreeRADIUS topology for such a setup?

Would a centralized RADIUS-based AAA service consisting of multiple RADIUS nodes be better than a distributed RADIUS AAA service, i.e. one RADIUS server per NAS, and why? We want to use accounting data during authorization, so a distributed RADIUS service would require synchronizing accounting data as well as user authentication data in near real time. With tens of different locations, real-time synchronization seems difficult to achieve. I have read about RADIUS proxy servers that forward RADIUS queries to a central RADIUS server. However, I fail to understand how that would be more beneficial than having the NAS use a centralized RADIUS service directly, i.e. all NAS pointing at the same RADIUS service.

If a distributed RADIUS service is considered, radrelay may be the way to go. However, radrelay seems most useful for a primary-to-standby setup, where there are usually just two RADIUS nodes, and I am not sure it would work well if it had to synchronize data between so many different RADIUS servers.

I would be very thankful if someone could point me in the right direction.

4_dev

1 Answer

If your focus is on reliability

The advantages of a distributed architecture with a locally replicated copy of the data are redundancy and reduced latency.

Synchronisation is not difficult to achieve; OpenLDAP's syncrepl protocol does a good job with hub-and-spoke, or even mesh, topologies. It will perform partial and full resyncs of data as required. A new instance should synchronise with the master when it starts for the first time.

You will have to manage each of those instances though (use Ansible or Salt), and correct faults when issues arise.

There's also the increased hardware cost of having to place a server next to each NAS, in a 'shared-fate' sort of configuration if possible.

You've not really provided enough information about the NAS to say whether that would actually be appropriate. Can clients fail over between NAS?

If your focus is on ease of management

The advantage of having a single cluster of RADIUS servers, behind redundant load balancers (hint hint), is simplified management.

A pair of servers would likely be sufficient to handle the load of up to a million users. Each FreeRADIUS instance should be able to handle around 20,000-30,000 auth/s on moderate hardware against an OpenLDAP instance running MDB.
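As a rough sanity check on that estimate, here is a back-of-envelope calculation. The re-authentication rate and the peak factor are illustrative assumptions, not measurements from any real deployment; only the 20,000 auth/s figure comes from the range quoted above.

```python
def peak_auth_rate(users, auths_per_user_per_hour, peak_factor):
    """Estimated peak authentication requests per second."""
    average = users * auths_per_user_per_hour / 3600  # mean requests per second
    return average * peak_factor


USERS = 1_000_000
AUTHS_PER_USER_PER_HOUR = 4  # assumption: each user authenticates 4 times an hour
PEAK_FACTOR = 10             # assumption: bursts reach 10x the average rate
SERVER_CAPACITY = 20_000     # auth/s, low end of the range quoted above

peak = peak_auth_rate(USERS, AUTHS_PER_USER_PER_HOUR, PEAK_FACTOR)
headroom = SERVER_CAPACITY / peak
print(f"peak ~{peak:,.0f} auth/s, one server has ~{headroom:.1f}x headroom")
```

Under these assumptions a single instance still has spare capacity at peak, so the second server is there mainly for redundancy. Substitute your own measured rates before relying on the result.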

Upgrading, monitoring, and fixing issues with the database are simpler with fewer instances.

The servers in this configuration represent a single point of failure.

If a NAS starts misbehaving and floods the authentication servers with traffic, there's a greater chance of the system being overwhelmed.

If there is disruption to the network links between the NAS and the central servers, the NAS will be unable to authenticate users.

Proxy Servers

They're sometimes useful as aggregators, or in federations, but on their own don't really do much in a pass-through configuration.

Caching proxy servers can be useful as they take some of the load off the authentication servers.

In an ISP environment a large portion of the traffic is made up of rejects, as clients will keep re-authenticating.

Caching proxy servers can respond on behalf of the central servers if they've previously seen a reject, or if the central server is offline and they've previously seen an accept.
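The decision logic described above can be sketched roughly as follows. This is a toy model in Python, not FreeRADIUS configuration: `CachingProxy`, `forward`, and `reject_ttl` are hypothetical names, and a failure to reach the central server is modelled as a `ConnectionError`.

```python
import time


class CachingProxy:
    """Toy model of a caching proxy sitting in front of a central auth server."""

    def __init__(self, forward, reject_ttl=60.0, clock=time.monotonic):
        # forward(user) returns "accept" or "reject", and raises
        # ConnectionError when the central server is unreachable (assumption).
        self.forward = forward
        self.reject_ttl = reject_ttl
        self.clock = clock
        self.rejects = {}     # user -> time at which the cached reject expires
        self.accepts = set()  # users the central server has accepted before

    def authenticate(self, user):
        expiry = self.rejects.get(user)
        if expiry is not None and self.clock() < expiry:
            return "reject"  # answered locally, central server not contacted
        try:
            result = self.forward(user)
        except ConnectionError:
            # Central server offline: fall back to a previously seen accept.
            # None means "no local answer", leaving the NAS to time out.
            return "accept" if user in self.accepts else None
        if result == "accept":
            self.accepts.add(user)
            self.rejects.pop(user, None)
        else:
            self.rejects[user] = self.clock() + self.reject_ttl
            self.accepts.discard(user)
        return result
```

In this sketch cached accepts never expire, which keeps the example short. A real deployment would presumably bound their lifetime so that disabled accounts are not accepted indefinitely during an outage.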

Arran Cudbard-Bell
  • Thanks for your very useful reply. Our preference is reliability and low latency, and yes, clients can fail over between NAS: if the nearest NAS is not available, they fail over to another NAS by manually choosing a suitable one. The challenge with OpenLDAP's syncrepl, as you mentioned, may be how to sync MySQL-based accounting data among so many different locations, since we plan to use this data during authentication for authorization. Is there a cleaner way to do this? – 4_dev Mar 22 '16 at 08:04
  • In the event of an accounting store failure, FreeRADIUS can write accounting data to disk, and relay it when the accounting store is fixed. So maybe a hybrid approach with the accounting data being relayed and stored in a central server, and the auth/autz data being replicated? – Arran Cudbard-Bell Mar 22 '16 at 12:02
  • Thanks for the useful discussion. I was also wondering how a centralized DB (with master-to-master DB synchronization) would scale with multiple remote FreeRADIUS instances, one per NAS. I would be thankful if you could comment on this as well. The idea is to rely on TCP's reliability across the network and keep UDP on the same LAN, of course while also handling accounting store failures. – 4_dev Mar 25 '16 at 07:27
  • Not well, and Galera in particular is very susceptible to split brains and other odd synchronisation issues. LDAP replication works well because it's low volume (comparatively), and fairly simple. – Arran Cudbard-Bell Mar 25 '16 at 13:55
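
The hybrid approach suggested in the comments (buffer accounting records locally, then relay them once the central accounting store is reachable again) can be sketched as a simple store-and-forward queue. This illustrates the idea only and is not how FreeRADIUS implements it: `AccountingRelay` and `send` are hypothetical names, and an in-memory deque stands in for the on-disk spool.

```python
import json
from collections import deque


class AccountingRelay:
    """Store-and-forward buffer for accounting records."""

    def __init__(self, send):
        # send(record) delivers one record to the central store and raises
        # ConnectionError when the store is unreachable (assumption).
        self.send = send
        self.spool = deque()  # stand-in for a file on disk

    def record(self, rec):
        """Spool a record first, then try to drain the spool."""
        self.spool.append(json.dumps(rec))
        return self.flush()

    def flush(self):
        """Relay spooled records in order; stop at the first failure."""
        while self.spool:
            try:
                self.send(json.loads(self.spool[0]))
            except ConnectionError:
                return False  # keep the record and retry on a later flush
            self.spool.popleft()
        return True
```

A record leaves the spool only after `send` succeeds, so an outage delays accounting data rather than losing it, and records arrive in their original order.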