
I want to build an Asterisk cluster from 3 Asterisk (13.x) nodes: one in the US, one in Europe and one in Asia. I already have the 3 servers. Asterisk uses the realtime infrastructure for SIP/IAX users, queues, CDR, CEL and queue logs. All users use SIP softphones.

The databases are replicated with a master <==> master setup, so I basically have the "same" database at all locations, and the data is replicated in near real time (with a delay of 1-2 minutes, as the servers are very far from one another).

Because the queues table is replicated, I have all queues at all locations. This is the desired behavior, as each server is a backup for the others.

What I want to accomplish is this: no matter which of the 3 servers a call is routed to, the "system" should be able to find the available agents in that queue and ring their devices, no matter where the agent is logged in.
To replicate device states I used Openfire XMPP, but I see discrepancies: a user often has different states in a given queue on different servers.

For example, agent Adrian shows as IN_USE on the server where he is on a call and as NOT_IN_USE on another server. The problem is that if a call arrives in that queue on the second server, that server will try to call Adrian, not knowing that he is already on a call. This stresses Adrian, as the call rings on the second line of his softphone, and the call does not go to another available agent.
I suspect the issue is caused by having all queues on all servers, so that the states are somehow altered by that.
I have seen setups with a dedicated queue server. Why is that? To avoid this kind of issue, or for load distribution?
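
To make the discrepancy above concrete, here is a minimal Python sketch that compares per-server snapshots of device states and reports the devices the servers disagree on. The function name `find_state_conflicts`, the server names and the snapshot layout are hypothetical. The state labels are written as in the text above, and how the snapshots would be collected from each Asterisk node is left out.

```python
from collections import defaultdict


def find_state_conflicts(states_by_server):
    """Return {device: {server: state}} for devices whose reported
    state is not identical on every server that knows the device."""
    per_device = defaultdict(dict)
    for server, devices in states_by_server.items():
        for device, state in devices.items():
            per_device[device][server] = state
    # A device is in conflict when more than one distinct state is seen.
    return {
        device: seen
        for device, seen in per_device.items()
        if len(set(seen.values())) > 1
    }


# Hypothetical snapshot: one dict of device states per server.
snapshot = {
    "us": {"SIP/adrian": "IN_USE", "SIP/maria": "NOT_IN_USE"},
    "eu": {"SIP/adrian": "NOT_IN_USE", "SIP/maria": "NOT_IN_USE"},
    "asia": {"SIP/adrian": "IN_USE", "SIP/maria": "NOT_IN_USE"},
}
conflicts = find_state_conflicts(snapshot)
print(conflicts)
```

Running something like this periodically would at least show how often, and for which agents, the replicated states drift apart.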

What is the recommended approach for Asterisk clustering in a shared-database scenario?

Any thoughts on how I can achieve that?

P.S. I have enabled callcounter in sip.conf.


Updates:
Indeed, the term "cluster" is not used properly here. I guess the ideal scenario would be a cluster of two Asterisk servers at each location (active-passive with failover detection), with a higher layer above that which balances calls between the locations.

The main problem I have now is that I have these 3 locations and I share queues between them (as they all have the same database). Let's say that a queue named TestQueue has 15 users, 5 at each location (the team is split in 3). What I want to accomplish is that, no matter which server a call enters the queue on, it can reach all available agents (and determine which are busy and which are not).
I'm not sure if my approach is OK, or whether I should have one Asterisk server host the queue and 3 other servers where the users register (with XMPP status sync between the queue server and the registration servers).

Videanu Adrian

1 Answer


From your description you are confusing/mixing clustering and load balancing, which is going to create a mess. Pick one or the other. Have a look at this ServerFault question and answer, which does a good job of describing the state of clustering for Asterisk.

If this is truly a cluster then standby nodes should not have Asterisk running when not in use (no active SIP stack for agents/trunks to connect to). Trying to keep active and standby nodes operational using different statuses etc. is closer to load balancing - which isn't really desirable in terms of what you are trying to achieve.

Master-master synchronization is also a mistake in clustering. If one node is failing or starts to corrupt the data, it should not corrupt the other peer(s) - which is exactly what master-master synchronization does. Similarly, shared file stores such as NFS, iSCSI and DRBD all allow one failing peer to corrupt all peers.

Instead of 'shared database' look for 'synchronized data'. That way the clustering software can control what is synchronized (and avoid synchronizing if a peer is failing).

You're also missing one major aspect of clustering - health detection. How will you know if hardware is failing? If trunks are down? If upstream devices are down? If agents can't connect? How will that trigger a failover? To which node?
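
As a rough illustration of the questions above, the sketch below reduces a few health signals to a single failover decision. Everything in it is an assumption for illustration: the `NodeHealth` fields, the priority scheme and the `choose_active` helper are hypothetical, and a real deployment would fill these flags from its own monitoring and would usually rely on dedicated clustering software instead.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeHealth:
    """Hypothetical health report for one node."""
    name: str
    priority: int           # lower value = preferred node
    process_up: bool        # is the Asterisk process responding?
    trunks_up: bool         # are the upstream trunks reachable?
    agents_reachable: bool  # can agents register and be called?

    def healthy(self) -> bool:
        return self.process_up and self.trunks_up and self.agents_reachable


def choose_active(nodes: List[NodeHealth]) -> Optional[str]:
    """Return the name of the preferred healthy node, or None if no
    node passes every check (nothing safe to fail over to)."""
    healthy = [n for n in nodes if n.healthy()]
    if not healthy:
        return None
    return min(healthy, key=lambda n: n.priority).name


nodes = [
    NodeHealth("primary", 1, process_up=True, trunks_up=False, agents_reachable=True),
    NodeHealth("standby", 2, process_up=True, trunks_up=True, agents_reachable=True),
]
active = choose_active(nodes)
print(active)
```

The point is only that "which node becomes active" has to be an explicit decision driven by checks like these, not a side effect of the database replication.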

You seem to be taking a 'hardware layer' clustering perspective, which doesn't work well at the application layer, where state (e.g. voicemails, queues, etc.) must be shared across peers. Your approach works best with simple OS-level services (HA file sharing, HA database, etc.) - not application-layer services. Have a good look at the ServerFault question mentioned above and this Voip-Info web page.

If you're having trouble picking a direction, consider this: load balancing is great for achieving high capacity on systems without state. 10 years ago this was important for Asterisk given how few simultaneous channels a single server could keep open. Now even commodity hardware can keep 500 channels open (without transcoding) so load balancing has fallen out of favour.

Clustering with managed state is now the standard for critical call centers. This includes sophisticated health monitoring, synchronization (not shared data), intelligent customization/differentiation between peers using a shared dial plan, etc. Clustering with 2 peers is also the standard (with peers in different data centers) - when you get to 3+ peers you start to run into new issues of synchronizing state in the case of contention (2/3/4/etc active). With Master-Master DB as you describe you will never recover from multi-active contention.

It sounds like you bought lots of hardware and are now designing a solution to use the hardware. You may want to approach this problem the other way - and repurpose unneeded hardware once you have a practical design. I would suggest a 2-node cluster, with data synchronization, and health monitoring.

TSG
  • First of all, thank you for the detailed answer. I'll update my post to answer some of the questions from here. – Videanu Adrian Dec 29 '16 at 06:54
  • Your update is a very different question. Asterisk cannot natively (low level) share a queue the way you want. You would have to build that one layer higher (in the dialplan or through AGI, etc.). The simplest solution may be to create 3 HA clusters (one per location) and an AGI app which moves callers between queues to balance load. – TSG Dec 29 '16 at 14:00
  • 3 clusters (one per location) sounds great. Regarding the AGI script that would move the call between locations (I just want to be sure I have understood): this should run before the call is placed in the queue, right? If the call is already in the queue, I do not know how to transfer it from there automatically. At least not with AGI; I guess that AMI might help here... – Videanu Adrian Dec 31 '16 at 12:10
  • You can use 1 Asterisk server to route to 3 others (pairs/servers), or you can use a timeout in the queue to move callers between queues (dynamically), or you can use an AMI-based app to manage queues and reprioritize/move, etc. There are many ways to accomplish this - all above the built-in QUEUE functionality. – TSG Dec 31 '16 at 16:05
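
The routing options in the comments above all come down to deciding, outside the built-in queue logic, which location should take a caller. A minimal Python sketch of one such decision follows. The function `pick_location`, the location names and the agent counts are hypothetical; how the counts are gathered (AMI, AGI, a database query) and how the call is actually moved are not shown.

```python
def pick_location(local, available_agents):
    """Choose which location's queue should take the caller.

    Prefer the location where the call arrived if it has a free agent;
    otherwise pick the location with the most free agents. If nobody is
    free anywhere, keep the caller local so it can wait in the local
    queue until a timeout triggers another attempt.
    """
    if available_agents.get(local, 0) > 0:
        return local
    best = max(available_agents, key=available_agents.get, default=local)
    return best if available_agents.get(best, 0) > 0 else local


# Hypothetical counts of free agents per location for one queue.
free = {"us": 0, "eu": 3, "asia": 1}
target = pick_location("us", free)
print(target)
```

A script along these lines would typically run before the caller enters a queue, or again after a queue timeout, which matches the order discussed in the comments.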