I am trying to build a high-availability application. My current design has two VMs, both with public IPs and both in the same subnet, each running the same web application in Docker. SSL certificates and traffic to the app in Docker are managed by Traefik. The first VM is the master, so its IP is the one configured in Cloudflare. A third VM runs a script that hits the application over the first VM's IP to check whether it gets a response. If the script does not receive a response from the first VM, it sends me an email notification about the problem and then updates Cloudflare with the public IP of the second (failover) VM so that traffic goes to the second VM.
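For reference, the logic of the monitoring script boils down to something like the Python sketch below. It is a simplified illustration, not the actual script. The IP addresses are placeholders from the documentation range, and the function names (`http_probe`, `build_dns_update_request`, `check_and_failover`) are made up for this post. It also assumes the app answers plain HTTP on the VM's IP and that Cloudflare is updated through the v4 DNS records endpoint with an API token; both should be checked against your own setup and the Cloudflare docs.

```python
import http.client
import json
import urllib.request

# Placeholder addresses (documentation range), not real VM IPs.
PRIMARY_IP = "203.0.113.10"
FAILOVER_IP = "203.0.113.20"


def http_probe(ip, timeout=5.0):
    """Return True if the app at `ip` answers with a 2xx/3xx status.

    Assumes a plain-HTTP endpoint at "/" on the VM's IP.
    """
    url = f"http://{ip}/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # Covers connection errors, timeouts, and HTTP error statuses.
        return False


def build_dns_update_request(zone_id, record_id, token, ip):
    """Build (but do not send) a request that points the DNS record at `ip`.

    The endpoint path and token auth are assumptions based on the
    Cloudflare v4 API; zone_id, record_id and token are hypothetical inputs.
    """
    url = (
        "https://api.cloudflare.com/client/v4/"
        f"zones/{zone_id}/dns_records/{record_id}"
    )
    body = json.dumps({"content": ip}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def check_and_failover(probe, notify, update_dns, primary_ip, failover_ip):
    """Run one check cycle and return the IP that should serve traffic.

    probe(ip) -> bool, notify(message) and update_dns(ip) are passed in
    so the decision logic can be exercised without network access.
    """
    if probe(primary_ip):
        return primary_ip
    # Primary did not respond: notify first, then switch DNS.
    notify(f"Primary {primary_ip} did not respond; switching to {failover_ip}")
    update_dns(failover_ip)
    return failover_ip
```

In this sketch the third VM would call `check_and_failover(http_probe, send_email, switch_dns, PRIMARY_IP, FAILOVER_IP)` on a schedule, where `send_email` and `switch_dns` are hypothetical helpers that send the notification and submit the request built by `build_dns_update_request`. Note that a single failed probe triggers the switch, which is part of why the setup feels rudimentary.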
This design works fine, but it is very rudimentary. I know it can be improved, but I am not sure how, so I need your suggestions. What I want is to run a health check of the app on the master VM and, if the app is not responding for any reason, route the traffic to the failover VM. During my research I came across keepalived. I have not looked into it yet, but I think it could be of some help.