
I'm starting up my first online service. It will run on a MEAN stack on Linux on AWS.

It is now time to make sure my service runs smoothly on a 24x7 basis. For Mongo, I'll go with mLab for database replication. In AWS, I will put two servers in different locations in failover mode. For resource monitoring I will use Site24x7 or a similar service.

All fine, but my worry is about intervention in case of failure. As the company is a startup and only the partners are involved, we will mainly be travelling, doing sales and marketing. New features will be done in "extra time" or through outsourcing. We also don't have much money to spend for now.

So, my questions are:

a) Is the described architecture enough to keep the service stable, or should I think of something else?

b) Should I go for failover between AWS and other providers? If so, is there a tool to do it automatically?

c) Do I need a person to handle unexpected events on a daily basis? For now we don't have the money for it, so what are the alternatives to having this person?

I'm not worried about capacity planning right now, only about how to keep my environment running.

My expected SLA allows a few minutes of downtime, not more than half an hour.

Mendes
  • "Safe enough" depends on your SLAs and how critical your app is. If it's being used to treat ebola patients, "safe" probably means something different than if you're building a meme generator. – ceejayoz Nov 14 '18 at 14:34
  • a) You need to define at least your RTO (you said 30 mins) and RPO, and the failure modes you're trying to protect against. Multiple AZs is enough to protect against many AWS failures, but you still need backups. Multi-region gives even more protection. b) Multi-cloud goes even further, but is manual and you need to load balance across providers - CloudFlare / Route53 can help with that. c) A properly set up AWS system rarely needs someone watching it, but you need appropriate alerts and someone who can react to them. – Tim Nov 15 '18 at 16:26
  • Based on what you've asked I suspect you need someone with more AWS knowledge and experience to help you set this up. Have you considered security? Firewalls, GuardDuty, AWS Config, Amazon Inspector, patching (SSM can do it). You probably want the number of servers sized so you can take one down to patch and the others can carry the load, unless you plan to do blue / green deployments. – Tim Nov 15 '18 at 16:27
  • If you want more help, or the question re-opened, I suggest you edit your question to give more detail on what the service is, how critical it is, how important data loss is, the cost of losing data / downtime, your budget to set this up, and anything else you think is relevant. – Tim Nov 15 '18 at 16:34
  • Thanks @Tim for the input. My key point is what you said: "A properly set up AWS system rarely needs someone watching it, but you need appropriate alerts and someone who can react to them". Even if it is rarely needed, what if we cannot react because we are out at a business meeting? Is there a sort of System Administration "as a service", some service that monitors all day and, if the system becomes unavailable, starts a predefined script (like setting up a secondary server)? – Mendes Nov 15 '18 at 16:50
  • I really think if you're going to use AWS for anything important you should do at least the AWS Architect training and optionally certification. In general, within AWS you automate response to incidents using things like auto scaling / CloudWatch. There is no person in AWS who will watch your instances and services for you and fix them if they break. – Tim Nov 15 '18 at 21:07
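
To make the "automate the response" idea from the comments concrete, here is a minimal sketch in plain Python of the decision logic a health-check-driven failover usually follows: act only after several consecutive failed probes, and check that detection plus recovery time fits the half-hour downtime target from the question. Everything here is an assumption for illustration: `FailoverPolicy`, `first_failover_index`, `fits_downtime_budget`, the 60-second probe interval and the threshold of 3 are hypothetical names and values, not an AWS API. In practice a managed feature (a CloudWatch alarm, an Auto Scaling health check, or a Route 53 health check) would most likely play this role rather than hand-written code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FailoverPolicy:
    """Hypothetical policy values; tune them to your own service."""
    check_interval_s: int = 60   # assumed: one health probe per minute
    failure_threshold: int = 3   # assumed: consecutive failures before acting

    def worst_case_detection_s(self) -> int:
        # Rough upper bound on the time between an outage starting and
        # the threshold being reached.
        return self.check_interval_s * self.failure_threshold


def first_failover_index(results: Iterable[bool],
                         policy: FailoverPolicy) -> Optional[int]:
    """Return the index of the probe that would trigger failover, or None.

    `results` is a sequence of probe outcomes (True = healthy).
    A single healthy probe resets the failure counter, which avoids
    failing over on a one-off network blip.
    """
    consecutive_failures = 0
    for index, healthy in enumerate(results):
        consecutive_failures = 0 if healthy else consecutive_failures + 1
        if consecutive_failures >= policy.failure_threshold:
            return index
    return None


def fits_downtime_budget(policy: FailoverPolicy,
                         recovery_s: int,
                         budget_s: int) -> bool:
    """Check that detection time plus recovery time stays within the budget."""
    return policy.worst_case_detection_s() + recovery_s <= budget_s


if __name__ == "__main__":
    policy = FailoverPolicy()
    probes = [True, False, False, True, False, False, False]
    print("failover triggered at probe:", first_failover_index(probes, policy))
    # Assumed 10 minutes to bring up the secondary; budget is 30 minutes.
    print("fits 30-minute budget:", fits_downtime_budget(policy, 600, 1800))
```

The threshold is the main design choice: a lower value shortens detection time but makes a false failover more likely, so it is worth sizing it against the downtime target rather than picking it arbitrarily.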

0 Answers