2

I'm fairly new to mongodb, and am tackling some devops issues here.

We have a B2B SaaS product deployed on AWS where there is no network effect between customers, and some customers have much, much larger databases than others. They are currently all being run off one central MongoDB server and we're getting bad noisy-neighbour problems, so we need to isolate the clients who have ridiculously large collections of contacts.

My question is: what is the minimum reasonable mongodb server for a setup where we give individual larger customers their own isolated vpc in aws? Does this need to be a 3 server replica set as indicated in the mongo docs or can one reasonably use one ec2 instance as a production mongo server for a small database?

Iain Duncan
  • Partly answered by [this question](https://serverfault.com/questions/384686/can-you-help-me-with-my-capacity-planning) – Tim Oct 05 '17 at 18:35
  • You're going to have to do some benchmarking or load testing to work this out yourself. You may or may not need a separate VPC for each customer, that's your call - the network traffic won't matter, it's more about isolation. If you don't need isolation just set up the network how you need it. Do you need a replica? That depends, replica probably gives you availability and scalability, what's your RPO and RTO? – Tim Oct 05 '17 at 19:24

2 Answers

4

You can have a production environment running on a single MongoDB instance, but only under very special circumstances. Let us see what those circumstances are after we have cleared up some things about replica sets.

On replica sets

Contrary to popular belief, replica sets have only one main purpose: ensuring the availability of the databases. Let us assume you have only a single instance. Every maintenance task, every server crash, every mistake in administration will cause downtime. Downtime which affects not only your SLAs (bad enough), but may very well cause a DBA to get rung out of bed at 06:00 in the morning after a party night, trying to get his caffeine level high enough to restore the database(s) while still halfway drunk.

Furthermore: inevitably, you will lose all the data between your backup and the restoration of the service. First and obviously, you will lose all the data between the last backup and the point at which the server became unavailable. Then, you will lose all the data which would have been generated during the downtime.
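To make the arithmetic above concrete, here is a minimal sketch. The function name and the figures (a daily backup, four hours to restore) are made up for illustration, and the worst case assumes the crash happens just before the next backup would have run.

```python
from datetime import timedelta


def exposure_window(backup_interval: timedelta, downtime: timedelta) -> dict:
    """Worst-case exposure of a single instance (hypothetical helper).

    Assumes the server fails right before the next scheduled backup.
    """
    return {
        # Writes made since the last backup are gone after the restore.
        "lost_existing_writes": backup_interval,
        # Writes that could not be made while the service was down.
        "missed_new_writes": downtime,
        "total": backup_interval + downtime,
    }


# Assumed figures: one backup per day, four hours to restore the service.
window = exposure_window(timedelta(hours=24), timedelta(hours=4))
print(window["total"])
```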

Now let us assume you have a replica set with two data bearing nodes and an arbiter. Slightly better. Your primary fails, the other data bearing node gets elected and, thanks to automatic failover (which most drivers support), your service continues to run without downtime or data loss. But: you have lost redundancy. So in order to reduce the risk, a DBA gets rung out of bed again, who now has to promote the arbiter to a data bearing node and wait for the data to be synced while hoping that the sync is faster than the change rate of your data (to be more precise: you hope that your replication oplog window is bigger than the time needed for syncing the data). If not, the data sync will fail and you have to shut down the application in order to let the sync succeed. What you have won with this setup is that you can choose when to shut down the application to restore redundancy.

Side note: If your data changes so fast that the replication oplog window can no longer cover the time a resync takes, you should shard. Always see to it that your replication oplog window is big enough.
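A rough way to check whether a resync fits into the oplog window is sketched below. This is a back-of-the-envelope estimate, not a MongoDB API: the function names, the sync rate, and the safety factor are all assumptions, and the real sync speed has to be measured on your own hardware. The oplog window itself can be read from the output of `rs.printReplicationInfo()` in the mongo shell.

```python
def initial_sync_hours(data_size_gb: float, sync_rate_mb_per_s: float) -> float:
    """Estimate how long copying the data set to a fresh node takes."""
    if sync_rate_mb_per_s <= 0:
        raise ValueError("sync rate must be positive")
    seconds = data_size_gb * 1024 / sync_rate_mb_per_s
    return seconds / 3600


def resync_is_safe(
    data_size_gb: float,
    sync_rate_mb_per_s: float,
    oplog_window_hours: float,
    safety_factor: float = 2.0,  # assumed headroom, pick your own
) -> bool:
    """True if the estimated sync time, with headroom, fits the oplog window."""
    needed = initial_sync_hours(data_size_gb, sync_rate_mb_per_s) * safety_factor
    return needed <= oplog_window_hours


# Assumed figures: 500 GB of data, 50 MB/s sustained sync rate.
print(resync_is_safe(500, 50, oplog_window_hours=24))
print(resync_is_safe(500, 50, oplog_window_hours=4))
```

With these made-up numbers the sync takes a little under three hours, so a 24 hour window is comfortable while a 4 hour window does not leave the assumed headroom.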

Now let us assume you have three data bearing nodes, as suggested. Even when one server fails (or is being updated, etc.), you still have redundancy. One node fails during the night? Sleep tight!

So when can I have a production environment with a single server?

Taking the above into account, you can have a single instance production environment if, and only if,

  • You have no SLAs and/or your customers can live with extended downtimes
  • Your DBAs are willing to be on call to restore the service
  • You and/or your customers can live with losing data between the last backup and the restoration of the service.

In most serious business applications I know, the answer to one or more of those conditions is "No".

Can I have a production environment with two data bearing nodes and an arbiter?

Yes, under the assumption that you can live with the fact that losing one data bearing node will cost you redundancy, and that in this case you will most likely have to resync your data. This requires you to closely monitor your replication oplog window and to make sure the time needed to resync fits within it.

Given the price difference between an arbiter instance and a data bearing node, it is a question of risk management whether you choose a setup with two data bearing nodes and an arbiter or the suggested setup with three data bearing nodes.
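For illustration, the two layouts differ only in the member list of the replica set configuration document. The sketch below builds such a document as a plain Python dict; the helper name, the replica set name, and the host names are hypothetical. The fields `_id`, `members`, `host` and `arbiterOnly` are the ones MongoDB uses in its replica set configuration.

```python
def replica_set_config(name, data_hosts, arbiter_host=None):
    """Build a replica set configuration document (hypothetical helper)."""
    # One member entry per data bearing node.
    members = [{"_id": i, "host": host} for i, host in enumerate(data_hosts)]
    if arbiter_host is not None:
        # The arbiter votes in elections but holds no data.
        members.append(
            {"_id": len(members), "host": arbiter_host, "arbiterOnly": True}
        )
    return {"_id": name, "members": members}


# Two data bearing nodes plus an arbiter (host names are placeholders).
psa = replica_set_config(
    "rs0", ["db1.example:27017", "db2.example:27017"], "arb.example:27017"
)

# Three data bearing nodes, as suggested in the docs.
pss = replica_set_config(
    "rs0", ["db1.example:27017", "db2.example:27017", "db3.example:27017"]
)
```

A document like this is what you would hand to `rs.initiate()` in the mongo shell when setting up the replica set.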

Conclusion

Can one reasonably use one ec2 instance as a production mongo server for a small database?

To put it bluntly: Not imho. It would be bordering on negligence, it increases risks which can be mitigated for comparatively little money and, if you take everything into account, it is most likely way more expensive than having at least a replica set with two data bearing nodes and an arbiter.

Does this need to be a 3 server replica set as indicated in the mongo docs?

Unless you really, really, really know what you are doing: Yes.

-1

You could go with a single EC2 instance as a production mongoDB server if you're happy for all the customer's data to disappear without warning. Of course, having chosen MongoDB in the first place (presumably for its well-known webscale properties), you're presumably not that interested in durability anyway, but running it all on a single EC2 instance is just begging the database gremlins to give you a bad day.

womble
  • Durability can be acceptable with snapshots, which are stored in the very reliable AWS S3. However, if your RTO and RPO are measured in minutes, snapshots aren't a great solution. Replication would be a better solution in that case. – Tim Oct 06 '17 at 01:49
  • Thanks Tim. We do not promise better than a day RTO, so I will look into snapshots to S3. I'm dealing with an inherited situation that needs improving in many areas, so that sounds like a palatable compromise to get started with. – Iain Duncan Oct 06 '17 at 14:33
  • RTO of a day? Daaaaaamn I want your job. What's your RPO, though? And budget? Streaming snapshots to get RPO to something palatable is *expensive*. – womble Oct 06 '17 at 21:59