We maintain a search service that serves data from MongoDB. Our Mongo production instance is arranged in a 4 node replica set across four physical servers.
The database is comprised of several small collections and one large collection. The large collection has the following characteristics:
- number of documents: 35 million
- average document size: ~4.2 kB
- collection size: 151 GB
- storageSize: 157 GB
Over the next year we anticipate that the number of documents in this collection will double to ~70 million and a doubling in the size of the collection.
I am conscious that the "Sharding Existing Collection Data Size" section of the Mongo Reference Limits document, it's specified that "For existing collections that hold documents, MongoDB supports enabling sharding on any collections that contains less than 256 gigabytes of data. MongoDB may be able to shard collections with as many as 400 gigabytes depending on the distribution of document sizes". Consequently, we would like to shard well before we reach the 256 gigabytes of data.
We are have some constraints on resourcing and we are not (yet) in a position to virtualise. However, we are in a position where I can purchase two new servers, bringing the total to six production machines.
My question is, is it possible to split Mongo into two shards where each one is a 3-server replica set with only six physical servers? I am conscious that in addition to the replica sets we require three config
servers and a mongos
server?
Should we even be sharding? Our current RAM usage and the number of connections are currently well within acceptable levels. Is there other strategies we might adopt to enable our database to grow that doesn't involve sharding?