In our setup, we currently have a sharded cluster with 3 shards, each shard being a 3-member replica set. Our write volume is about to go up significantly to support a new feature, and we know the extra data will be necessary. Our writes are basically all upserts (which will most likely resolve to updates) and updates that increment a particular field by 1.
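To make the write pattern concrete, each operation looks roughly like the sketch below. The helper `build_increment` and the field name `hits` are made up for illustration and are not our real schema; the commented-out call assumes PyMongo's `update_one` with `upsert=True`.

```python
def build_increment(doc_id, field="hits"):
    """Return the (filter, update) pair for one increment-by-1 upsert.

    `field` defaults to a hypothetical counter name.
    """
    return {"_id": doc_id}, {"$inc": {field: 1}}


flt, upd = build_increment("page:42")
# With PyMongo this pair would be sent as:
#   collection.update_one(flt, upd, upsert=True)
```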
Every update is an increment by 1, and because of how our data is distributed, not all documents are treated equally: some get their fields incremented far more often than others. An alternative to adding another shard that I thought could be effective is some kind of middleman, like a few Redis instances (or some smaller mongods), that receives the updates first. After about 5 minutes (or via some queueing system), a set of workers would consume the buffered data and apply it to the live cluster. This would let update-heavy documents accumulate their increments and be written once per interval, which could save our main cluster a ton of writes (I will post exact numbers shortly in an edit).
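Here is a minimal sketch of the buffering idea, with an in-process `Counter` standing in for Redis. The class `IncrementBuffer` and the field name `hits` are hypothetical, and a real version would need a durable store, since anything held only in memory is lost if the process dies before a flush.

```python
from collections import Counter


class IncrementBuffer:
    """Accumulates +1 increments and emits one $inc per (document, field)."""

    def __init__(self):
        # Maps (doc_id, field) -> number of increments seen since last flush.
        self._pending = Counter()

    def increment(self, doc_id, field="hits"):
        self._pending[(doc_id, field)] += 1

    def flush(self):
        """Return the coalesced (filter, update) pairs and clear the buffer."""
        ops = [
            ({"_id": doc_id}, {"$inc": {field: delta}})
            for (doc_id, field), delta in self._pending.items()
        ]
        self._pending.clear()
        return ops


buf = IncrementBuffer()
for doc_id in ["a", "a", "a", "b"]:
    buf.increment(doc_id)

# Four incoming increments collapse into two writes for the main cluster.
ops = buf.flush()
```

A worker would call `flush()` on each interval and send the resulting pairs to the cluster as upserts. The savings depend entirely on how skewed the increments are: a document hit once per interval still costs one write.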
So bottom line, when is adding another shard not the right solution?