Let's say we're starting a project that is meant to be served all around the world.

How am I supposed to distribute the database and server load, and optimize the service for a large audience?

I know about reverse proxies, load balancing, and using DNS for subdomains to target multiple IPs which can serve the same content.

I'm currently interested in whether there is any easy way to distribute my whole database and service (which is a single VM for now) to spread the load.

Some ideas/questions I have:

  • If I need to start a VPS in each region and then merge the data in the background
  • If I can do this with a single server and only using CDNs to deliver static resources
  • If there is any platform where I can separate the M-V-C layers, so each will be running on a separate server, or can be optimized and distributed separately
  • If I can host my application on some cloud service, which will handle the increasing load and distribute the service by itself. Is an IaaS provider needed?

I don't think this is easy to answer, so the best way to answer will probably be to point me to some white papers related to this topic.

Marek Sebera

2 Answers


This is a very broad question, and there is no silver bullet for this problem. The biggest challenge in setting up multiple sites is the database, especially multi-master databases. MySQL and a number of NoSQL databases do support multi-master replication; you would need to evaluate them and figure out which one best fits your requirements.

Slightly off topic, but how much latency is acceptable in your setup? CDNs and reverse proxies can help speed up your site. The likes of Google/Yahoo/Facebook serve dynamic content across continents without too much latency.

  • I'm afraid I need to lower the latency as much as possible. Isn't this database replication problem solvable by using an IaaS/PaaS provider and having just a single application instance? Can you please elaborate on the possibility of a cloud service? Thanks – Marek Sebera Feb 06 '12 at 11:48
  • You're right, but I'm a bit concerned that you mention MySQL & NoSQL replication in the same flow of thoughts as geo-scaling. MySQL's replication is emphatically **not** robust when dealing with large write volumes and variant WAN links. –  Feb 06 '12 at 18:33

easy way to distribute my whole database

Think about locking for a second. When two clients want to write to the same row in a database, the database uses write locks to avoid race conditions and invalid data. In a 'distributed database' scenario, the acts of acquiring and releasing a lock themselves need to be distributed. How would you do that? How would you create a performant locking system when other nodes could be up to 0.300 seconds away?

There is no good answer to this; it's one of the hardest problems in computer science. For an introduction, you might read up on the CAP theorem.
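
To put a rough number on the locking cost described above, here is a minimal back-of-the-envelope sketch. It assumes that the 0.300 seconds is a network round trip, that a lock on a contended row is held for at least one round trip between the writer and the node that owns the lock, and that writes to the same row cannot overlap. The function name and the same-datacenter figure of 0.5 ms are illustrative assumptions, not measurements from any particular database.

```python
def max_contended_writes_per_second(round_trip_s: float,
                                    round_trips_per_write: int = 1) -> float:
    """Upper bound on back-to-back writes to ONE contended row.

    Assumption (illustrative only): the lock is held for at least
    `round_trips_per_write` network round trips between the writer and the
    node that owns the lock, and writes to the same row cannot overlap
    while the lock is held.
    """
    if round_trip_s <= 0 or round_trips_per_write < 1:
        raise ValueError("round_trip_s must be > 0 and round_trips_per_write >= 1")
    return 1.0 / (round_trip_s * round_trips_per_write)


if __name__ == "__main__":
    # Hypothetical round-trip times; 0.300 s is the figure used in the text.
    for label, rtt in [("same datacenter", 0.0005), ("other continent", 0.300)]:
        rate = max_contended_writes_per_second(rtt)
        print(f"{label}: at most {rate:,.1f} writes/s to a single hot row")
```

Under these assumptions a hot row tops out at roughly 3.3 writes per second across a 0.300 s link, versus about 2,000 per second at 0.5 ms. Real systems differ in how many round trips a lock costs, so treat this as an order-of-magnitude argument only.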

single server and only using CDN's to deliver static resources

Yes, that's the most common method. Keep all your dynamic data in a single datacenter (i.e. webapp servers and database servers colocated in the same facility), and then use a global CDN for the static bits. This setup is easy to reason about, and generally works well.
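
For a CDN to take the static bits off the origin, the origin has to tell it what is safe to cache. The sketch below shows one possible way to split responses into "static, cacheable at the edge" and "dynamic, always served from the single datacenter" using the standard `Cache-Control` header. The extension list and the function name are hypothetical, and the one-year `max-age` assumes static file names change whenever their content changes (for example, by including a version or hash in the name).

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical set of extensions treated as static; adjust to the real app.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".gif", ".svg", ".woff2"}


def cache_control_for(url: str) -> str:
    """Return a Cache-Control header value for the given request URL.

    Static assets are marked as publicly cacheable for a long time, so a CDN
    can serve them from an edge location close to the user. Everything else
    is treated as dynamic and must always go back to the origin.
    """
    path = urlsplit(url).path  # drop any query string or fragment
    extension = posixpath.splitext(path)[1].lower()
    if extension in STATIC_EXTENSIONS:
        # Assumes versioned file names, so stale copies are never requested.
        return "public, max-age=31536000, immutable"
    return "private, no-store"


if __name__ == "__main__":
    for url in ["/assets/app.js?v=3", "/img/logo.png", "/account/orders"]:
        print(url, "->", cache_control_for(url))
```

How this value gets attached to responses depends on the web framework or web server in use, so that part is left out here.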

is any platform where I can separate the M-V-C layers, so each will be running on a separate server, or can be optimized and distributed separately

In very close proximity to each other, where the network links are very fast and consistently low latency: No problem. In a geo-distributed fashion, where the network links are slow, it's not possible.

host my application on some cloud service, which will handle the increasing load and distributing the service

Google App Engine does this, to some extent. It's IMHO the main benefit App Engine has. In order to achieve this, you'd have to program against App Engine's very simplified data model (i.e. no SQL, only BigTable), which has significant negative tradeoffs.

Your question is all over the place, and the problem is complex -- so it's not easy to just point you at a single book to read.

  • I think a computer science course in distributed computing would be great.
  • I also remember Cal Henderson's book "Building Scalable Web Sites". It's completely different from the above: it's more a collection of somewhat dated strategies for scaling web apps. While it's getting old, I still think it serves as a good introduction to the common problems in scaling web apps, and to the mindset for analyzing and fixing the issues.