
I have been wondering for a long time if there are systems that have to "scale up" (onto a more powerful, more expensive server) rather than "scale out" by being split across many smaller servers.

Do such systems exist, and if so, is there anything in particular that tends to cause systems to need to be scaled up, rather than scaled out? (For example, maybe ACID-compliant database transactions or other strong data-integrity requirements create this need.)

Since scaling up seems to bring much higher hardware costs than scaling out, it is something you would want to avoid if possible, but I'm not sure whether it is always avoidable.

So, are there systems that can't be scaled out and have to be scaled up instead? What can cause this, and how would you identify such a system? (Do they generally share common characteristics that might make them easier to identify?)

Demi
  • One example: http://signalvnoise.com/posts/3090-basecamp-nexts-caching-hardware – ceejayoz Jun 26 '14 at 02:47
  • Scaling up is often far easier if your software hasn't been designed to scale out. Redesigning software is either expensive, or impossible if you don't own the source or have influence over the developers. – Zoredache Jun 26 '14 at 02:57
  • Writing such systems is a very hard problem, especially master/master designs where nodes can come and go and multiple masters write to the same records at the same time. Whose write wins? – hookenz Jul 21 '14 at 04:24
  • You might be interested in the [CAP theorem](http://en.m.wikipedia.org/wiki/CAP_theorem). Basically, a service which requires both consistency and availability, as defined in the theorem, will not be able to tolerate partitioning. Most real-world requirements can sacrifice some consistency for eventual consistency (manually or automatically handling inconsistency after the fact) or sacrifice availability by refusing to process a request when some of the participants are not available. Therefore, systems which require both absolute consistency and absolute availability are essentially forced to scale up. – Lie Ryan Jul 22 '14 at 09:13
  • @LieRyan I know about the CAP theorem, but it does not affect the case of multiple servers connected by a reliable LAN. – Demi Jul 22 '14 at 15:50
  • If by "reliable LAN" you mean "never has a failure", then you're not designing for the real world. – mfinni Jul 28 '14 at 21:01
  • @mfinni What I mean is “we can clear faults within the acceptable downtime of the system”. Basically, be redundant at the hardware level. – Demi Sep 11 '19 at 17:33

3 Answers


I primarily work with an application that has zero horizontal scaling potential. Even though it runs on Linux, the application, data structures and I/O requirements force me to "scale up" onto progressively larger systems in order to accommodate increased user workloads.

Many legacy line-of-business and transactional applications have these types of constraints. It's one reason I stress that the industry focus on cloud solutions and DevOps-driven web-scale architectures ignores a good percentage of the computing world.

Unfortunately, the scale-up systems I describe are really unsexy, so the industry tends to ignore their value or deemphasize the skills needed to address large, critical systems (e.g. cattle versus pets).

ewwhite
  • Could the application be scaled horizontally if it had been designed for that from the beginning, or do the requirements imposed on it make horizontal scaling impossible? – Demi Jun 26 '14 at 02:39
  • It would require an entire rewrite and a shift to a new storage architecture. It's a tremendous undertaking of unknown benefit. I've been able to do the R&D work to help tune the systems as much as possible, but many companies find themselves in the position of supporting something that is less-than-optimal. Right now, the easiest thing to do is throw hardware at the problem. – ewwhite Jun 26 '14 at 02:44
  • "the easiest thing to do is throw hardware at the problem." Please, Moore's Law, don't stop working! – cjc Jun 26 '14 at 02:45
  • @ewwhite are there any new applications with the same problems, or is this problem confined to legacy systems (i.e. those not designed to scale out)? – Demi Jun 26 '14 at 03:14
  • @Demetri - Microsoft SQL Server is the most "high profile" product I can think of that is a typical "scale up" rather than "scale out". Scaling it out is nigh on impossible unless you meet a very specific set of criteria for merge replication. – Mark Henderson Jun 26 '14 at 03:38
  • Or if you can decompose the solution into multiple problems. For example, don't run reporting against your transaction database; hit a replica that runs on other hardware. – mfinni Jul 09 '14 at 01:59
  • -1. I think you missed the heart of the issue. Your problem isn't forced to scale up if the system could be rewritten to scale out. This question is about systems whose problem domain is such that scaling out is not possible at all, even when designed from the ground up. – Lie Ryan Jul 22 '14 at 00:48
  • @LieRyan Comprehension. I'm stating that the application I support *cannot* be scaled out (it's a database-like system)... even if redesigned, due to architectural constraints. – ewwhite Jul 22 '14 at 01:05
  • Most applications that use a database can be described as database-like systems. There are many successful distributed database systems. – Lie Ryan Jul 22 '14 at 04:13
  • @LieRyan I'm describing a specific application and solution that *I* deal with and know intimately. Feel free to provide your own answer to this question if you have something to contribute. – ewwhite Jul 22 '14 at 04:15
  • @LieRyan This depends *heavily* on the application (I don't just mean how the software was implemented, but also what it's meant to do). There are indeed distributed database systems, however they tend to have to make compromises in terms of ACID compliance. They may perform better for certain types of queries but not as well for others. – Bruno Jul 22 '14 at 13:36
  • @ewwhite Would you be able to provide more details on how the architecture makes scaling out impossible? – Demi Jul 28 '14 at 21:13
  • @Demetri It's a flat-file database written in an [archaic language](http://www.sysmaker.com/infopro/x3j15/) that wasn't designed for scale-out, concurrency or modern hardware. File locking and data consistency become the limiting factors in scale-out. – ewwhite Jul 28 '14 at 21:17
  • @Bruno that may have been true in 2014 (not sure), but CockroachDB, TiDB, FoundationDB (all free software) and Google Spanner are all horizontally-scalable and strongly consistent. – Demi Sep 11 '19 at 17:36

From a developer perspective, I can say that nearly every traditional mainstream database engine out there can only scale up; scaling out is very much an afterthought.

In recent years, with the need for greater scalability and highly available systems, there have been efforts to make existing databases scale out. But because the designs are hindered by legacy code, it's very much just bolted on rather than fundamental to the design. You'll encounter this if you try to scale most of the well-known database engines. Adding slave servers can be quite difficult to set up, and you'll notice that it comes with significant limitations, some of which may require re-jigging your database tables.

For example, most of them are master/(multi-)slave rather than multi-master designs. In other words, you might have an entire server just sitting there, unable to process queries. Some do allow it, but with limitations, e.g. a read-only multi-slave design: one server takes writes and all the others provide read-only data. You'll notice when you set these systems up that it's not always a straightforward process and can be difficult to get working well. It feels very much like a bolt-on addition in many cases.
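To make the read-only-replica pattern concrete, here is a minimal sketch of the application-side read/write splitting it forces on you (the `primary`/`replicas` connection objects and their `run_query` method are hypothetical stand-ins, not any particular driver's API):

```python
import random

class ReadWriteRouter:
    """Route writes to the single master and spread reads across
    read-only slaves, as in the master/multi-slave designs above."""

    def __init__(self, primary, replicas):
        self.primary = primary    # the only node that accepts writes
        self.replicas = replicas  # read-only copies, possibly lagging behind

    def execute(self, sql, params=()):
        # Crude classification: anything that isn't a SELECT must go to
        # the primary, or the read-only replicas will reject it.
        if sql.lstrip().upper().startswith("SELECT"):
            node = random.choice(self.replicas) if self.replicas else self.primary
        else:
            node = self.primary
        return node.run_query(sql, params)
```

Note the caveat baked into the routing: a read sent to a replica may return slightly stale data, which is exactly the consistency compromise discussed below.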

On the other hand, there are some newer database engines designed with concurrency and multi-master operation from the beginning. NoSQL and NewSQL are the new design classes.

So it would seem the best way to get better performance from a traditional SQL server is to scale up, while with NoSQL and NewSQL systems you get both scale up and scale out.

The reason traditional RDBMS systems are tightly coupled is that they all need a consistent view of the same data. When you have multiple servers accepting updates to the same data from different clients, which one do you trust? Any method that attempts to ensure the data is consistent through some sort of locking mechanism requires cooperation from the other servers, which either hurts performance or affects data quality, in that any data read by a client might be out of date. And the servers need to decide among themselves which data is most recent when writing to the same record. As you can see, it's a complex problem, made more complex by the fact that the workload is spread across servers rather than just among processes or threads, where access to the data is still quite fast.
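To see why "whose write wins" is so thorny, here is a minimal sketch of timestamp-based last-write-wins conflict resolution, one common answer in multi-master systems (an illustrative example, not any specific engine's implementation):

```python
import time
from dataclasses import dataclass

@dataclass
class VersionedValue:
    value: str
    timestamp: float  # wall-clock time of the write; clock skew between
                      # servers makes this ordering unreliable

def merge(local: VersionedValue, remote: VersionedValue) -> VersionedValue:
    # Last-write-wins: keep whichever write carries the later timestamp.
    # The losing write is silently discarded -- the "data quality" cost
    # mentioned above.
    return local if local.timestamp >= remote.timestamp else remote

# Two masters accept conflicting writes to the same record at nearly
# the same moment:
a = VersionedValue("alice@example.com", time.time())
b = VersionedValue("alice@newdomain.net", time.time())
print(merge(a, b).value)  # one user's update simply vanishes
```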

hookenz
  • Hadn't Oracle RAC been providing scale-out since 10g? – Dani_l Jul 21 '14 at 06:39
  • It has. But then having a RAC and having a flawlessly working RAC are two different things - this really requires special care to keep running. It is a nice design though. If you need it you likely are willing to pay the price. – TomTom Jul 21 '14 at 10:43
  • And note the shared storage system needed for Oracle RAC. That could present a scaling problem depending on how it's implemented. – hookenz Jul 22 '14 at 04:00

In my opinion, the scale up/out demarcation is determined by how parallel a workflow can be, and how tightly the parallel threads need to coordinate with each other.

Single Threaded
For whatever reason, this workflow can only work in a single thread.

One thread means one system means scale up to make it go faster.

Tightly coupled parallelism
This is a multi-threaded system where the threads need to be tightly coupled with each other. Perhaps inter-process-communication needs to be very fast, or it all needs to be managed through a single memory manager. Most RDBMS systems are this kind of parallel computing.

For the most part, these systems are ones that scale up better than out, though there are exceptions. For instance, workflows that would work on a Single System Image (SSI) style cluster (a single memory space, but with high latency between threads) may make scaling out easier. But such SSI systems are very tricky to work with, so most engineers just make a bigger box.
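As a toy illustration of tight coupling, consider threads that must all update shared state through one lock (standing in for an RDBMS's lock manager or a single memory manager); adding threads, or servers, adds coordination rather than throughput. A minimal sketch:

```python
import threading

counter = 0
lock = threading.Lock()  # stands in for the shared lock manager

def worker(iterations):
    global counter
    for _ in range(iterations):
        with lock:        # every update serializes here, no matter how
            counter += 1  # many workers you add

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # correct (800000), but the lock made the work effectively serial
```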

Loosely coupled parallelism
This is a multi-threaded/multi-process system where the threads can tolerate high latencies between each other, or don't need to talk to each other at all. Scaled-out web serving and render farms are classic examples of this kind of system. Systems like these are a lot easier to make bigger than tightly coupled parallel ones, which is why there is a lot of excitement about this style of system.

This is the style where scale out is a lot easier.
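A matching sketch for the loosely coupled case: each task is fully independent, so workers never coordinate and you can keep adding them (or spread them across machines behind a job queue). `render_frame` here is a hypothetical stand-in for the render-farm or web-serving workload:

```python
from multiprocessing import Pool

def render_frame(frame_number):
    # Hypothetical stand-in for an expensive, fully independent task
    # (render one frame, serve one request, ...).
    return f"frame-{frame_number:04d}.png"

if __name__ == "__main__":
    # No shared state, no locks: doubling the pool size (or the number
    # of machines) roughly doubles throughput.
    with Pool(processes=8) as pool:
        results = pool.map(render_frame, range(1000))
    print(len(results), "frames rendered")
```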

sysadmin1138
  • The reason RDBMSes are tightly coupled is that they are tightly coupled to the data, i.e. multiple servers accessing the same resource. – hookenz Jul 28 '14 at 20:21