11

This Computer World article specifies that PostgreSQL can scale up to a core limit of 64. Does that mean one multi-core processor with 64 cores, or multiple processors with fewer cores each?

The reason I ask is that I am trying to find out how many processors PostgreSQL can scale up to, though of course that may be limited by the type of processor. However, the statistics I've found for other databases (e.g. Microsoft SQL Server here, which states it can scale up to 320 logical processors) don't specify the number of cores. Is this a very vague statistic?

Any thoughts would be much appreciated. Thanks!

O_O
  • 635
  • 3
  • 15
  • 25
  • 1
    PostgreSQL doesn't care if it's 8 8-core CPUs, 32 2-core CPUs, or whatever. It only cares about logical processors. Also, 64 cores is an approximate figure and depends on the rest of your hardware; 64 cores won't do you any good if you have only 4GB of RAM for a 1TB database on a 7200rpm SATA hard drive. There's no hard technical limit on the number of cores; it's just that PostgreSQL has recently been tested and proven to scale well up to 64. – Craig Ringer Dec 19 '12 at 01:56

4 Answers

13

Postgres can scale up to as many processors as you want to install and your OS can manage effectively. You can install Postgres on a 128-core machine (or even a machine with 128 physical processors) and it will work fine. It may even work better than on a 64-core machine, provided the OS scheduler can handle that many cores.

Postgres has been shown to scale linearly up to 64 cores (with caveats: we're talking about read performance, in a specific configuration of disk, RAM, OS, etc.). Robert Haas has a blog article with a nice graph, which I've reproduced below:

[Graph from Robert Haas's blog: read-only transaction throughput versus number of client connections]

What's important about this graph?

The relationship is linear (or nearly so) as long as the number of clients is less than or equal to the number of cores. Once you have more client connections than cores to run Postgres backends on, performance begins what looks like a roughly log-linear decline, because the backends start fighting over the CPU (load average climbs above 1.0, and so on).

While this has only been demonstrated up to 64 cores, you can generalize: keep adding cores (and clients) and performance keeps improving, up to the limit of some other subsystem (disk, memory, network) at which point processes are no longer contending for CPU but are instead waiting on something else.

(Haas also has another article, where they demonstrated linear scalability to 32 cores, with some great reference material on scalability in general -- highly recommended background reading!)
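
If you want to get a feel for this curve on your own hardware, the usual tool is pgbench in read-only mode (`pgbench -S`) with varying client counts. Below is a rough, hand-rolled sketch of the same idea in Python: many concurrent connections issuing small read-only queries for a fixed duration. The DSN is a placeholder, it assumes the psycopg2 driver is installed, and the Python client itself can become the bottleneck long before the server does, so treat it as an illustration rather than a benchmark.

```python
# Open N client connections in parallel threads, hammer the server with a tiny
# read-only query for a fixed duration, and report aggregate queries per second.
# Sweep the client count past your core count to see the knee in the curve.
import time
import threading
import psycopg2

DSN = "dbname=test user=postgres host=localhost"   # placeholder connection string
DURATION = 10                                      # seconds per run
CLIENT_COUNTS = [1, 2, 4, 8, 16, 32, 64]           # go past your core count

def client_worker(stop_at, counter, lock):
    """One client connection issuing read-only queries until the deadline."""
    conn = psycopg2.connect(DSN)
    conn.autocommit = True
    cur = conn.cursor()
    done = 0
    while time.time() < stop_at:
        cur.execute("SELECT 1")        # stand-in for a small read-only query
        cur.fetchone()
        done += 1
    with lock:
        counter[0] += done
    conn.close()

for clients in CLIENT_COUNTS:
    counter, lock = [0], threading.Lock()
    stop_at = time.time() + DURATION
    threads = [threading.Thread(target=client_worker, args=(stop_at, counter, lock))
               for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"{clients:3d} clients: {counter[0] / DURATION:10.0f} queries/sec")
```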

voretaq7
  • 79,345
  • 17
  • 128
  • 213
  • 2
    Incidentally, the reason for this linear scalability was mentioned in [Oli's answer](http://serverfault.com/a/459116/32986): Postgres uses a separate backend process for each client connection. As a result if you're only using one connection you will not see much (if any) benefit for multiple cores -- you need parallel requests in order to exploit multiple cores. – voretaq7 Dec 18 '12 at 19:48
8

No, it's a very precise statistic. A "logical processor" is a core, and a core is just that; it doesn't matter how they're spread over physical processors.

And if you're dealing with a machine with more cores than the supported number, this shouldn't be an issue with PostgreSQL. Each connection is inherently single-threaded*, so the number of cores you have is what limits the efficiency and efficacy of concurrent connections.

Needless to say, this also means you should put your money into faster cores rather than more cores, unless you want to cluster things in a more complicated way.

* 2017 Update: As of PostgreSQL 9.6, some queries (or subqueries) may be executed in parallel.
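
To see that footnote in action, here is a minimal sketch (assuming PostgreSQL 9.6 or later, the psycopg2 driver, and a hypothetical large table named `big_table`) that asks the planner whether it will parallelize a query; a Gather node in the EXPLAIN output means worker processes will be used.

```python
# Check whether the planner chooses a parallel plan for a scan-heavy query.
import psycopg2

DSN = "dbname=test user=postgres host=localhost"    # placeholder

conn = psycopg2.connect(DSN)
conn.autocommit = True
cur = conn.cursor()

# Allow up to 4 parallel workers per Gather node for this session.
cur.execute("SET max_parallel_workers_per_gather = 4")

# Ask the planner how it would run an aggregate over a large (hypothetical) table.
cur.execute("EXPLAIN SELECT count(*) FROM big_table")
plan = [row[0] for row in cur.fetchall()]
print("\n".join(plan))

if any("Gather" in line for line in plan):
    print("-> parallel plan")
else:
    print("-> serial plan (table too small, or parallelism disabled)")

conn.close()
```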

Oli
  • 1,791
  • 17
  • 27
  • 2
    `Needless to say, this also means you should put your money into faster cores rather than more cores, unless you want to cluster things in a more complicated way.` <- I only agree with this statement if the number of cores is greater than the number of concurrent clients, and the number of concurrent clients is unlikely to increase. It's pretty important for performance to have a core available for each Postgres backend... – voretaq7 Dec 18 '12 at 20:16
  • 1
    @voretaq7 I mostly agree, but a CPU that delivers higher TPS can (obviously) handle more transactions in a given time, and therefore more clients. There's going to be a sweet spot that depends on your load type and budget. – Oli Dec 18 '12 at 20:30
  • 1
    A logical processor is the smallest logical execution unit; with current technologies, it's not a core, it's a hardware thread. – dyasny Dec 18 '12 at 20:41
  • @dyasny My working knowledge of SMT is a bit dated, but I thought the serious server industry didn't use it, in favour of pure SMP, for power efficiency and performance reasons. Again, it's a workload thing (or at least it was), but *you are right*. – Oli Dec 18 '12 at 22:02
  • 3
    @voretaq7: It's not uncommon to connect to PostgreSQL through some connection pooling mechanism. Among other reasons, this is done because connecting to PostgreSQL is relatively expensive. Pooling can dramatically reduce the number of concurrent connections to the database. So I tend to prefer fast CPUs over a large number of cores. But as always: it depends on many factors ... – m.sr Dec 18 '12 at 22:13
  • 2
    @m.sr Agreed - connection pooling mechanisms are very common. The "smartest" of these will spin up several connections to Postgres and balance among them (one of our in-house apps does it by giving each Apache process its own connection to Postgres - a pretty convenient mapping for our use case with a reasonable backend-to-users ratio). IMHO if your connection pooling is making queries queue up rather than spawning backends it's not doing you any favors but the pros and cons of that would be more interesting to delve into on [dba.SE]. [So I asked!](http://dba.stackexchange.com/questions/30711) – voretaq7 Dec 19 '12 at 03:21
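
A minimal sketch of the pooling pattern discussed in these comments, assuming the psycopg2 driver and a placeholder DSN (in practice you would more likely reach for PgBouncer or your framework's built-in pool): the point is simply that the pool caps the number of live backends no matter how many application threads are running.

```python
# Keep a fixed set of already-open backends and hand them out per request,
# instead of paying the connection-setup cost for every query.
from psycopg2 import pool

DSN = "dbname=test user=postgres host=localhost"    # placeholder

# At most 8 backends, no matter how many application threads ask for one --
# roughly "one backend per core" rather than "one backend per user".
db_pool = pool.ThreadedConnectionPool(minconn=2, maxconn=8, dsn=DSN)

def run_query(sql, params=None):
    conn = db_pool.getconn()          # borrow an existing backend
    try:
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()
    finally:
        db_pool.putconn(conn)         # return it instead of closing it

print(run_query("SELECT version()"))
db_pool.closeall()
```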
2

Others have clarified that a logical processor generally refers to a CPU core, but I do want to comment on the statement that it doesn't matter how cores are spread over CPUs.

You can have caches on the CPU die that are shared among cores or that are dedicated to single cores or subgroups of cores. For example, one common configuration is dedicated L1 cache and shared L2 cache. In that case, the scalability of a single dual-core CPU can differ from that of two single-core CPUs.

These scalability effects continue into main memory, with NUMA machines exhibiting different behavior than non-NUMA ones.

I point these out only because the OP is discussing questions of scalability, whose answers are generally more nuanced than "program X can use Y CPU cores".
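
For the curious, on Linux you can inspect exactly this layout (which CPUs share which cache levels, and how they group into NUMA nodes) with `lscpu` or `numactl --hardware`; the stdlib-only sketch below reads the same information straight from sysfs, so it is Linux-specific.

```python
# Print cache sharing as seen from CPU 0, then the CPU list of each NUMA node.
from pathlib import Path

cpu0_cache = Path("/sys/devices/system/cpu/cpu0/cache")
print("Cache levels as seen from CPU 0:")
for idx in sorted(cpu0_cache.glob("index*")):
    level = (idx / "level").read_text().strip()
    ctype = (idx / "type").read_text().strip()
    size = (idx / "size").read_text().strip()
    shared = (idx / "shared_cpu_list").read_text().strip()
    print(f"  L{level} {ctype:<12} {size:>8}  shared by CPUs {shared}")

print("NUMA nodes:")
for node in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
    cpus = (node / "cpulist").read_text().strip()
    print(f"  {node.name}: CPUs {cpus}")
```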

Tim B
  • 186
  • 6
1

In this case, they mean multiple processors with fewer cores... Some of the talk is future-proofing. Some is marketing-speak.

ewwhite
  • 194,921
  • 91
  • 434
  • 799