
Simple question: what is the best way to improve a database's performance using hardware only?

In this situation, a few very very large tables are being queried very regularly by 1-4 processes at a time.

We are executing thousands of queries, many of which take 10+ seconds to return, and all of which return only a small quantity of data. This suggests to me that HDD seek time is the bottleneck.
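
One quick way to test that hypothesis (a minimal sketch; the table and filter are hypothetical stand-ins for one of our real slow queries): run the same query twice with EXPLAIN ANALYZE. If the cold run is slow and the immediate repeat, served from cache, is fast, the workload is seek-bound rather than CPU-bound.

    -- Hypothetical example; substitute a representative slow query.
    EXPLAIN ANALYZE
    SELECT * FROM raw_data WHERE sensor_id = 42;
    -- Run it again immediately: a large drop in "actual time" on the
    -- second run means the first run was dominated by disk reads.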

As part of this process, we also need to create summary tables from our raw data tables. One of these queries may take hours to run.

Please assume that all software/database optimisation has been done already.

Assume this because we have spent some time on code/DB optimisation and are ready to spend some of the budget on hardware. I understand that more software/DB optimisation is possible, but that is not the current focus.

We do not run out of RAM at the moment, but could possibly allocate more to the DB.

The current platform is Windows; this may change depending on the hardware solution.

The database is Postgres 8.4.

Thanks.

pstanton
  • What database software? What platform? How large is "very very large"? What optimizations have been done? (I'm sure you haven't thought of all of them.) – Joe Dec 12 '09 at 05:03
  • "please assume that all software/database optimisation has been done already." What a silly assumption to make. – Dec 12 '09 at 05:11
  • That's an odd question. When you did "all software/database optimisation", what did you optimize for? A database server is a system, so pick hardware to work well with the "software/database optimisation" you did. – Dec 12 '09 at 05:22
  • If the database itself is assumed to be optimal, then the answer is always "more spindles" or "more RAM". – Matt Simmons Dec 12 '09 at 13:33
  • Give us something to work with. What hardware are you currently using? Without that information, answers can only be generalisations at best. – John Gardeniers Dec 13 '09 at 23:24
  • "we are executing thousands of queries, many of which take 10+ seconds to return, and all of which return only a small quantity of data" - to me that points to poor indexing, not hardware. – Mark Henderson Dec 14 '09 at 23:25

5 Answers

8

What's the performance bottleneck? The usual culprits are:

  • If the system is I/O bound, adding more CPUs won't help. Adding more memory might increase the portion of the database that can be cached in memory, but the best way to improve performance will be to increase the I/O bandwidth.
  • If the system is CPU bound, then adding more cores will help if the DBMS software can make good use of them. Otherwise, increasing the speed of the CPUs will help.
  • If the system is memory bound - the bottleneck is getting the data from RAM to the CPU - then improving the memory bandwidth will improve the performance.

Note that when you manage to relieve one performance bottleneck, one of the others becomes the new performance bottleneck.
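
For Postgres specifically, a quick sketch of how to check whether reads are being served from cache or from disk (the ordering and LIMIT are arbitrary); consistently low hit ratios on hot tables point at an I/O-bound system:

    -- Per-table buffer cache statistics (available in Postgres 8.4).
    SELECT relname,
           heap_blks_read,   -- blocks that had to come from disk
           heap_blks_hit,    -- blocks found in shared_buffers
           round(heap_blks_hit::numeric
                 / nullif(heap_blks_hit + heap_blks_read, 0), 3) AS hit_ratio
    FROM pg_statio_user_tables
    ORDER BY heap_blks_read DESC
    LIMIT 10;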

On most systems running databases with seriously large data volumes being scanned, the system is I/O bound. Adding more disk controllers and spreading the data across them increases the parallelism available for disk I/O, usually improving performance.

Don't forget that the most dramatic improvements may be algorithmic - if you can answer the question two ways, one scanning 1 TB of data and one scanning just 1 KB, the second is going to outperform the first, regardless of what you do with the hardware.


Elaborating on disk controllers - as requested.

A given disk controller can only transfer so much data from disk to memory in a second, typically using DMA. To increase the bandwidth, you can spread your data over multiple disks controlled by separate controllers. Then, if the system architecture allows it, you can get more data per second from disks into the main memory - thus speeding up operations. Note that if all the data in the database is on a single disk, then extra controllers do not help. If the DBMS cannot trigger parallel reading from the separate disks, then extra controllers do not help. So, whether the extra controllers will help depends on your hardware and your software and the organization of the data in the database.
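
In Postgres terms, spreading data over separate disks is done with tablespaces. A minimal sketch, assuming a second disk is already mounted (the paths, table, and index names are hypothetical):

    -- Create a tablespace on a second disk/controller (path is hypothetical;
    -- on Windows it might look like 'E:\\pgdata' instead).
    CREATE TABLESPACE disk2 LOCATION '/mnt/disk2/pgdata';

    -- Keep the table on the default disk but move its indexes to the new one,
    -- so heap scans and index reads can hit different spindles.
    ALTER INDEX raw_data_pkey SET TABLESPACE disk2;

    -- Temporary files for big sorts can also be redirected (supported in 8.4):
    SET temp_tablespaces = 'disk2';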

Jonathan Leffler
  • Your comment re disk controllers sounds about right... care to elaborate on that? I've edited the question to add more info. Thanks. – pstanton Dec 13 '09 at 21:19
  • Also keep in mind that as you improve one area of performance in isolation, it can put pressure on another area that wasn't previously a bottleneck. For example, if your CPUs were a constraint and you added more cores, then your I/O system might become a bottleneck as you are able to drive it harder. – Sim Dec 14 '09 at 22:58
3
  1. RAM
  2. Faster CPU
  3. Store the database on a RAID0 (or RAID10 if redundancy is needed) with as many drives as you can.
  4. Multicore processors.
  5. Increased sizes of the L1 and L2 caches.
wallyk
  • please elaborate on 5 – pstanton Dec 13 '09 at 21:22
  • RAID0 should *never* be used - if *any* drive fails, your data is toast. – warren Sep 13 '11 at 17:44
  • @warren: Never say *never*. For some applications, the tradeoffs may be acceptable. – wallyk Sep 13 '11 at 19:41
  • @wallyk - I suppose there's some flaky, footnote-to-an-appendix situation where it could be "ok"... but since it's only striping with no data integrity, *ANY* disk issue will be catastrophic. Use RAID5 or 6 (or 15 or 16). – warren Sep 13 '11 at 22:10
  • It is quite common. For example, when you run a shard of 5+ replicated servers, RAID 0 may be good enough. A server fails? Good, rebuild it. Naturally, keep the master server off RAID 0. – TomTom Dec 02 '11 at 19:30
2

I would recommend collecting some basic performance metrics and finding the bottlenecks in your current system as a first step. That may give you a good idea of where your money is best spent: I/O, memory, or processing power.
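
On the Postgres side, a minimal sketch of one starting point, spotting the longest-running statements (the column names are the 8.4 ones; later releases renamed them to pid/query):

    -- Show currently executing queries, longest-running first.
    SELECT procpid,
           now() - query_start AS runtime,
           current_query
    FROM pg_stat_activity
    WHERE current_query <> '<IDLE>'
    ORDER BY runtime DESC;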

That said, from my experience the most common bottleneck for really large tables is disk I/O. If "all software/database optimisation has been done already" did not include a proper partitioning strategy, do some research on that topic (see the sketch below). If your disk subsystem is old, consider upgrading it.
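
A minimal sketch of the inheritance-based partitioning available in 8.4 (declarative partitioning arrived much later); the table and column names are hypothetical:

    -- Parent table holds no rows itself; children carry date ranges.
    -- (Inserts are routed with a trigger, or written to the child directly.)
    CREATE TABLE raw_data_part (
        id        bigint      NOT NULL,
        logged_at timestamptz NOT NULL,
        value     numeric
    );

    CREATE TABLE raw_data_2009_q4 (
        CHECK (logged_at >= '2009-10-01' AND logged_at < '2010-01-01')
    ) INHERITS (raw_data_part);

    CREATE INDEX raw_data_2009_q4_logged_at ON raw_data_2009_q4 (logged_at);

    -- With constraint_exclusion enabled, the planner skips partitions whose
    -- CHECK constraints rule them out, so this summary query touches one
    -- partition instead of the whole table.
    SET constraint_exclusion = on;
    SELECT count(*) FROM raw_data_part
    WHERE logged_at >= '2009-11-01' AND logged_at < '2009-12-01';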

Phil Sandler
  • Yes, I believe disk IO is the bottleneck. I've added more info to the question. The disk is not old, but it is a stock-standard SATA drive in a desktop PC. – pstanton Dec 13 '09 at 21:20
  • The disc is the problem. Database servers run on many discs: 10, 20+. I have seen 200-disc systems. – TomTom Dec 02 '11 at 19:31
1

As jadew suggested, the more information that is stored in RAM, the faster it is. Also, you may want to take a look at using SSDs; that would speed things up, but you do need to study their life cycle as a DB storage medium, since they go dead after a limited number of writes.

monksy
0

More RAM should do it; of course, you'll have to tell the DB engine to use that additional RAM :)
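
For Postgres 8.4 that means the memory settings in postgresql.conf; a minimal sketch with illustrative (not recommended) values:

    -- The main knobs live in postgresql.conf (the first needs a restart):
    --   shared_buffers = 2GB          # Postgres's own buffer cache
    --   effective_cache_size = 6GB    # planner hint: OS cache + shared_buffers
    -- work_mem can also be raised per session for big sorts and aggregations:
    SET work_mem = '256MB';
    SHOW shared_buffers;  -- verify what the server is actually using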

  • Maybe. Maybe not. Depends on the OS and memory management limitations of said OS, and similarly on those capabilities of the SQL engine. – Joe Dec 12 '09 at 05:05