
Possible Duplicate:
Can you help me with my capacity planning?

I have written an application that uses MySQL & Sphinx for searching and storing information.

The application's database weighs in at 300GB (mostly TEXT, MEDIUMTEXT and VARCHAR columns). My Sphinx index weighs 30GB.

My problem started because I had planned for a ~100GB DB, and I now have a 300GB one, which can't be loaded into memory(?).

I'm looking for the best architecture that will give me the highest performance.

My application first receives a query, runs it against the Sphinx index, gets a batch of IDs back from Sphinx, queries the DB for several text fields for those IDs, and sends the results to a .NET ranker (a custom algorithm).
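Roughly, the read path looks like the sketch below (Python is used only for illustration, the real application is .NET; table, column and host names are placeholders, and it assumes Sphinx's SphinxQL listener on port 9306):

```python
import mysql.connector  # also used to talk SphinxQL, since searchd speaks the MySQL protocol


def search(query_text):
    # 1) Full-text search in Sphinx, returning only document IDs.
    sphinx = mysql.connector.connect(host="127.0.0.1", port=9306)
    scur = sphinx.cursor()
    scur.execute("SELECT id FROM my_index WHERE MATCH(%s) LIMIT 1000", (query_text,))
    ids = [row[0] for row in scur.fetchall()]
    sphinx.close()
    if not ids:
        return []

    # 2) Fetch the text fields for those IDs from MySQL, by primary key only.
    db = mysql.connector.connect(host="db-host", user="app", password="secret",
                                 database="app_db")
    cur = db.cursor()
    placeholders = ",".join(["%s"] * len(ids))
    cur.execute("SELECT id, title, body FROM documents WHERE id IN (%s)" % placeholders,
                ids)
    rows = cur.fetchall()
    db.close()

    # 3) The rows are then handed to the .NET ranker (not shown here).
    return rows
```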

To get the best performance I used a Xeon W3520 server with 2 SSDs and 24GB of RAM for MySQL. It isn't cheap, I'm out of space, and I can't add more disks, so I need another server.

Do you think there will be a major performance impact if I use a server with regular disks for MySQL? I know I need fast disks for the Sphinx searching (that is my full-text search engine). Since I only select rows from the DB by an indexed field (the ID), without any complicated queries, I may be able to host the DB on a cheaper server with enough storage.

Can you spot my bottleneck? The .NET application mostly needs RAM and CPU, the DB needs a lot of space and/or RAM, and Sphinx needs fast disks and RAM.

  1. Is it possible to load the MySQL index into RAM?
  2. What can I do to mitigate the fact that the DB's size is 300GB?
  3. Will a server with slower disks impact performance by much? (I can't benchmark it without spending hundreds of dollars on hardware.)
  4. Does it matter if I use a Linux or Windows machine for MySQL?
YSY
  • I think we'd really need to know more about the actual nature of the processing and the bottlenecks. The overall amount of data doesn't seem so large, but it seems that you are performing quite complex processing with a lot of moving parts and a lot of data being pushed around to different components. e.g. if you need sub 100ms response time and you have 5 distinct non-trivial operations, you need to squeeze each one to 20ms on average, which might be too difficult for either your code or for the third-party components. – Cade Roux Jul 15 '12 at 15:24

2 Answers


I'm looking for the best architecture that will give me the highest performance.

What is the question here? Do not get me wrong, but there is exactly ONE way to do that: put the database on the fastest SSDs you can get.

Does it make sense? No idea, but you are not asking for a "fast enough" scenario, you are asking for the HIGHEST performance.

and I now have a 300GB one, which can't be loaded into memory(?).

You know, this is not true. A server with 512GB of RAM is not particularly large by today's standards.
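If you want to know how much RAM it would actually take to cache everything, just ask MySQL how big the data and indexes are. A rough sketch (connection details are placeholders):

```python
import mysql.connector

# Sum data and index sizes per schema, so you know how large the
# InnoDB buffer pool (or the box's RAM) would have to be.
conn = mysql.connector.connect(host="db-host", user="app", password="secret")
cur = conn.cursor()
cur.execute("""
    SELECT table_schema,
           ROUND(SUM(data_length)  / 1024 / 1024 / 1024, 1) AS data_gb,
           ROUND(SUM(index_length) / 1024 / 1024 / 1024, 1) AS index_gb
    FROM information_schema.tables
    GROUP BY table_schema
""")
for schema, data_gb, index_gb in cur.fetchall():
    print("%s: %s GB data, %s GB indexes" % (schema, data_gb, index_gb))
conn.close()
```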

will a server with slower disks impact the performance by much?

Sure it will. Normally IOPS (IO performance) is THE limiting factor for a database that is not fully loaded into RAM.

Do you think there will be a major performance impact if I use a server with regular disks for MySQL?

Define "regular disc". An array of 15k RPM SAS disc is "regular" for a high performance database, every disc about 3-4 times as many IOPS than your regular disc, but then.... with a SMALL database like that put it into a 512gb SSD and get 50.000 IOPS instead of 500 or so a 15k SAS drive gives you.

the DB needs a lot of space and or RAM and the sphinx needs fast disks and RAM.

ANY database that cannot serve everything from RAM is normally limited by its disks, mostly because disks are terrifically SLOW SLOW SLOW. Getting the fastest disk subsystem possible is standard in larger database installations, but you have a small database, so a simple SSD is good enough. It is STILL a lot slower than RAM, sadly.

What can I do to mitigate the fact that the DB's size is 300GB?

Realize this is tiny? I have an 850GB database at home, and my last project was a 21,000GB database. Depending on what I do with some data I have access to now, I am about to get 44,000GB of highly compressed data that I would need to process. 300GB is "just get more memory" small.

does it matter if I use a linux or windows machine for the MySQL?

Maybe it does, but it is totally irrelevant for THIS question. Windows may have - depending on configuration - a larger overhead, but it won't make a difference. When you deal with systems like this then, brutally speaking, an additional 128MB of overhead makes zero difference.

TomTom
    +1 for brutally honest truth-telling. – womble Jul 15 '12 at 07:20
  • Obviously for BEST performance SSD is the answer. But from a reliability or disaster/data recovery standpoint I'd be scared. I understand this doesn't mean anything if you have good backups as well as redundancy. I'm still not sure I've jumped on the bandwagon of using solid-state drives in servers with important data stored. Although, there's plenty out there to claim they are more reliable than ever and being used more often in servers. http://www.intel.com/content/www/us/en/it-management/intel-it/intel-it-validating-reliability-of-intel-solid-state-drives-brief.html – Tyler Jul 15 '12 at 09:32
  • Why scared? Use multiple SSDs and replicate the content. Any decent RAID controller can do that. Plus - funny enough - SSDs are more reliable and rarely fail with data loss; they mostly turn read-only, while a hard disk fails completely. High-end databases often use SSDs - you can buy an ExaData in an SSD-only configuration, and even the hard-disk-based one uses a TON of flash to buffer writes. SSDs have made major inroads in the last 2-3 years. Backups? Make backups every 15 minutes, plus live replication to a second server. – TomTom Jul 15 '12 at 12:19

Do you actually know what your server is doing? Do you know where the performance bottlenecks are? If you don't have any sort of monitoring and trend tracking, you're blind.
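As a starting point, even something as simple as the InnoDB buffer pool hit ratio tells you whether you are disk-bound. A rough sketch (assumes InnoDB tables; connection details are placeholders):

```python
import mysql.connector

conn = mysql.connector.connect(host="db-host", user="app", password="secret")
cur = conn.cursor()
cur.execute("SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'")
status = {name: int(value) for name, value in cur.fetchall()}
conn.close()

requests = status["Innodb_buffer_pool_read_requests"]  # logical page reads
misses = status["Innodb_buffer_pool_reads"]            # reads that had to hit disk
print("buffer pool hit ratio: %.2f%%" % (100.0 * (1 - float(misses) / requests)))
```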

Maybe just adding proper indexing for the data can help you?

Probably IO is indeed your limit, but maybe you have a lot of spare CPU cycles, and just compressing the blobs kept in the database could increase the memory-cache hit ratio, at the expense of [de]compressing on the fly, which will probably be unnoticeable.
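A rough sketch of that idea, compressing the big text column with zlib on the application side (table/column names are placeholders; MySQL's COMPRESS()/UNCOMPRESS() functions or InnoDB compressed row format are alternatives):

```python
import zlib
import mysql.connector


def store_document(cur, doc_id, title, body):
    # Store the large text column zlib-compressed (in a BLOB column)
    # so more rows fit in the cache.
    cur.execute(
        "INSERT INTO documents (id, title, body_zlib) VALUES (%s, %s, %s)",
        (doc_id, title, zlib.compress(body.encode("utf-8"), 6)),
    )


def load_documents(cur, ids):
    # Decompress on the way out; the CPU cost is usually negligible next to IO.
    placeholders = ",".join(["%s"] * len(ids))
    cur.execute(
        "SELECT id, title, body_zlib FROM documents WHERE id IN (%s)" % placeholders,
        ids,
    )
    return [(i, t, zlib.decompress(bytes(b)).decode("utf-8"))
            for i, t, b in cur.fetchall()]
```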

I would argue that running MySQL/Sphinx under Linux will give you a better chance to see and track what's going on inside.

pQd