
I'm running Snort in conjunction with MySQL for logging, which is generating ENORMOUS datasets (the event table currently holds over 2.5 million rows; I don't know the exact count because counting clunks out from using too much memory once it passes 2.5 million).

Unfortunately, this data isn't very useful anymore because I can't pull it out to anywhere else (the stored procedure that extracts it causes the server to crash).

My question is: is there a way to optimize MySQL for datasets this large, or is this beyond MySQL's technical capabilities, so that I need to move to something like Oracle, MS SQL Server, or PostgreSQL?

We have both an Oracle and an MS SQL Server instance, but both are business-critical production servers, and it would be very bad news to knock either one offline or inhibit its performance.

Any thoughts on the matter?

tearman
  • What version of MySQL? Also, what spec is the server? I have had a shade over 6 million records in MySQL on a desktop class machine (and I will admit to tweaking *none* of the config) and the performance was alright, considering it was doing lots of other stuff too. – Ben Pilbrow Oct 04 '10 at 20:58
  • Version 5.1, and it's on a VM on our ESX cluster. It's no slouch as far as storage (Enterprise iSCSI SAN) and processing power (multiple modern-generation Xeons) go. – tearman Oct 04 '10 at 21:02
  • Also, what storage engine are you using? – Ben Pilbrow Oct 04 '10 at 21:02
  • InnoDB is the storage engine – tearman Oct 04 '10 at 21:06
  • innodb_buffer_pool_size, innodb_flush_method, and innodb_flush_log_at_trx_commit would be good places to start. – BMDan Oct 05 '10 at 02:10
  • Also, run "mysqlreport" and paste the results, please (run it after the server's been running for a while). – BMDan Oct 05 '10 at 02:18
  • That is ENORMOUS? Seriously? I mean, I regularly (using SQL Server, though) run 500-million-row result queries on a table with more than 4 billion rows. – TomTom Aug 05 '12 at 13:31

3 Answers


Like others say, 2.5M is not a huge number of rows. Look at your schema design: can it be that your reporting runs full table scans where indexes could be used? [Warning: introducing new indexes will decrease insert performance.]
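
A sketch of what that check might look like; the event table and timestamp column are assumptions based on the stock Snort schema, so adjust to whatever your reporting actually queries:

    -- "type: ALL" in the EXPLAIN output means a full table scan.
    EXPLAIN SELECT COUNT(*)
      FROM event
     WHERE timestamp > NOW() - INTERVAL 1 DAY;

    -- If it scans, index the filter column. Remember: every extra index
    -- slows down inserts, which matters for a busy logger like barnyard2.
    CREATE INDEX idx_event_timestamp ON event (timestamp);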

Did you try to optimize InnoDB? Make sure that at least the indexes fit in the buffer pool memory. Try mysqltuner.pl or, if you have more time, dive into mysqlperformanceblog.com.
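
For reference, a starting-point my.cnf sketch covering the directives mentioned in the comments above; the buffer pool size here is an assumption and should be sized to the VM's RAM:

    [mysqld]
    # Cache data and indexes in RAM; a common rule of thumb on a
    # dedicated database box is 50-80% of available memory.
    innodb_buffer_pool_size        = 4G
    # Bypass the OS page cache so pages are not buffered twice.
    innodb_flush_method            = O_DIRECT
    # Flush the transaction log once per second instead of per commit;
    # you can lose up to ~1s of data on a crash, usually fine for IDS logs.
    innodb_flush_log_at_trx_commit = 2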

pQd
  • Unfortunately I didn't design the schema, and changing it would require a rewrite of the software (barnyard2), so I don't have much control there. – tearman Oct 05 '10 at 03:01
  • @tearman You can still add indices without changing the software; it all depends on what sort of queries are run. Maybe you can move older data to another, less-used table and just generate daily/weekly reports from it? – pQd Oct 05 '10 at 06:41
  • I've changed how the indices work, given the machine more resources, and tweaked some of its parameters as requested. We'll see how it goes. – tearman Oct 06 '10 at 21:46
  • Thanks for the mysqltuner.pl suggestion. It helped me figure out which directives I needed to adjust. – Jason Swett May 05 '11 at 20:08

2.5 million records should be no problem. Sharing the schema would help. Also, mysqltuner.pl (mentioned in another answer) will warn you about my.cnf issues such as innodb_buffer_pool_size being smaller than the combined size of your indexes. Definitely run it. innodb_buffer_pool_size should be set as high as possible.
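
A quick way to compare the two (a sketch; information_schema sizes are approximate for InnoDB):

    -- Total size of all InnoDB indexes, in MB.
    SELECT SUM(INDEX_LENGTH) / 1024 / 1024 AS index_mb
      FROM information_schema.TABLES
     WHERE ENGINE = 'InnoDB';

    -- What the buffer pool is currently set to.
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';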

If you have any TEXT columns, queries that scan a lot of rows will perform much better if you move those columns out into a separate table. Even better, use the InnoDB plugin, Percona Server, or MariaDB and enable compression for those new TEXT-column tables.
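
A sketch of what such a split might look like; the table and column names here are illustrative, and compression requires the Barracuda file format with innodb_file_per_table enabled:

    -- my.cnf prerequisites (InnoDB plugin / Percona Server / MariaDB):
    --   innodb_file_per_table = 1
    --   innodb_file_format    = Barracuda

    CREATE TABLE event_payload (
        sid     INT UNSIGNED NOT NULL,
        cid     INT UNSIGNED NOT NULL,
        payload TEXT,
        PRIMARY KEY (sid, cid)
    ) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;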

outcassed

Maybe InnoDB is not the best choice for logs?

I've got a centralized syslog server, and it's set up so that every month the data goes into a different/new table, with a view that combines all these tables. The old tables are then compressed with myisampack, so they take up a lot less space, read faster, and become read-only. It works very fast.
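
A sketch of that pattern with assumed table names (monthly MyISAM tables behind a UNION ALL view, then packing the finished month):

    -- Start a new month by cloning the previous month's structure.
    CREATE TABLE syslog_2010_10 LIKE syslog_2010_09;

    -- A view combining the months you still query together.
    CREATE OR REPLACE VIEW syslog_all AS
        SELECT * FROM syslog_2010_09
        UNION ALL
        SELECT * FROM syslog_2010_10;

Then, once a month is closed out and no longer written to:

    # Compress the finished table in place; it becomes read-only.
    # (Flush or lock the table first so mysqld isn't holding it open.)
    myisampack /var/lib/mysql/logs/syslog_2010_09.MYI
    # Rebuild the indexes after packing.
    myisamchk -rq /var/lib/mysql/logs/syslog_2010_09.MYI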

Jure1873