1

I have a fairly large dataset that I need to import into MySQL. I'd like to do so as efficiently as possible, as I am going to need to do this repeatedly. Are there any tricks for this? Is there any approach more efficient than extended INSERT statements?

To address the questions in the comment: the original format is XML, though I already have a transformer to put it into CSV. I'm working with MySQL 5.1.x and the target storage engine is InnoDB. There are relationships within the data, but I do not have any constraints, and in most cases the data isn't clean enough to impose them.

Brad Gilbert
  • 2,473
  • 2
  • 21
  • 19
Tim
  • 1,879
  • 3
  • 18
  • 16
  • info about the data source/format, MySQL version, preferred destination engine, and intended use after load would help us give you an excellent answer. – Bruno Bronosky Jul 09 '09 at 15:07

6 Answers

3

Try mysqlimport; it can import directly from CSV files.
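
For example, something along these lines should pull a CSV straight into a table (the database, user, and file names here are placeholders):

    # mysqlimport derives the table name from the file's base name
    # (mytable.csv -> table `mytable`)
    mysqlimport --local --user=me --password \
        --fields-terminated-by=',' --fields-optionally-enclosed-by='"' \
        --lines-terminated-by='\n' \
        mydb /path/to/mytable.csv

Under the hood it issues LOAD DATA INFILE for you.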

Sekenre
  • 2,913
  • 1
  • 18
  • 17
2

I would read the MySQL manual. I was thinking about INSERT AFTER, but LOAD DATA INFILE looks interesting; the manual claims it can be "20 times faster"...
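
As a rough sketch (the path, table name, and CSV dialect below are assumptions to adjust for your export):

    -- the file must be readable by the MySQL server process
    LOAD DATA INFILE '/var/lib/mysql-files/mytable.csv'
    INTO TABLE mytable
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES;  -- skip the header row, if your CSV has one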

James
  • 2,212
  • 1
  • 13
  • 19
1

There's an open-source ETL tool called Pentaho Data Integration that has a MySQL bulk load option (under the experimental folder in Design view). You can also quickly load data in a platform-agnostic manner using its CSV Input and Table Output steps. If you need to do any processing on the data before you load it, you can encode that logic within this tool.

0

You want to use the LOAD DATA statement. If you don't have the privileges to put the file somewhere the server can read it, you also want the LOCAL option so the data can be loaded from wherever you put it on the client. LOAD DATA sucks in data at prodigious speeds on my Dell T5600, about a gigabyte every 2-3 minutes. I just finished an 8 GB load of some 37 million records in 1h13m.
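
For reference, the LOCAL variant is just one extra keyword (names and paths here are placeholders):

    -- LOCAL makes the client read the file and stream it to the
    -- server, so the file doesn't need to sit on the database host
    LOAD DATA LOCAL INFILE '/home/me/export/mytable.csv'
    INTO TABLE mytable
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"';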

Juan Jimenez
  • 717
  • 1
  • 6
  • 12
0

Use /*!40000 ALTER TABLE tbl_name DISABLE KEYS */; and /*!40000 ALTER TABLE tbl_name ENABLE KEYS */; around the INSERT statements for each table.

This will ensure that non-unique indexes are created after all rows are present, rather than attempting to build them piece-by-piece.
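
A minimal sketch of the ordering, with a placeholder table and columns:

    /*!40000 ALTER TABLE mytable DISABLE KEYS */;
    INSERT INTO mytable (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    -- ...the rest of the extended INSERTs for this table...
    /*!40000 ALTER TABLE mytable ENABLE KEYS */;

This is the same pattern mysqldump emits when --disable-keys (part of the default --opt group) is enabled.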

Dan Carley
  • 25,189
  • 5
  • 52
  • 70
0

Even a bulk "insert ..." shouldn't take too long. I took a new JIRA database export of 200 MB and loaded it into a virgin database in approximately 30 seconds.

Chris K
  • 659
  • 4
  • 11
  • I'm talking about tens of GB, with lots of indexes. So while not undoable for a single import, since we'll be doing it repeatedly, efficiency counts. – Tim Jul 09 '09 at 15:04