Sometimes I have to re-import data for a project, which means reading about 3.6 million rows into a MySQL table (currently InnoDB, but I am not really limited to this engine). "LOAD DATA INFILE ..." has proved to be the fastest solution, but it has a tradeoff:
- When importing without keys, the import itself takes about 45 seconds, but the key creation takes ages (it has already been running for 20 minutes...).
- Importing with the keys already on the table makes the import itself much slower.
There are keys over 3 fields of the table, referencing numeric fields.
Is there any way to accelerate this?
Another issue: when I terminate the client process that started a slow query, the query continues running on the database. Is there any way to terminate the query without restarting mysqld?
Thanks a lot
DBa
If you're using InnoDB and bulk loading, here are a few tips:
Sort your CSV file into the primary key order of the target table: InnoDB uses clustered primary keys, so the data will load faster if it is already sorted.
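In case it helps, here is a minimal Python sketch of that pre-sorting step. It is only an illustration under a few assumptions that are mine, not from the original setup: the file has no header row, the primary key is a single numeric column (column 0 by default), and the whole file fits in memory. The function name `sort_csv_by_key` is hypothetical.

```python
import csv


def sort_csv_by_key(src_path, dst_path, key_column=0, numeric=True):
    """Rewrite src_path into dst_path with rows ordered by the key column.

    Assumes no header row and that the file fits in memory.
    """
    with open(src_path, newline="") as src:
        rows = list(csv.reader(src))

    # Compare numerically for numeric keys so that 2 sorts before 10.
    convert = int if numeric else str
    rows.sort(key=lambda row: convert(row[key_column]))

    with open(dst_path, "w", newline="") as dst:
        # Use "\n" line endings, which is what LOAD DATA expects by default.
        csv.writer(dst, lineterminator="\n").writerows(rows)
```

For files too large for memory, an external sort (for example the Unix `sort` utility) would be the usual alternative.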
The typical LOAD DATA INFILE sequence I use:
truncate <table>;
set autocommit = 0;
load data infile <path> into table <table>...
commit;
Other optimisations you can use to boost load times:
set unique_checks = 0;
set foreign_key_checks = 0;
set sql_log_bin=0;
You can also split the CSV file into smaller chunks and load them one after another.
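The chunking can be scripted. Below is a rough Python sketch, assuming one record per line (no quoted fields containing newlines); the function name `split_file`, the default chunk size, and the output naming scheme are all hypothetical choices of mine. Chunks are written in file order, so a pre-sorted input stays sorted across chunks.

```python
import itertools


def split_file(src_path, rows_per_chunk=500_000, prefix="chunk"):
    """Split src_path into numbered files of at most rows_per_chunk lines.

    Returns the list of chunk paths in the order they should be loaded.
    """
    chunk_paths = []
    with open(src_path, newline="") as src:
        for index in itertools.count():
            # Take the next slice of lines; an empty slice means end of file.
            lines = list(itertools.islice(src, rows_per_chunk))
            if not lines:
                break
            chunk_path = f"{prefix}_{index:04d}.csv"
            with open(chunk_path, "w", newline="") as dst:
                dst.writelines(lines)
            chunk_paths.append(chunk_path)
    return chunk_paths
```

Each returned path can then be fed to its own LOAD DATA INFILE statement.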
Typical import stats I have observed during bulk loads:
3.5 - 6.5 million rows imported per min
210 - 400 million rows per hour
This blog post is almost 3 years old, but it's still relevant and has some good suggestions for optimizing the performance of "LOAD DATA INFILE":
http://www.mysqlperformanceblog.com/2007/05/24/predicting-how-long-data-load-would-take/
InnoDB is a pretty good engine, but it relies heavily on being tuned. One thing to know is that if your inserts are not in order of increasing primary key, InnoDB can take a bit longer than MyISAM. This can usually be overcome by setting a higher innodb_buffer_pool_size. My suggestion is to set it to 60-70% of your total RAM on a dedicated MySQL machine.
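To make the 60-70% rule of thumb concrete, here is a small Python sketch of the arithmetic. The helper names and the 16 GiB example machine are hypothetical; the only inputs taken from the suggestion above are the two percentages. The output uses the `M` size suffix that MySQL option files accept.

```python
def buffer_pool_range_mib(total_ram_mib, low_pct=60, high_pct=70):
    """Return the (low, high) suggested buffer pool size in MiB."""
    # Integer arithmetic avoids floating-point rounding surprises.
    return total_ram_mib * low_pct // 100, total_ram_mib * high_pct // 100


def my_cnf_line(size_mib):
    """Format a size in MiB as an option-file line."""
    return f"innodb_buffer_pool_size = {size_mib}M"


# Example: a dedicated machine with 16 GiB of RAM (an assumed figure).
low, high = buffer_pool_range_mib(16 * 1024)
print(my_cnf_line(low))   # innodb_buffer_pool_size = 9830M
print(my_cnf_line(high))  # innodb_buffer_pool_size = 11468M
```

Treat the result as a starting point only; the right value depends on what else runs on the machine.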