Let me set up the situation. We are trying to insert a modestly high number of rows (roughly 10-20M a day) into a MyISAM table that is modestly wide:
+--------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+-------+
| blah1 | varchar(255) | NO | PRI | | |
| blah2 | varchar(255) | NO | PRI | | |
| blah3 | varchar(5) | NO | PRI | | |
| blah4 | varchar(5) | NO | PRI | | |
| blah5 | varchar(2) | NO | PRI | | |
| blah6 | varchar(2) | NO | PRI | | |
| blah7 | date | NO | PRI | | |
| blah8 | smallint(6) | NO | PRI | | |
| blah9 | varchar(255) | NO | PRI | | |
| blah10 | bigint(20) | YES | | NULL | |
+--------------+--------------+------+-----+---------+-------+
The only index besides that whopping primary key is on the blah7, the date field. We are using LOAD DATA INFILE and seeing what strikes me as pretty awful performance, around 2 hours to load the data. I was led to believe that LOAD DATA INFILE was orders of magnitude faster than that.
Interestingly, we have some less fat tables (5-6 fields) that we also use LOAD DATA INFILE to batch data into and we see much better performance on those. The number of records is quite a bit smaller, which leads me to think that we are running up against a buffer size limit when we load the large table, and are having to go to disk (and really, what else but going to disk would explain such slow load times?).
...which brings me to my question. What my.cnf settings are most important when dealing with LOAD DATA INFILE commands?
I don't know about settings, but my money is on that composite primary key as to why you are having such poor performance.
Inserting into indexes in general kills performance. You may be better off removing the index before inserting data and re-indexing after insertion.
From http://forum.percona.com/s/m/983/:
Also check out http://www.mysqlperformanceblog.com/2007/05/24/predicting-how-long-data-load-would-take/ and http://www.linuxtopia.org/online_books/database_guides/mysql_5.1_database_reference_guide/insert-speed.html.
If your Table is MyISam and if data is being added to a non empty table, bulk_insert_buffer_size matters
MyISAM uses a special tree-like cache to make bulk inserts faster for LOAD DATA INFILE when adding data to nonempty tables. The BULK_INSERT_BUFFER_SIZE variable limits the size of the cache tree in bytes per thread. Setting it to 0 disables this optimization. The default value is 8MB. The maximum value is 4GB.
If data is being added to a non-empty table, tuning the bulk_insert_buffer_size variable can make data insertion faster. Normally, it shows the improvement when the data to be inserted is more than 10k rows. But it is hard to say what is the right value, so, trail and try with incremental buffer size values.
Also, performance improvement will be more obvious if the following variables are also set:
MYISAM_SORT_BUFFER_SIZE
KEY_BUFFER_SIZE
You should also consider disabling the indexes before loading the data using the following alter table command: