Optimize massive MySQL INSERTs

2019-05-31 07:47发布

问题:

I've got an application which needs to run a daily script; the daily script consists in downloading a CSV file with 1,000,000 rows, and inserting those rows into a table.

I host my application in Dreamhost. I created a while loop that goes through all the CSV's rows and performs an INSERT query for each one. The thing is that I get a "500 Internal Server Error". Even if I chop it out in 1000 files with 1000 rows each, I can't insert more than 40 or 50 thousand rows in the same loop.

Is there any way that I could optimize the input? I'm also considering going with a dedicated server; what do you think?

Thanks!

Pedro

回答1:

Most databases have an optimized bulk insertion process - MySQL's is the LOAD DATA FILE syntax.

To load a CSV file, use:

LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\r\n'
  IGNORE 1 LINES;


回答2:

Insert multiple values, instead of doing

insert into table values(1,2);

do

insert into table values (1,2),(2,3),(4,5);

Up to an appropriate number of rows at a time.

Or do bulk import, which is the most efficient way of loading data, see

http://dev.mysql.com/doc/refman/5.0/en/load-data.html



回答3:

Normally I would say just use LOAD DATA INFILE, but it seems you can't with your shared hosting environment.

I haven't used MySQL in a few years, but they have a very good document which describes how to speed up insertions for bulk insertions: http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html

A few ideas that can be gleaned from this:

  • Disable/enable keys around the insertions:

    ALTER TABLE tbl_name DISABLE KEYS; ALTER TABLE tbl_name ENABLE KEYS;

  • Use many values in your insert statements.

    I.e.: INSERT INTO table (col1, col2) VALUES (val1, val2),(.., ..), ...

    If I recall correctly, you can have up to 4096 values per insertion statement.

  • Run a FLUSH TABLES command before you even start, to ensure that there are no pending disk writes that may hurt your insertion performance.

I think this will make things fast. I would suggest using LOCK TABLES, but I think disabling the keys makes that moot.

UPDATE

I realized after reading this that by disabling your keys you may remove consistency checks that are important for your file loading. You can fix this by:

  • Ensuring that your table has no data that "collides" with the new data being loaded (if you're starting from scratch, a TRUNCATE statement will be useful here).
  • Writing a script to clean your input data to ensure no duplicates locally. Checking for duplicates is probably costing you a lot of database time anyway.
  • If you do this, ENABLE KEYS should not fail.


回答4:

You can create cronjob script which adds x records to the database at one request. Cronjob script will check if last import have not addded all needed rows he takes another x rows.

So you can add as many you need rows.

If you have your dedicated server it's more easier. You just run loop with all insert queries.

Of course you can try to set time_limit to 0 (if it's working on dreamhost) or make it bigger.



回答5:

Your PHP script is most likely being terminated because it exceeded the script time limit. Since you're on a shared host, you're pretty much out of luck.

If you do switch to a dedicated server and if you get shell access, the best way would be to use the mysql command-line tool to insert the data.



回答6:

OMG Ponies suggestion is great, but I've also 'manually' formatted data into the same format that mysqldump uses, then loaded it that way. Very fast.



回答7:

Have you tried doing transactions? Just send the command BEGIN to MySQL, do all your inserts then do COMMIT. This would speed it up significantly,but like casablanca said, your script is probably timing out as well.



回答8:

I've ran into this problem myself before and nos pretty much got it right on the head, but you'll need to do a bit more to get it to perform the best.

I found that in my situation that I couldn't MySQL to accept one large INSERT statement, but found that if I split it up into groups of about 10k INSERTS at a time like how nos suggested then it'll do it's job pretty quickly. One thing to note is that when doing multiple INSERTs like this that you will most likely hit PHP's timeout limit, but this can be avoided by resetting the timout with set_time_limit($seconds), I found that doing this after each successful INSERT worked really well.

You have to be careful about doing this, because you could find yourself in a loop on accident with an unlimited timout and for that I would suggest testing to make sure that each INSERT was successful by either checking for errors reported by MySQL with mysql_errno() or mysql_error(). You could also catch errors by checking the number of rows affected by the INSERT with mysql_affected_rows(). You could then stop after the first error happens.



回答9:

It would be better if you use sqlloader. You would need two things first control file that specifies the actions which SQL Loader should do and second csv file that you want to be loaded Here is the below link that would help you out. http://www.oracle-dba-online.com/sql_loader.htm



回答10:

Go to phpmyadmin and select the table you would like to insert into.

Under the "operations" tab, and then the ' table options' option /section , change the storage engine from InnoDB to MyISAM.

I once had a similar challenge. Have a good time.