I'm trying to create a copy of the Wikipedia database (around 50 GB), but I'm having problems with the largest SQL files.
I've split the multi-gigabyte files into roughly 300 MB chunks using the Linux split utility, e.g.
split -d -l 50 ../enwiki-20070908-page page.input.
On average, a 300 MB file takes 3 hours to import on my server. I'm running Ubuntu 12.04 Server and MySQL 5.5.
I'm importing them like this:
mysql -u username -ppassword database < category.sql
Note: these files consist of INSERT statements; they are not CSV files.
Wikipedia offers database dumps for download, so everybody can create a copy of Wikipedia.
You can find example files here: Wikipedia Dumps
I think the import is slow because of my MySQL server settings, but I don't know what I should change. I'm using the standard Ubuntu MySQL config on a machine with a decent processor and 2 GB of RAM. Could someone help me out with a suitable configuration for my system?
I've tried setting innodb_buffer_pool_size to 1 GB, but to no avail.
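For reference, the only change I've made to the stock /etc/mysql/my.cnf is roughly this (a sketch of my config, not an exact copy):

[mysqld]
innodb_buffer_pool_size = 1G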
Since you have less than 50GB of memory (so you can't buffer the entire database in memory), the bottleneck is the write speed of your disk subsystem.
Tricks to speed up imports:
- MyISAM is not transactional, so it is much faster for single-threaded inserts. Try loading into MyISAM, then ALTER the table to InnoDB.
- Use ALTER TABLE .. DISABLE KEYS to avoid updating the indexes row by row (MyISAM only).
- Set bulk_insert_buffer_size above your insert size (MyISAM only).
- Set unique_checks = 0 so that unique constraints are not checked.
For more, see Bulk Data Loading for InnoDB Tables in the MySQL Manual; a rough sketch of the whole sequence is below.
Note: if the original table has foreign key constraints, using MyISAM as an intermediate format is a bad idea.
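Putting those tricks together, run from the mysql client the sequence might look something like this (the table name page and the chunk file name are placeholders; I'm assuming the table already exists and your split chunks contain only INSERT statements):

SET unique_checks = 0;
SET foreign_key_checks = 0;                       -- also skip FK checks while loading
SET SESSION bulk_insert_buffer_size = 536870912;  -- 512M, i.e. above the ~300 MB chunk size

ALTER TABLE page ENGINE=MyISAM;                   -- load into MyISAM first
ALTER TABLE page DISABLE KEYS;                    -- defer non-unique index maintenance

-- load each chunk (mysql client command; or pipe it in from the shell):
source page.input.00

ALTER TABLE page ENABLE KEYS;                     -- rebuild the indexes in one pass
ALTER TABLE page ENGINE=InnoDB;                   -- convert back once the data is in

SET unique_checks = 1;
SET foreign_key_checks = 1;

Keep in mind that the final ALTER back to InnoDB rebuilds the whole table, so it is itself a long-running step.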
Use MyISAM, which is usually much faster than InnoDB, if your database isn't transaction-oriented. Have you looked into any table partitioning/sharding techniques (a rough sketch is below)?
Converting a huge MyISAM table to InnoDB will again run into performance issues, so I'm not sure I would do that. But disabling and re-enabling keys could help...
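If you do explore partitioning, a minimal sketch would be range-partitioning the large tables on their integer primary key. The column name page_id below is an assumption, so check your actual schema, and remember that MySQL requires the partitioning column to be part of every unique key on the table:

ALTER TABLE page
    PARTITION BY RANGE (page_id) (
        PARTITION p0 VALUES LESS THAN (10000000),
        PARTITION p1 VALUES LESS THAN (20000000),
        PARTITION p2 VALUES LESS THAN MAXVALUE
    );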