For a mysql database, I have some XML dump files.
One table import file for a pricing
table looks like this:
<database>
<table>
<row>
<id>5954017</id>
<foo>narf</foo>
<bar_id>1377</bar_id>
<price_single>800.00</price_single>
<price_double>1500.00</price_double>
<price_triple>2000.00</price_triple>
<price_quad>1900.00</price_quad>
<currency>USD</currency>
</row>
...
</table>
</database>
It has
$ xmllint --xpath "count(//row)" import.xml
223198
rows and its size is:
du -h import.xml
69M import.xml
I want to import via mysql's LOAD XML
feature. The table will be always be truncated beforehand.
MySQL [my_database]> LOAD XML LOCAL INFILE 'import.xml' INTO TABLE pricing ROWS IDENTIFIED BY '<row>' \G
It succeeds, yet that table import takes seemingly a rather long time:
Query OK, 223198 rows affected (1 hour 44 min 48.40 sec)
Records: 223198 Deleted: 0 Skipped: 0 Warnings: 0
I was reading that people are importing gigabytes of data with this LOAD INFILE feature, and I expected its performance to be much faster, in the realms of minutes, not hours. Is my expectation wrong? Is this a normal time for such a dataset of 200,000 entries?
(I also am comparing this speed to a custom php import script that parses the XML manually and inserts each row one by one; and that tasks takes 45 minutes for all the tables. I expected the LOAD XML LOCAL INFILE
to outperform that task.)
My table looks like this:
MySQL [my_database]> DESCRIBE pricing;
+----------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| foo | varchar(256) | YES | | NULL | |
| bar_id | int(11) | YES | | NULL | |
| price_single | float | YES | | NULL | |
| price_double | float | YES | | NULL | |
| price_triple | float | YES | | NULL | |
| price_quad | float | YES | | NULL | |
| currency | varchar(3) | YES | | NULL | |
+----------------+--------------+------+-----+---------+-------+
And was created via:
DROP TABLE IF EXISTS `pricing`;
CREATE TABLE `pricing` (
`id` int(11) NOT NULL,
`foo` varchar(256) DEFAULT NULL,
`bar_id` int(11) DEFAULT NULL,
`price_single` float DEFAULT NULL,
`price_double` float DEFAULT NULL,
`price_triple` float DEFAULT NULL,
`price_quad` float DEFAULT NULL,
`currency` varchar(3) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
What can I do to improve the performance of LOAD XML LOCAL INFILE
?
Transform your XML into a CSV file making the import almost instant.
You can transform your XML via xslt using e.g.
xsltproc
:The appropriate xslt
transformToCsv.xsl
looks like this:The import syntax then changes to:
Importing 200,000 entries becomes a matter of (milli) seconds.