I have two tables: Articles, which stores information about articles, and PageLinks, which stores hyperlinks between pages. The schemas are as follows.
CREATE TABLE `Articles` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`slug` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`label` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `slug_UNIQUE` (`slug`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
CREATE TABLE `PageLinks` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`from_id` int(11) NOT NULL,
`to_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index4` (`to_id`,`from_id`),
KEY `fk_PageLinks_1` (`from_id`),
KEY `fk_PageLinks_2` (`to_id`),
CONSTRAINT `fk_PageLinks_1` FOREIGN KEY (`from_id`) REFERENCES `Articles` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `fk_PageLinks_2` FOREIGN KEY (`to_id`) REFERENCES `Articles` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
I have several million slug pairs that indicate hyperlinks between corresponding pages. I am trying to load the PageLinks table from these slug pairs.
Currently, I have a Python program that issues a SELECT id query for each slug to convert the slug pairs into Article id pairs. The id pairs are then written to a file and loaded using LOAD DATA INFILE. Additionally, if a slug does not exist in the Articles table, the program inserts a dummy row with no label and then uses the id of that row.
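Concretely, the current flow looks roughly like this (the file name, delimiters, and literal slug are just placeholders):

SELECT id FROM Articles WHERE slug = 'Some_Slug';
-- two such lookups per pair; if a slug is missing, a dummy row is inserted first:
INSERT INTO Articles (slug) VALUES ('Some_Slug');
-- after all pairs have been resolved and written out as ids:
LOAD DATA INFILE '/path/to/id_pairs.tsv'
INTO TABLE PageLinks (from_id, to_id);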
I am trying to optimize the program to load entries faster (I have around 18GB of slug pairs to load). I believe some speedup can be achieved if it is possible to do the slug-to-id resolution and the page-link insertion together in bulk (thus avoiding the per-SELECT overhead). What is the best way to do this in MySQL?
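For reference, this is the kind of bulk approach I have in mind, as a rough sketch only (the SlugPairs staging table, its column names, and the file path are made up, and I have not benchmarked any of this):

CREATE TABLE `SlugPairs` (
`from_slug` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`to_slug` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
KEY `idx_from_slug` (`from_slug`),
KEY `idx_to_slug` (`to_slug`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- bulk-load the raw slug pairs into the staging table
LOAD DATA INFILE '/path/to/slug_pairs.tsv'
INTO TABLE SlugPairs (from_slug, to_slug);

-- create dummy Articles rows (no label) for slugs that do not exist yet;
-- IGNORE skips slugs already present via the slug_UNIQUE key
INSERT IGNORE INTO Articles (slug)
SELECT from_slug FROM SlugPairs
UNION
SELECT to_slug FROM SlugPairs;

-- resolve both slugs to ids and insert the links in one pass;
-- IGNORE skips duplicates via the (to_id, from_id) unique key
INSERT IGNORE INTO PageLinks (from_id, to_id)
SELECT a1.id, a2.id
FROM SlugPairs sp
JOIN Articles a1 ON a1.slug = sp.from_slug
JOIN Articles a2 ON a2.slug = sp.to_slug;

I do not know whether this set-based approach is actually the fastest option here, or whether the joins over several hundred million staging rows would themselves become the bottleneck.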