mysql query select keys and insert

2019-08-14 05:35发布

问题:

I have two tables: Articles that stores information about Articles, and PageLinks that stores hyperlinks between pages. The schema is as below.

CREATE TABLE `Articles` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `slug` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `label` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `slug_UNIQUE` (`slug`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8

CREATE TABLE `PageLinks` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `from_id` int(11) NOT NULL,
  `to_id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `index4` (`to_id`,`from_id`),
  KEY `fk_PageLinks_1` (`from_id`),
  KEY `fk_PageLinks_2` (`to_id`),
  CONSTRAINT `fk_PageLinks_1` FOREIGN KEY (`from_id`) REFERENCES `Articles` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT `fk_PageLinks_2` FOREIGN KEY (`to_id`) REFERENCES `Articles` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8

I have several million slug pairs that indicate hyperlinks between corresponding pages. I am trying to load the PageLinks table from these slug pairs.

Currently, i have a python program that issues select id queries for each slug to convert the slug pairs to Article id pairs. The id pairs are then written to a file and loaded using load data infile. Additionally if a slug does not exist in the Articles table, the program inserts a dummy row without label and then uses id of that row.

I am trying to optimize the program to load entries faster (I have around 18GB of slug pairs to load). I believe some speed up can be achieved if it is possible to do slug->id resolution and page link insertion together in bulk (thus avoiding the per SELECT overhead). What is the best possible way to do this in mysql?

回答1:

Making a separate SELECT for each slug is inefficient indeed.

You should load your slug pairs into a table:

CREATE TABLE pairs
        (
        slug1 VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
        slug2 VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
        );

, load it with your pairs, then issue the following statements:

INSERT IGNORE
INTO    Articles (slug)
SELECT  slug1
FROM    pairs;

INSERT IGNORE
INTO    Articles (slug)
SELECT  slug2
FROM    pairs;

INSERT
INTO    pairs (from_id, to_id)
SELECT  a1.id, a2.id
FROM    pairs
JOIN    articles a1
ON      a1.slug = slug1
JOIN    articles a2
ON      a2.slug = slug2;