Which PDO SQL Query is faster in the long run and

2019-07-29 00:57发布

问题:

From a Table has over a million record, When i pull the data from it,

I want to check if the requested data exists or not, So which path is more efficient and faster then the other?

$Query = '
    SELECT n.id 
    FROM names n 
    INNER JOIN ages a ON n.id = a.aid 
    INNER JOIN regions r ON n.id = r.rid 
    WHERE id = :id
';

$stmt->prepare($Query);
$stmt->execute(['id' => $id]);
if ($stmt->rowCount() == 1) {
    $row = $stmt->fetch();
    ......................
} else {
    exit();
}

or

$EXISTS = 'SELECT EXISTS (
    SELECT n.fname, n.lname, a.age, r.region 
    FROM names n 
    INNER JOIN ages a ON n.id = a.aid 
    INNER JOIN regions r ON n.id = r.rid 
    WHERE id = :id
    LIMIT 1
)
';
$stmt->prepare($EXISTS);
$stmt->execute(['id' => $id]);
if ($stmt->fetchColumn() == 1) {
    $stmt->prepare($Query);
    $stmt->execute(['id' => $id]);
    $row = $stmt->fetch();
    ......................
} else {
    exit();
}

keeping in mind that id is PRIMARY (INT) and aid, rid are INDEXED (INT)

回答1:

The two methods you show are almost certainly equivalent, with almost no measurable difference in performance.

SELECT n.id 
FROM names n 
INNER JOIN ages a ON n.id = a.aid 
INNER JOIN regions r ON n.id = r.rid 
WHERE id = :id

I assume names.id is the primary key of that table. A primary key lookup is very fast.

Then it will do a secondary key lookup to the other two tables, and it will be an index-only access because there's no reference to other columns of those tables.

You should learn how to use EXPLAIN to analyze MySQL's optimization plan. This is a skill you should practice any time you want to improve the performance of an SQL query.

See https://dev.mysql.com/doc/refman/5.7/en/using-explain.html

mysql> explain SELECT n.id 
    ->     FROM names n 
    ->     INNER JOIN ages a ON n.id = a.aid 
    ->     INNER JOIN regions r ON n.id = r.rid 
    ->     WHERE id = 1;

+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | n     | NULL       | const | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | Using index |
|  1 | SIMPLE      | a     | NULL       | ref   | aid           | aid     | 5       | const |    1 |   100.00 | Using index |
|  1 | SIMPLE      | r     | NULL       | ref   | rid           | rid     | 5       | const |    1 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+

We see that each table access is using an index (I'm assuming indexes though you did not provide your SHOW CREATE TABLE in your question).

Compare to the second solution with SELECT EXISTS(...)

mysql> explain SELECT EXISTS (
    ->     SELECT n.id 
    ->     FROM names n 
    ->     INNER JOIN ages a ON n.id = a.aid 
    ->     INNER JOIN regions r ON n.id = r.rid 
    ->     WHERE id = 1 
    ->     LIMIT 1);

+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+----------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra          |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+----------------+
|  1 | PRIMARY     | NULL  | NULL       | NULL  | NULL          | NULL    | NULL    | NULL  | NULL |     NULL | No tables used |
|  2 | SUBQUERY    | n     | NULL       | const | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | Using index    |
|  2 | SUBQUERY    | a     | NULL       | ref   | aid           | aid     | 5       | const |    1 |   100.00 | Using index    |
|  2 | SUBQUERY    | r     | NULL       | ref   | rid           | rid     | 5       | const |    1 |   100.00 | Using index    |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+----------------+

The subquery looks identical to the first query optimization plan; it still uses indexes in the same way. But it's relegated to a subquery. Probably not a big difference, but it's one more thing.

The only advantage is that the SELECT EXISTS... query is guaranteed to return just one row with a true/false value. The first query might return a result set with zero, one, or many rows, depending how many matched the JOINs in the query. The difference is not a performance difference (unless it returns so many rows that it takes time to transfer the result set to the client, or uses a lot of memory to hold the result set in the client), but just a matter of convenience for the way you code it.



回答2:

Don't normalize ages; it is just a waste of space and time. age (assuming it is 'years') can fit in a 1-byte TINYINT UNSIGNED (range: 0..255) and avoid the JOIN lookup. Note that aid seems to be a 4-byte INT, which can hold billions of different values -- do you have billions of different ages?

Perhaps changing regions is worth it also.

In the first query, the two JOINs do nothing but verify that there are rows in age and regions. That is probably a waste.

EXISTS stops when one row is found. So LIMIT 1 is very unnecessary.