MySQL doubled results using sum and left join

2019-08-23 01:01发布

问题:

I have a three table setup : kids, toys and games, each with unique primary keys : id_kid, id_toy and id_game. Each kid can have multiple toys and games, but each toy or game is owned by only one kid.

The toys and games have a bought column with 3 states : -1,0,1 The table structure is something like this :

kids

id_kid
kid_name
etc

games

id_game
id_kid_games --> links with id_kid in kids_table (maybe not the best name, I know)
game_name
bought --> can be -1,0,1

toys

id_toy
id_kid_toys --> links with id_kid in kids_table
toy_name
bought --> can be -1,0,1

For each kid i'm trying to get a total of toys and games, bought and not bought, using the query below, however the results are doubled :

SELECT kids.*, 

COUNT(DISTINCT toys.id_toy) AS total_toys, 
SUM(CASE toys.bought WHEN 1 THEN 1 ELSE 0 END) AS toys_bought, 
SUM(CASE toys.bought WHEN -1 THEN 1 ELSE 0 END) AS toys_not_bought, 

COUNT(DISTINCT games.id_game) AS total_games, 
SUM(CASE games.bought WHEN 1 THEN 1 ELSE 0 END) AS games_bought, 
SUM(CASE games.bought WHEN -1 THEN 1 ELSE 0 END) AS games_not_bought 

FROM kids as k 
LEFT JOIN toys t ON k.id_kid = t.id_kid_toys
LEFT JOIN games g ON k.id_kid = g.id_kid_games
GROUP BY k.id_kid
ORDER BY k.name ASC

One kid has 2 toys and 4 games, all bought, and the results are 2 total toys (correct), 4 total games (correct), 8 toys bought, 8 games bought. (both wrong)

Please help with an answer - if possible - without using subselects. Thank you.

回答1:

As you are selecting data from two unrelated relations (kids joined to toys, and kids joined to games), subqueries are the natural way of doing it. As uncorrelated subqueries may be used, this should not be particularly slow.

Try if this query is sufficiently efficient:

Compared to your original query, it basically just reverses the order of joinining and grouping.

SELECT kids.*, t.total_toys, t.toys_bought, t.toys_not_bought,
               g.total_games, g.games_bought, g.games_not_bought
FROM kids
LEFT JOIN (SELECT id_kids_toys,
                  COUNT(*) AS total_toys,
                  SUM(CASE bought WHEN 1 THEN 1 ELSE 0 END) as toys_bought,
                  SUM(CASE bought WHEN -1 THEN 1 ELSE 0 END) as toys_not_bought
           FROM toys
           GROUP BY id_kids_toys) AS t
ON t.id_kids_toys = kids.id_kid
LEFT JOIN (SELECT id_kids_games,
                  COUNT(*) AS total_games,
                  SUM(CASE bought WHEN 1 THEN 1 ELSE 0 END) as games_bought,
                  SUM(CASE bought WHEN -1 THEN 1 ELSE 0 END) as games_not_bought
           FROM games
           GROUP BY id_kids_games) AS g
ON g.id_kids_games = kids.id_kid
ORDER by kids.name;

If you insist on avoiding subqueries, this, probably far less efficient, query might do:

SELECT kids.*, 
COUNT(DISTINCT toys.id_toy) AS total_toys, 

-- sum only toys joined to first game
SUM(IF(g2.id_game IS NULL AND bought = 1, 1, 0)) AS toys_bought, 
SUM(IF(g2.id_game IS NULL AND bought = -1, 1, 0)) AS toys_not_bought, 

-- sum only games joined to first toy
COUNT(DISTINCT games.id_game) AS total_games, 
SUM(IF(t2.id_toy IS NULL AND bought = 1, 1, 0)) AS games_bought, 
SUM(IF(t2.id_toy IS NULL AND bought = -1, 1, 0)) AS games_not_bought 

FROM kids as k 
LEFT JOIN toys t ON k.id_kid = t.id_kid_toys
LEFT JOIN games g ON k.id_kid = g.id_kid_games

-- select only rows where either game or toy is the first one for this kid
LEFT JOIN toys t2 on k.id_kid = t.id_kid_toys AND t2.id_toy < t.id_toy
LEFT JOIN games g2 ON k.id_kid = g.id_kid_games AND g2.id_game < g.id_game 
WHERE t2.id_toy IS NULL OR g2.id_game IS NULL

GROUP BY k.id_kid
ORDER BY k.name ASC

It works by ensuring that for each kid, only the games joined to the first toy is counted, and only the toys joined to the first game is counted.