Why do the results of this MySQL query get multipl

2020-06-06 01:50发布

问题:

SELECT user_id,
    SUM(COALESCE(point_points, 0)) AS total_points,
    SUM(
        CASE
            WHEN point_date > '$this_month'
            THEN point_points
            ELSE 0
        END)                AS month_points,
    COUNT(DISTINCT c_id)    AS num_comments,
    COUNT(DISTINCT rant_id) AS live_submissions
FROM users
    LEFT JOIN points
    ON  users.user_id = points.point_userid
    LEFT JOIN comments
    ON
        (
            c_userid = user_id
        )
    LEFT JOIN rants
    ON
        (
            rant_poster = user_id
        AND rant_status = 1
        )
WHERE user_id = $id
GROUP BY user_id

Basically live_submissions and num_comments variable display proper results, while the total_points and month_points display a product of month_points/total_points, live_submissions and num_comments. Any idea why this is happening?

回答1:

This is called a Cartesian Product. When you join the tables together, the default result is every permutation of rows for which the join conditions are true. You use JOIN conditions to limit these permutations.

But since you are joining multiple tables to users, the result includes every permutation of each matching table. For example, each matching row in points is repeated per matching row in comments, and each of these is multiplied again, repeating per matching row in rants.

You can partially compensate for this with COUNT(DISTINCT c_id) as you are doing, but the DISTINCT is necessary only because you have multiple rows per c_id. And it doesn't work unless you apply it to unique values. This remedy doesn't work for the SUM() expressions.

Basically, you're trying to do too many calculations in one query. You need to split it up into separate queries for it to be reliable. And then you can get rid of the DISTINCT modifiers, too.

SELECT u.user_id, SUM(COALESCE(p.point_points, 0)) AS total_points, 
  SUM( CASE WHEN p.point_date > '$this_month' THEN p.point_points ELSE 0 END ) AS month_points
FROM users u LEFT JOIN points p
  ON u.user_id = p.point_userid 
WHERE u.user_id = $id
GROUP BY u.user_id;

SELECT user_id, COUNT(c.c_id) as num_comments, 
FROM users u LEFT JOIN comments c
  ON (c.c_userid = u.user_id)
WHERE u.user_id = $id
GROUP BY u.user_id;

SELECT u.user_id, COUNT(r.rant_id) as live_submissions
FROM users u LEFT JOIN rants r
  ON (r.rant_poster = u.user_id AND r.rant_status = 1)
WHERE u.user_id = $id
GROUP BY u.user_id;

You shouldn't try to do all three of these in a single query.



回答2:

Can you provide some sample output?

I think it has something to do with adding rants and comments in with the points. Can you try to remove the rants and comments tables?



回答3:

If you look at the query output before grouping then you'll see the problem. Multiple rows will be returned for a user if they have more than 1 record in any of the joined tables. So if a user has 2 comment records then 2 point records will also be returned.

As a simplified example...

User Table

userId name

1 Fred

Point table

userId Points

1 10

Comments table

userId Comment

1 Here

1 There

Selecting * from these tables will result in

userId Points Comment

1 10 Here

1 10 There

I'm not entirely sure of the MYSQL syntax but you would want something like

SELECT UserId, C.num_comments, P.total_points
FROM users
LEFT JOIN 
   (SELECT c_userId, COUNT(DISTINCT c_id) as num_comments
    FROM Comments
    GROUP BY c_userId)
    AS C
    ON UserId = c_userid
LEFT JOIN 
   (SELECT point_userId, sum(COALESCE(point_points, 0)) as total_points
    FROM Points
    GROUP BY point_userId)
    AS P
    ON UserId = point_userid