Why am I getting NULL values on Left Join?

2019-09-12 13:24发布

问题:

I'm migrating a database a new one to change One-to-Many relationships to Many-to-Many (and to improve the column naming scheme). [Edit: I've created a SQLFiddle for this.]

oldDB                           newDB
=========================       =======================================
individuals                     people
 - individual_id                 - id
 - individual_name_first         - first_name
 - individual_name_last          - last_name
 - individual_name_other         - additional_identifier
 - individual_position           - role
 - individual_group_code         - (replaced with people-groups table)
(There are duplicate rows
 in this table for individ's
 who are in more than one
 group.)

groups                          groups
 - (no id in oldDB)              - id
 - group_code                    - short_name
 - group_name                    - full_name

                                people_groups
                                 - id
                                 - person_id
                                 - group_id
                                 - start_date
                                 - end_date

Specifically, I'm having trouble with creating the linking table between people and groups.

I've already created the people and groups tables:

CREATE TABLE IF NOT EXISTS people (
  id int(11) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  first_name varchar(50) NOT NULL,
  last_name varchar(50) NOT NULL,
  additional_identifier varchar(50) DEFAULT NULL COMMENT 'In case of duplicate first and last names',
  role varchar(50) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

ALTER TABLE newDB.people ADD UNIQUE `name` (last_name, first_name, additional_identifier);

INSERT INTO newDB.people
  (id, first_name, last_name, role)
SELECT
  individual_id, individual_name_last, individual_name_first, individual_position, COUNT(*)
FROM
  oldDB.individuals
GROUP BY
  individual_name_last, individual_name_first;

CREATE TABLE IF NOT EXISTS newDB.groups(
    id INT(11) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    full_name VARCHAR(255) NOT NULL UNIQUE,
    short_name VARCHAR(255) NOT NULL UNIQUE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO newDB.groups
  (full_name, short_name)
SELECT
  group_name, group_code
FROM
  oldDB.groups;

Next I will CREATE the newDB.people-groups table, but first I'm making sure I can SELECT the right values:

SELECT
  newDB.groups.id 'group id',
  newDB.people.id 'person id',
  individual_group_code 'group short name',
  individual_name_last 'last name',
  individual_name_first 'first name'
FROM
  oldDB.individuals
LEFT JOIN
  newDB.groups ON(
    newDB.groups.short_name = oldDB.individuals.individual_group_code
   )
LEFT JOIN newDB.people ON (
  newDB.people.last_name = oldDB.individuals.individual_name_last 
    AND
  newDB.people.first_name = oldDB.individuals.individual_name_first
)
GROUP BY
  individual_name_last ASC,
  individual_name_first ASC,
  individual_group_code

The first LEFT JOIN is just for displaying the group short names for easy verification. The second LEFT JOIN is important: it is supposed to allow for the 'person id' output to be pulled from newDB.people.id. Instead, I'm just getting NULL in that column for all rows of output. Everything else is displaying correctly. What am I missing?

Edit:

Desired output

Here is what I am hoping to get. (I generated it by replacing newDB.people.id 'person id' with oldDB.individuals.individual_id 'person id'. To exemplify the problem with it is that person 925 and 1232 are the same person in two different groups. The new database simplifies that to having person 925.)

Actual output

Here is what I am getting:

Edit 2:

Here is a SQLFiddle that does work. Why doesn't it work in my phpmyadmin?

回答1:

You are performing a group by (3 columns) with 5 columns of non-aggregated columns in the select list. Also, not that it matters, there are no aggregates in the column output.

MySQL treats that as a distinct (for the 3 columns) and brings back the first row it encounters in the MRU cache, and if no cache, the first ones encountered in the clustered index or physical ordering to satisfy the 2 non-grouped-by columns.

In other words, it is a user error. A snafu. I recommend cleaning up you intention with the GROUP BY.

Somewhat related, read a recent answer of mine Here related to ONLY_FULL_GROUP_BY. See at the bottom of that link is MySQL Handling of GROUP BY which in my view is a gloss over of the real problems and non-standards that MySQL allowed which rendered unexpected and hard to explain data from violations of that Standard.

So what did the MySQL dev team do? They implemented the standard by default (starting in version 5.7) to disallow the types of queries you just performed.

Edit1

Your query, with no GROUP BY but with an order by newGroups.id,people.id, on a version 5.7.14 server: