SELECT WHERE IN with GROUP_CONCAT as input

2019-06-09 03:49发布

问题:

Please read this before continuing: Filter an unfiltered table against a whitelist table

So, I currently have a whitelist table set up as shown in the referenced link, and I'm encountering yet another issue brought up by said table, that is, to check the UNIQUENESS of each column. As MySQL's specification, it is not possible to set NULL column as UNIQUE, so, I've decided to come up with a different solution to check if rows are duplicated or not by using a SELECT GROUP BY query as follows.

SELECT GROUP_CONCAT(ID) AS IDs, country, region, item, count(*) AS amount
FROM whitelist

Now, to check if the item is duplicated, I've warpped it on top of another layer.

SELECT IDs, country, region, item, amount
FROM (SELECT GROUP_CONCAT(ID) AS IDs, country, region, item, count(*) AS amount
      FROM whitelist) tmp
WHERE amount > 1

Still works fine as intended, but the question starts here.

Is it possible for me to use this data, and RE-SELECT the whitelist table so I can get each entry as a row with something like ...

SELECT ID, country, region, item
FROM whitelist
WHERE ID IN (SELECT group_concat(ID)
               FROM (SELECT group_concat(ID) AS ID, country, region, item, COUNT(*) AS AMOUNT
                       FROM whitelist
                      GROUP BY country, region, item) tmp
              WHERE AMOUNT > 1)

Of course, I could just use PHP and explode the group_concat IDs and re-select it, but I'm wondering if it's possible to do it in one SQL query call instead of two.

Edit: Oops, the example above had an error in it (accidentally used real schema there xD)

Edit2: Doh, I suddenly thought why complicate things and why not just simply go with this ...

SELECT wl1.ID, wl1.country, wl1.region, wl1.item, wl1.reason
  FROM whitelist wl1, 
       (SELECT country, region, item
          FROM whitelist
         GROUP BY country, region, item
        HAVING count(*) > 1) wl2
 WHERE wl1.country = wl2.country AND
       wl1.region = wl2.region AND
       wl1.item = wl2.reason

... but still fails too, because you cannot use = on two NULL columns. Urgh, so close yet so far >.<

To: Bill Karwin

That is exactly the issue here. If I set a unique key on country, region, item, and I perform the following SQL, this will happen.

INSERT INTO whitelist(country, region, item) VALUES ('Taiwan', 'Asia', 'PC');
INSERT INTO whitelist(country, region, item) VALUES ('Taiwan', 'Asia', 'PC');
-- Would fail due to UNIQUE check

However, if I include any of the wildcards, aka NULL, and this would happen.

INSERT INTO whitelist(country, region, item) VALUES (NULL, 'Asia', 'Rice');
INSERT INTO whitelist(country, region, item) VALUES (NULL, 'Asia', 'Rice');
-- Would succeed due to UNIQUE does not check NULL columns.

Hence the idea of this post is to list all repeating whitelist in a list so that the operator can decide what to keep and what to delete.

回答1:

Not keen on this solution, but viable:-

SELECT a.ID, 
        a.country, 
        a.region, 
        a.item
FROM whitelist a
INNER JOIN 
(
    SELECT group_concat(ID) AS ID, USERNAME, COMPNAME, PUBLISHER, NAME, VERSION, COUNT(*) AS AMOUNT
    FROM software_checklist
    GROUP BY USERNAME, COMPNAME, PUBLISHER, NAME, VERSION 
    HAVING AMOUNT > 1
) tmp
ON FIND_IN_SET(a.ID, tmp.ID)