Multi Join in a single SQL query

2019-07-20 20:59发布

问题:

Below is the data in TestingTable1 sorted by date in descending order always

BUYER_ID  |   ITEM_ID       |  CREATED_TIME
----------+-----------------+----------------------
1345653      110909316904     2012-07-09 21:29:06
1345653      151851771618     2012-07-09 19:57:33
1345653      221065796761     2012-07-09 19:31:48
1345653      400307563710     2012-07-09 18:57:33

And if this is the below data in TestingTable2 sorted by date in descending order always

USER_ID  |   PRODUCT_ID    |    LAST_TIME
---------+----------------+-----------------------
1345653     110909316904      2012-07-09 22:29:06
1345653     150851771618      2012-07-09 19:57:33    

Each row in TestingTable1 should match with TestingTable2, if doesn't match or data is not there in TestingTable2, then I need to show them in the output as in TestingTable1 I have this data but corresponding to TestingTable2 I have this data(which will be wrong data), so that I can see what is mismatch and what data is missing.

I need to compare TestingTable2 with TestingTable1 on BUYER_ID and USER_ID. I need to see, if BUYER_ID and USER_ID gets matched then I need to compare ITEM_ID with PRODUCT_ID and CREATED_TIME with LAST_TIME and if there is a mismatch in TestingTable2 after comparing with TestingTable1 in either one of them or both of them, then I need to show the result.

So if you look at the above example- I have three scenarios basically

  1. Firstly- In TestingTable1, in the First row ITEM_ID is matching with PRODUCT_ID in the First row of TestingTable2 but CREATED_TIME is not matching with LAST_TIME for the first row in both the tables
  2. Secondly- In TestingTable1, in the Second row CREATED_TIME is matching with LAST_TIME in the second row of TestingTable2 but ITEM_ID is not matching with PRODUCT_ID for the second row in both the tables
  3. Thirdly- In TestingTable1, last two lines(rows) are not there at all in TestingTable2. This scenario is not covered in my query that I wrote. I want this scenario also in my query.

So these are the three cases that I need to cover while comparing TestingTable2 with TestingTable1 always. And TestingTable1 is the MAIN table through which comparisons need to be made always, so it means data in TestingTable1 is always accurate.

So I need to show the result like below considering the above example if not matching either one of them or data is not there in TestingTable2 at all- TestingTable1 data then next to it same TestingTable2 data, so that I can see what value was there in TestingTable1 as compared to TestingTable2

BUYER_ID   |   ITEM_ID       |    CREATED_TIME           |      USER_ID   |     PRODUCT_ID     |     LAST_TIME   
-----------+-----------------+---------------------------+----------------+--------------------+-----------------------
1345653      110909316904       2012-07-09 21:29:06            1345653          110909316904      2012-07-09 22:29:06
1345653      151851771618       2012-07-09 19:57:33            1345653          150851771618      2012-07-09 19:57:33
1345653      221065796761       2012-07-09 19:31:48            NULL             NULL              NULL
1345653      400307563710       2012-07-09 18:57:33            NULL             NULL              NULL

Below is the query I wrote that covers only my two scenarios that I mentioned above and it works fine and I will get the output like above leaving last two rows from the Output. But I need to add third scenario also in this(below) query so that it gives output like above.

SELECT *
FROM   (SELECT T2.buyer_id,
               T2.item_id,
               T2.created_time AS created_time,
               subq.user_id,
               subq.product_id,
               subq.LAST_TIME
        FROM   TestingTable2 subq
               JOIN TestingTable1 T2
                 ON T2.buyer_id = subq.user_id
                    AND subq.LAST_TIME = ( T2.created_time )
        WHERE  ( subq.product_id <> T2.item_id )
        UNION ALL
        SELECT T2.buyer_id,
               T2.item_id AS item_id,
               T2.created_time,
               subq.user_id,
               subq.product_id AS product_id,
               subq.LAST_TIME
        FROM   TestingTable2 subq
               JOIN TestingTable1 T2
                 ON T2.buyer_id = subq.user_id
                    AND subq.product_id = T2.item_id
        WHERE  ( subq.LAST_TIME <> ( T2.created_time ) )) finalResult
ORDER  BY finalResult.BUYER_ID;

Any suggestion will be appreciated.

P.S. I have asked few questions related to JOIN in past few days, but that covers only my two scenarios, not the third scenario that I need in this query.

Update:- I cannot use NOT IN or NOT EXISTS syntax for SQL, as I am working with Hive and Hive doesn't support NOT IN or NOT EXISTS that so I need some other way to deal with this.

I need to use my query only to modify it to work for Third Scenario, as that way Hive will support SQL syntax.

Below is my SQL Fiddle, that fulfills my two scenario above but not the third scenario. Can anyone help me to modify my query to work for third scenario as well?

http://sqlfiddle.com/#!3/102dd/1/0.

Data in Table1 should be there in Table2, if it is not there, then I need to show the mismatch between data after comparing from Table1 and also it is possible, data from Table1 will not be there in Table2, and I want to show that too.

Updated Output for the fiddle http://sqlfiddle.com/#!3/102dd/3/0

BUYER_ID   |   ITEM_ID       |    CREATED_TIME           |      USER_ID   |     PRODUCT_ID     |     LAST_TIME   
-----------+-----------------+---------------------------+----------------+--------------------+-----------------------
1345653       151851771618      July, 09 2012 19:57:33      1345653            150851771618         July, 09 2012 19:57:33
1345653       221065796761      July, 09 2012 19:31:48      1345653            221165796761         July, 09 2012 19:31:48
1345653       110909316904      July, 09 2012 21:29:06      1345653            110909316904         July, 09 2012 22:29:06
1345653       400307563710      July, 09 2012 18:57:33      NULL               NULL                 NULL
1345653       310411560125      July, 09 2012 16:09:49      NULL               NULL                 NULL

UPDATED SQL QUERY THAT IS GIVING ME ERROR

I replaced TestingTable1 with this query-

(SELECT BUYER_ID, ITEM_ID, rank(BUYER_ID), CREATED_TIME
FROM (
    SELECT BUYER_ID, ITEM_ID, CREATED_TIME
    FROM testingtable1
    where to_date(from_unixtime(cast(UNIX_TIMESTAMP(CREATED_TIME) as int))) = '2012-07-09'
    DISTRIBUTE BY BUYER_ID
    SORT BY BUYER_ID, CREATED_TIME desc
) T1
WHERE rank(BUYER_ID) < 5)

And TestingTable2 with this query-

(SELECT USER_ID, PROD_AND_TS.PRODUCT_ID as PRODUCT_ID, PROD_AND_TS.TIMESTAMPS as TIMESTAMPS FROM testingtable2 lateral view explode(PURCHASED_ITEM) exploded_table as PROD_AND_TS where to_date(from_unixtime(cast(PROD_AND_TS.TIMESTAMPS as BIGINT))) = '2012-07-09')

回答1:

If you can't use NOT IN or the clean solution posted by @eggyal because of DBMS restrictions, another option might be to completely duplicate your original union and use those results in a LEFT JOIN.

Applied to your sqlFiddle, following statement returns the required results.

SQL Statement

SELECT * 
FROM(
    SELECT *
    FROM TestingTable1 A
    JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND B.LAST_TIME = A.Created_TIME 
    WHERE B.PRODUCT_ID <> A.ITEM_ID
    UNION ALL
    SELECT * 
    FROM TestingTable1 A
    INNER JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND B.PRODUCT_ID = A.ITEM_ID  
    WHERE B.LAST_TIME <> A.Created_TIME      
 ) X
UNION ALL
SELECT A.*, NULL, NULL, NULL
FROM   TestingTable1 A
       LEFT OUTER JOIN (
            SELECT *
            FROM TestingTable1 A
            JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND B.LAST_TIME = A.Created_TIME 
            WHERE B.PRODUCT_ID <> A.ITEM_ID
            UNION ALL
            SELECT * 
            FROM TestingTable1 A
            INNER JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND B.PRODUCT_ID = A.ITEM_ID  
            WHERE B.LAST_TIME <> A.Created_TIME      
       ) X ON A.BUYER_ID = X.BUYER_ID AND A.ITEM_ID = X.ITEM_ID
WHERE  X.BUYER_ID IS NULL


回答2:

Try this piece of code, written in SQL. I tested this on SQL Fiddle already.

 SELECT 
 tt1.buyer_id,tt1.item_id,tt1.created_time,
 tt2.user_id,tt2.product_id,tt2.last_time
 FROM 
 testingtable1 tt1 LEFT OUTER JOIN
 testingtable2 tt2 ON
 tt1.buyer_id = tt2.user_id
 AND 
 tt1.item_id = tt2.product_id
 AND
 tt1.created_time = tt2.last_time


回答3:

This answer is in response to your comment request to do so here: https://stackoverflow.com/a/11440651/1166147.
I have actually given you the answer to this in 2 of your other duplicates that creates the exact output you are display. It is the first query I wrote here: https://stackoverflow.com/a/11440539/1166147 and it is mentioned and explained twice here: https://stackoverflow.com/a/11398990/1166147

I modified your query back to mine and ran it in your SQLFiddle:

Your requirements have evolved though, and it looks like you could now just do a LEFT JOIN if TestingTable1 really is always accurate data.

SELECT *
FROM TestingTable1 A
LEFT JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND (B.LAST_TIME = A.Created_TIME  OR B.PRODUCT_ID = A.ITEM_ID)

EDIT FOR COMMENT

MOCK SCENARIOS

SCENARIO||       TABLE1           ||           TABLE2
----------------------------------------------------------
SCENARIO||  BUYER  ITEM      TIME ||  USER   PRODUCT  TIME
1       ||    1     A        09   ||   1     Z        09
2       ||    1     B        10   ||   NO RECORD IN TABLE 2
3       ||    1     C        10   ||   1     C        02
4       ||    1     D        12   ||   NO RECORD IN TABLE 2
5       ||    1     E        01   ||   1     E        01
6       ||   NO RECORD IN TABLE 1 ||   1     Y        05      

You are asking for a SQL solution - but it isn't really a SQL question. You rule out the valid SQL answers given to you because you can't use them in your environment, then reopen the same question again. This is a HQL/HIVE question. Yes, there are some similarities, but there are some differences.

It looks like HQL would support something like this, though I have no way to test and SQLFiddle isn't valid for testing this either. You will have to work out how to do ISNULL or COALESCE in HQL, as shown on the comments beside the select *. This would 'merge' the results and return the value from whichever wasn't null if there was a non null value there. I do believe HQL supports ISNULL:

select * --BUYER_ID, isNull(B.USER_ID,C.USER_ID)
from
(select BUYER_ID,ITEM_ID ,Created_TIME  from TestingTable1) a
left join
(SELECT USER_ID,PRODUCT_ID, last_time
  FROM TestingTable2 ) b on(a.BUYER_ID = b.user_id and B.last_time =   A.Created_TIME)
left join
(SELECT USER_ID,PRODUCT_ID, last_time
  FROM TestingTable2 ) c on(a.BUYER_ID = c.user_id and c.PRODUCT_ID = A.ITEM_ID)

Here is yet another TSQL answer that may give you some ideas. I'd never use this in TSQL but it would actually work.

SELECT A.BUYER_ID,A.ITEM_ID,CREATED_TIME,COALESCE(B.USER_ID,X.USER_ID,Y.USER_ID),
COALESCE(B.PRODUCT_ID,X.PRODUCT_ID,Y.PRODUCT_ID)
,COALESCE(B.last_time,X.last_time,Y.last_time)
FROM TestingTable1 A
LEFT JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND B.PRODUCT_ID = A.ITEM_ID
AND B.last_time = A.Created_TIME
LEFT JOIN( SELECT USER_ID,PRODUCT_ID, last_time
  FROM TestingTable1 A
  LEFT JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND B.last_time = A.Created_TIME 
  WHERE  ISNULL(B.PRODUCT_ID,0) <> A.ITEM_ID AND B.USER_ID IS NOT NULL) X ON
  X.USER_ID = A.BUYER_ID AND A.Created_TIME = X.last_time
LEFT JOIN( SELECT USER_ID,PRODUCT_ID, last_time
FROM TestingTable1 A
LEFT JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND ISNULL( B.PRODUCT_ID,0) =   A.ITEM_ID  
WHERE  B.last_time <> A.Created_TIME AND B.USER_ID IS NOT NULL) Y ON
A.BUYER_ID = Y.USER_ID AND A.ITEM_ID = Y.PRODUCT_ID


回答4:

It sounds like what you are looking for is a Full outer join. I didn't see in your post what type of database you are using so I can't post the exact syntax, but this link may point you in the right direction:

http://www.w3resource.com/sql/joins/perform-a-full-outer-join.php

Specifically look at the diagram at the bottom.