JOIN ERROR in big query SELECT statement: USING OR

2019-09-02 12:03发布

问题:

I have the below select statement. I am using a LEFT JOIN to push two tables together. There are two conditions where the LEFT JOIN should work:

Condition 1: ATTOM_ID. ATTOM ID is the unique Identifier
Condition 2: ZIP, LAST NAME, and FULL ADDRESS. These fields are string fields and all three must match to be a JOIN.

Any other condition should result in a NULL, hence the LEFT JOIN. If either condition passes, the JOIN should occur, which is why I wanted an OR statement here.

for some reason Google Big Query does not like the query because I have an OR in it. The error I get is:

LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.

Here is the SQL Statement. All other issues on this statement work. Is there a GBQ limitation on using LEFT JOINS and the "OR" operator? Thanks.

  SELECT 
  Source, 
  FirstName, 
  LastName,
  MiddleName, 
  Gender, 
  Age, 
  DOB, 
  Address, 
  Address2,
  City, 
  State, 
  Zip, 
  Zip4, 
  TimeZone, 
  Income, 
  HomeValue, 
  Networth, 
  MaritalStatus, 
  IsRenter, 
  HasChildren, 
  CreditRating, 
  Investor, 
  LinesOfCredit, 
  InvestorRealEstate, 
  Traveler, 
  Pets, 
  MailResponder, 
  Charitable, 
  PolicalDonations, 
  PoliticalParty, 
  coalesce(P.ATTOM_ID, T.ATTOM_ID) as ATTOM_ID,
  coalesce(P.GEOID, T.GEOID) as GEOID,
  Score,
  Score1,
  Score2,
  Score3,
  Score4,
  Score5,
  PropertyLatitude AS Latitude,
  PropertyLongitude AS Longitude
  FROM `db.ds.table1` P
 LEFT JOIN `db.ds.table2` T
ON 1 = 
CASE 
    WHEN (P.ATTOM_ID = T.ATTOM_ID) 
        THEN 1
    WHEN P.Zip = T. PropertyAddressZIP
            AND ( 
                    LOWER(P.LastName) = LOWER(T.DeedOwner1NameLast)
                    OR LOWER(P.LastName) = LOWER(T.PartyOwner1NameLast)
                )
            AND ( 
                    STRPOS(LOWER(P.Address), LOWER(T.PropertyAddressFull) ) > 0
                    OR STRPOS(LOWER(T.PropertyAddressFull), LOWER(P.Address) ) > 0 
                )
            AND IFNULL(T.PropertyAddressFull,'') != ''
        THEN 1
    ELSE 0 END

回答1:

Maybe a different approach? Can you split the OR in your JOIN condition, create two INNER JOIN queries and UNION them together?

SELECT 
    Source, 
    FirstName, 
    LastName,
    MiddleName, 
    Gender, 
    Age, 
    DOB, 
    Address, 
    Address2,
    City, 
    State, 
    Zip, 
    Zip4, 
    TimeZone, 
    Income, 
    HomeValue, 
    Networth, 
    MaritalStatus, 
    IsRenter, 
    HasChildren, 
    CreditRating, 
    Investor, 
    LinesOfCredit, 
    InvestorRealEstate, 
    Traveler, 
    Pets, 
    MailResponder, 
    Charitable, 
    PolicalDonations, 
    PoliticalParty, 
    coalesce(P.ATTOM_ID, T.ATTOM_ID) as ATTOM_ID,
    coalesce(P.GEOID, T.GEOID) as GEOID,
    Score,
    Score1,
    Score2,
    Score3,
    Score4,
    Score5,
    PropertyLatitude AS Latitude,
    PropertyLongitude AS Longitude
FROM `db.ds.Table1` P
INNER JOIN `db.ds.Table2` T ON (P.ATTOM_ID = T.ATTOM_ID) 

UNION

SELECT 
    Source, 
    FirstName, 
    LastName,
    MiddleName, 
    Gender, 
    Age, 
    DOB, 
    Address, 
    Address2,
    City, 
    State, 
    Zip, 
    Zip4, 
    TimeZone, 
    Income, 
    HomeValue, 
    Networth, 
    MaritalStatus, 
    IsRenter, 
    HasChildren, 
    CreditRating, 
    Investor, 
    LinesOfCredit, 
    InvestorRealEstate, 
    Traveler, 
    Pets, 
    MailResponder, 
    Charitable, 
    PolicalDonations, 
    PoliticalParty, 
    coalesce(P.ATTOM_ID, T.ATTOM_ID) as ATTOM_ID,
    coalesce(P.GEOID, T.GEOID) as GEOID,
    Score,
    Score1,
    Score2,
    Score3,
    Score4,
    Score5,
    PropertyLatitude AS Latitude,
    PropertyLongitude AS Longitude
FROM `db.ds.Table1` P
INNER JOIN `db.ds.Table2` T ON P.Zip = T. PropertyAddressZIP
            AND ( 
                    LOWER(P.LastName) = LOWER(T.DeedOwner1NameLast)
                    OR LOWER(P.LastName) = LOWER(T.PartyOwner1NameLast)
                )
            AND ( 
                    STRPOS(LOWER(P.Address), LOWER(T.PropertyAddressFull) ) > 0
                    OR STRPOS(LOWER(T.PropertyAddressFull), LOWER(P.Address) ) > 0 
                )
            AND IFNULL(T.PropertyAddressFull,'') != ''


回答2:

Below is simplified example of your issue

#standardSQL
WITH `db.ds.Table1` AS (
  SELECT NULL id, '12345' zip, 'abc' name UNION ALL
  SELECT 2, '23456', 'vwu' UNION ALL
  SELECT 4 id, '12347' zip, 'abd' name 
), `db.ds.Table2` AS (
  SELECT 2 id, '12346' zip, 'xyz' name UNION ALL
  SELECT 3, '12345' zip, 'abc' name 
)
SELECT p, t FROM `db.ds.Table1` p
LEFT JOIN `db.ds.Table2` t
ON p.id = t.id OR p.zip = t.zip   

it produces Error: LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.

You can rewrite above as

#standardSQL
WITH `db.ds.Table1` AS (
  SELECT NULL id, '12345' zip, 'abc' name UNION ALL
  SELECT 2, '23456', 'vwu' UNION ALL
  SELECT 4 id, '12347' zip, 'abd' name 
), `db.ds.Table2` AS (
  SELECT 2 id, '12346' zip, 'xyz' name UNION ALL
  SELECT 3, '12345' zip, 'abc' name 
)
SELECT 
  COALESCE(p.id, t.id) AS id,
  p.zip,
  p.name
FROM (
  SELECT ANY_VALUE(p) p , ANY_VALUE(IF(p.id = t.id OR p.zip = t.zip, t, NULL)) t
  FROM `db.ds.Table1` p
  CROSS JOIN `db.ds.Table2` t
  GROUP BY TO_JSON_STRING(p)
)   

here you are simply moving all your conditions inside IF() function and replace LEFT JOIN with CROSS JOIN

result is

Row id  zip     name     
1   3   12345   abc  
2   2   23456   vwu  
3   4   12347   abd    

as you can see - you are not missing id=4 from Table1

Hope, you can adopt this to your specific query (should be straight forward copy-pasting)



回答3:

Try moving the last part of the join condition to a WHERE clause:

SELECT 
  Source, 
  <lots_of_columns>
  FROM `db.ds.Table1` P
 LEFT JOIN `db.ds.Table2` T
 ON (P.ATTOM_ID = T.ATTOM_ID)
 OR (
   P.Zip = T. PropertyAddressZIP
  AND ( LOWER(P.LastName) = LOWER(T.DeedOwner1NameLast)
    OR LOWER(P.LastName) = LOWER(T.PartyOwner1NameLast))
  AND ( STRPOS(LOWER(P.Address), LOWER(T.PropertyAddressFull) ) > 0
    OR STRPOS(LOWER(T.PropertyAddressFull), LOWER(P.Address) ) > 0 )
  )
  WHERE IFNULL(T.PropertyAddressFull,'') != '';

That may be where the issue is coming from, since it doesn't reference both tables in the join.



回答4:

Maybe try using a CASE as the join clause?

SELECT 
    Source, 
    FirstName, 
    LastName,
    MiddleName, 
    Gender, 
    Age, 
    DOB, 
    Address, 
    Address2,
    City, 
    State, 
    Zip, 
    Zip4, 
    TimeZone, 
    Income, 
    HomeValue, 
    Networth, 
    MaritalStatus, 
    IsRenter, 
    HasChildren, 
    CreditRating, 
    Investor, 
    LinesOfCredit, 
    InvestorRealEstate, 
    Traveler, 
    Pets, 
    MailResponder, 
    Charitable, 
    PolicalDonations, 
    PoliticalParty, 
    coalesce(P.ATTOM_ID, T.ATTOM_ID) as ATTOM_ID,
    coalesce(P.GEOID, T.GEOID) as GEOID,
    Score,
    Score1,
    Score2,
    Score3,
    Score4,
    Score5,
    PropertyLatitude AS Latitude,
    PropertyLongitude AS Longitude
FROM `db.ds.Table1` P
LEFT JOIN `db.ds.Table2` T ON 1 = 
CASE 
    WHEN (P.ATTOM_ID = T.ATTOM_ID) 
        THEN 1
    WHEN P.Zip = T. PropertyAddressZIP
            AND ( 
                    LOWER(P.LastName) = LOWER(T.DeedOwner1NameLast)
                    OR LOWER(P.LastName) = LOWER(T.PartyOwner1NameLast)
                )
            AND ( 
                    STRPOS(LOWER(P.Address), LOWER(T.PropertyAddressFull) ) > 0
                    OR STRPOS(LOWER(T.PropertyAddressFull), LOWER(P.Address) ) > 0 
                )
            AND IFNULL(T.PropertyAddressFull,'') != ''
        THEN 1
    ELSE 0 END