I need a list of users in one database that are not listed as the new_user_id
in another. There are 112,815 matching users in both databases; user_id
is the key in all queries tables.
Query #1 works, and gives me 111,327 users who are NOT referenced as a new_user_Id. But it requires querying the same data twice.
-- 111,327 GSU users are NOT listed as a CSS new user
-- 1,488 GSU users ARE listed as a new user in CSS
--
select count(gup.user_id)
from gsu.user_profile gup
join (select cud.user_id, cud.new_user_id, cud.user_type_code
from css.user_desc cud) cudsubq
on gup.user_id = cudsubq.user_id
where gup.user_id not in (select cud.new_user_id
from css.user_desc cud
where cud.new_user_id is not null);
Query #2 would be perfect... and I'm actually surprised that it's syntactically accepted. But it gives me a result that makes no sense.
-- This gives me 1,505 users... I've checked, and they are not
-- referenced as new_user_ids in CSS, but I don't know why the ones
-- that were excluded were excluded.
--
-- Where are the missing 109,822, and whatexcluded them?
--
select count(gup.user_id)
from gsu.user_profile gup
join (select cud.user_id, cud.new_user_id, cud.user_type_code
from css.user_desc cud) cudsubq
on gup.user_id = cudsubq.user_id
where gup.user_id not in (cudsubq.new_user_id);
What exactly is the where clause in the second query doing, and why is it excluding 109,822 records from the results?
Note The above query is a simplification of what I'm really after. There are other/better ways to do the above queries... they're just representative of the part of the query that's giving me problems.
You want
new_user_id
's from tablegup
that don't match anynew_user_id
on tablecud
, right? It sounds like a job for a left join:The join keeps all rows of
gup
, matching them with anew_user_id
if possible. The WHERE condition keeps only the rows that have no matching row incud
.(Apologies if you know this already and you're only interested in the behavior of the
not in
query)Your second select compares gup.user_id with cud.new_user_id on current joining record. You can rewrite the query to get the same result
You mentioned you compare list of user in one database with a list of users in another. So you need to query data twice and you don't query the same data. Maybe you can use "minus" operator to avoid using "in"
Second query is semantically different. In this case
cudsubq.new_user_id
is treated as expression (doc: IN condition), not as a subquery, thus the whole clause is basically equivalent toSo, in your first query, you're literally asking "show me all users in GUP, who also have entries in CSS and their GUP.ID is not matching ANY NOT NULL NEW_ID in CSS ".
However, the second query is "show me all users in GUP, who also have entries in CSS and their GUP.ID is not equal to their RESPECTIVE NULLABLE (no
is not null
clause, remember?) CSS.NEW_ID value".And any
(not) in
(or equality/inequality) checks with nulls don't actually work.This is where you lose your rows. I would probably rewrite your second query's where clause as
where cudsubq.new_user_id is null
, assuming that non-matching users have null new_user_id.Read this: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::NO::P11_QUESTION_ID:442029737684
For what I understand, your
cudsubq.new_user_id
can beNULL
even though both tables are joined byuser_id
, so, you won't get results using theNOT IN
operator when the subset containsNULL
values . Consider the example in the article:This returns no records. Try using the
NOT EXISTS
operator or just another kind of join. Here is a good source: http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.htmlAnd what you need is the fourth example: