I have the following table with two fields namely a and b as shown below:
create table employe
(
empID varchar(10),
department varchar(10)
);
Inserting some records:
insert into employe values('A101','Z'),('A101','X'),('A101','Y'),('A102','Z'),('A102','X'),
('A103','Z'),('A103','Y'),('A104','X'),('A104','Y'),('A105','Z'),('A106','X');
select * from employe;
empID department
------------------
A101 Z
A101 X
A101 Y
A102 Z
A102 X
A103 Z
A103 Y
A104 X
A104 Y
A105 Z
A106 X
Note: Now I want to show the employee who is only and only belongs to the department Z
and Y
.
So according to the condition the only employee A103
should be displayed because of he only belongs
to the department Z
and Y
. But employee A101
should not appear because he belong to Z,X, and Y
.
Expected Result:
If condition is : Z
and Y
then result should be:
empID
------
A103
If condition is : Z
and X
then result should be:
empID
------
A102
If condition is : Z
,X
and Y
then result should be:
empID
------
A101
Note: I want to do it in the where
clause only (don't want to use the group by
and having
clauses), because I'm going to include this one in the other where
also.
This is a Relational Division with no Remainder (RDNR) problem. See this article by Dwain Camps that provides many solution to this kind of problem.
First Solution
SQL Fiddle
Second Solution
SQL Fiddle
Without using
GROUP BY
andHAVING
:for If condition is : Z and Y
In Postgres this can be simplified using arrays:
It's important to sort the elements in the
array_agg()
and compare them to a sorted list of departments in the same order. Otherwise this won't return correct answers.E.g.
array_agg(department) = array['Z', 'Y']
might potentially return wrong results.This can be done in a more flexible manner using a CTE to supply the departments:
That way the sorting of the elements is always done by the database and will be consistent between the values in the aggregated array and the one to which it is compared.
An option with standard SQL is to check if at least one row has a different department together with counting all rows
The above solution won't work if it's possible that a single employee is assigned twice to the same department!
The
having min (...)
can be simplified in Postgres using the aggregatebool_and()
.When applying the standard
filter()
condition to do conditional aggregation this can also be made to work with situation where an employee can be assigned to the same department twicebool_and(department in ('Y','Z'))
only returns true if the condition is true for all rows in the group.Another solution with standard SQL is to use the intersection between those employees that have at least those two departments and those that are assigned to exactly two departments:
The following query works when you want employees from departments 'Y' and 'Z' and not 'X'.
For your second case, simply replace
not in
within
in the last condition.You can too use
GROUP BY
andHAVING
— you just need to do it in a subquery.For example, let's start with a simple query to find all employees in departments X and Y (and not in any other departments):
I've used MySQL's
GROUP_CONCAT()
function as a convenient shortcut here, but you could get the same results without it, too, e.g. like this:Now, to combine this with other query condition, simply take a query that includes the other conditions, and join your employees table against the output of the query above:
Here's an SQLFiddle demonstrating this query.
Ps. The reason why you should use a
JOIN
instead of anIN
subquery for this is because MySQL is not so good at optimizingIN
subqueries.Specifically (as of v5.7, at least), MySQL always converts
IN
subqueries into dependent subqueries, so that the subquery must be re-executed for every row of the outer query, even if the original subquery was independent. For example, the following query (from the documentation linked above):gets effectively converted into:
This may still be reasonably fast, if
t2
is small and/or has an index allowing fast lookups. However, if (like in the original example above) executing the subquery might take a lot of work, the performance can suffer badly. Using aJOIN
instead allows the subquery to only be executed once, and thus typically offers much better performance.What about a self join? (ANSI Compliant - worked for 20+ years)
This shows that a101 and a104 both work in both departments.