Can someone please explain what the PARTITION BY keyword does and give a simple example of it in action, as well as why one would want to use it? I have a SQL query written by someone else and I'm trying to figure out what it does.
An example of partition by:
SELECT empno, deptno, COUNT(*)
OVER (PARTITION BY deptno) DEPT_COUNT
FROM emp
The examples I've seen online seem a bit too in-depth.
The PARTITION BY clause sets the range of records that will be used for each "GROUP" within the OVER clause.
In your example SQL, DEPT_COUNT will return the number of employees within that department for every employee record. (It is as if you're de-normalising the emp table; you still return every record in the emp table.)
empno  deptno  DEPT_COUNT
1      10      3
2      10      3
3      10      3  <- three because there are three "deptno = 10" records
4      20      2
5      20      2  <- two because there are two "deptno = 20" records
If there were another column (e.g., state), you could partition by it instead and count how many departments are in that state.
It is like getting the results of a GROUP BY (SUM, AVG, etc.) without aggregating the result set, that is, without collapsing the matching records into a single row.
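To make the contrast with GROUP BY concrete, here is a minimal sketch that runs both forms against the five sample rows. It assumes Python's built-in sqlite3 module as a stand-in for Oracle (SQLite 3.25 or later is needed for window functions); the table contents are just the sample data from this answer.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, deptno INTEGER)")
# Five sample employees: three in department 10, two in department 20.
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 10), (4, 20), (5, 20)],
)

# GROUP BY collapses each department into a single row.
grouped = conn.execute(
    "SELECT deptno, COUNT(*) FROM emp GROUP BY deptno ORDER BY deptno"
).fetchall()

# PARTITION BY keeps every employee row and attaches the department count.
windowed = conn.execute(
    "SELECT empno, deptno, COUNT(*) OVER (PARTITION BY deptno) AS dept_count "
    "FROM emp ORDER BY empno"
).fetchall()

print(grouped)   # [(10, 3), (20, 2)]
print(windowed)  # [(1, 10, 3), (2, 10, 3), (3, 10, 3), (4, 20, 2), (5, 20, 2)]
```

The grouped query returns two rows, the windowed one returns all five.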
It is useful when you use the MIN OVER or MAX OVER functions to get, for example, the lowest and highest salary in the department and then use that in a calculation against the current record's salary without a subselect, which is often much faster.
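A rough sketch of that salary comparison, using MIN and MAX as window functions. Assumptions: Python's sqlite3 module stands in for Oracle, and the sal column and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The sal column and all salary figures are hypothetical sample data.
conn.execute("CREATE TABLE emp (empno INTEGER, deptno INTEGER, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, 10, 1000), (2, 10, 1500), (3, 10, 3000),
     (4, 20, 2000), (5, 20, 2600)],
)

# Each row is compared with its own department's lowest and highest salary,
# with no subselect and no self-join.
rows = conn.execute("""
    SELECT empno,
           deptno,
           sal,
           sal - MIN(sal) OVER (PARTITION BY deptno) AS above_dept_min,
           MAX(sal) OVER (PARTITION BY deptno) - sal AS below_dept_max
    FROM emp
    ORDER BY empno
""").fetchall()

for row in rows:
    print(row)
```

Employee 2, for instance, earns 500 more than the lowest salary in department 10 and 1500 less than the highest.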
Read the linked AskTom article for further details.
The concept is very well explained by the accepted answer, but I find that the more examples one sees, the better it sinks in. Here's an incremental example:
1) The boss says "get me the number of items we have in stock, grouped by brand"
You say: "no problem"
SELECT
    BRAND
    ,COUNT(ITEM_ID)
FROM
    ITEMS
GROUP BY
    BRAND;
Result:
+--------------+---------------+
| Brand | Count |
+--------------+---------------+
| H&M | 50 |
+--------------+---------------+
| Hugo Boss | 100 |
+--------------+---------------+
| No brand | 22 |
+--------------+---------------+
2) The boss says "Now get me a list of all items, with their brand AND number of items that the respective brand has"
You may try:
SELECT
    ITEM_NR
    ,BRAND
    ,COUNT(ITEM_ID)
FROM
    ITEMS
GROUP BY
    BRAND;
But you get:
ORA-00979: not a GROUP BY expression
This is where the OVER (PARTITION BY BRAND) comes in:
SELECT
    ITEM_NR
    ,BRAND
    ,COUNT(ITEM_ID) OVER (PARTITION BY BRAND)
FROM
    ITEMS;
Which means:
COUNT(ITEM_ID) - get the number of items
OVER - over the set of rows
(PARTITION BY BRAND) - that have the same brand
And the result is:
+--------------+---------------+----------+
| Items | Brand | Count() |
+--------------+---------------+----------+
| Item 1 | Hugo Boss | 100 |
+--------------+---------------+----------+
| Item 2 | Hugo Boss | 100 |
+--------------+---------------+----------+
| Item 3 | No brand | 22 |
+--------------+---------------+----------+
| Item 4 | No brand | 22 |
+--------------+---------------+----------+
| Item 5 | H&M | 50 |
+--------------+---------------+----------+
etc...
It is the SQL extension called analytics. The OVER in the SELECT statement tells Oracle that the function is an analytic function, not a GROUP BY (aggregate) function. The advantage of using analytics is that you can collect sums, counts, and a lot more with just one pass through the data instead of looping through the data with subselects or, worse, PL/SQL.
It does look confusing at first, but it will quickly become second nature. No one explains it better than Tom Kyte, so the link above is great.
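As a small illustration of the two styles, the sketch below computes a per-brand count once with a correlated subselect and once with an analytic function, and checks that they agree. Assumptions: Python's sqlite3 module stands in for Oracle, and the items table and its rows are invented. It only shows that the results are equivalent; it does not measure performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data, not taken from any real schema.
conn.execute("CREATE TABLE items (item_id INTEGER, brand TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "Hugo Boss"), (2, "Hugo Boss"), (3, "No brand"),
     (4, "H&M"), (5, "H&M"), (6, "H&M")],
)

# Subselect style: the inner query is evaluated per outer row.
with_subselect = conn.execute("""
    SELECT i.item_id, i.brand,
           (SELECT COUNT(*) FROM items i2 WHERE i2.brand = i.brand) AS brand_count
    FROM items i
    ORDER BY i.item_id
""").fetchall()

# Analytic style: the count is attached to each row by the OVER clause.
with_analytic = conn.execute("""
    SELECT item_id, brand,
           COUNT(*) OVER (PARTITION BY brand) AS brand_count
    FROM items
    ORDER BY item_id
""").fetchall()

print(with_subselect == with_analytic)  # True
print(with_analytic)
```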
Of course, reading the documentation is a must.
EMPNO DEPTNO DEPT_COUNT
7839 10 4
5555 10 4
7934 10 4
7782 10 4 --- 4 records in table for dept 10
7902 20 4
7566 20 4
7876 20 4
7369 20 4 --- 4 records in table for dept 20
7900 30 6
7844 30 6
7654 30 6
7521 30 6
7499 30 6
7698 30 6 --- 6 records in table for dept 30
Here we get the count for each deptno.
For deptno 10 there are 4 records in table emp; likewise there are 4 for deptno 20 and 6 for deptno 30.
The OVER (PARTITION BY ...) clause works as if we were partitioning the data by client_id, creating a subset of rows for each client id:
select client_id, operation_date,
       -- count(*) as an analytic function: rows are counted per client_id
       count(*) over (partition by client_id) as operationctrbyclient
from client_operations e
order by e.client_id;
This query returns, on every row, the number of operations done by that client_id.
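Numbering the rows within a partition and counting them are two different things, and a short sketch shows both side by side. Assumptions: Python's sqlite3 module stands in for Oracle, the dates are invented sample data, and operation_seq is a hypothetical alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE client_operations (client_id INTEGER, operation_date TEXT)"
)
# Hypothetical sample data: three operations for client 1, two for client 2.
conn.executemany(
    "INSERT INTO client_operations VALUES (?, ?)",
    [(1, "2024-01-05"), (1, "2024-01-09"), (1, "2024-02-01"),
     (2, "2024-01-07"), (2, "2024-03-12")],
)

rows = conn.execute("""
    SELECT client_id,
           operation_date,
           -- position of this operation within its client's subset
           ROW_NUMBER() OVER (PARTITION BY client_id
                              ORDER BY operation_date) AS operation_seq,
           -- total number of operations in that client's subset
           COUNT(*) OVER (PARTITION BY client_id) AS operationctrbyclient
    FROM client_operations
    ORDER BY client_id, operation_date
""").fetchall()

for row in rows:
    print(row)
```

ROW_NUMBER() restarts at 1 for every client, while COUNT(*) repeats the same total on each of that client's rows.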