What is the main purpose of using CROSS APPLY?
I have read (vaguely, through posts on the Internet) that cross apply
can be more efficient when selecting over large data sets if you are partitioning. (Paging comes to mind)
I also know that CROSS APPLY
doesn't require a UDF as the right-table.
In most INNER JOIN
queries (one-to-many relationships), I could rewrite them to use CROSS APPLY
, but they always give me equivalent execution plans.
Can anyone give me a good example of when CROSS APPLY
makes a difference in those cases where INNER JOIN
will work as well?
Edit:
Here's a trivial example, where the execution plans are exactly the same. (Show me one where they differ and where cross apply
is faster/more efficient)
create table Company (
companyId int identity(1,1)
, companyName varchar(100)
, zipcode varchar(10)
, constraint PK_Company primary key (companyId)
)
GO
create table Person (
personId int identity(1,1)
, personName varchar(100)
, companyId int
, constraint FK_Person_CompanyId foreign key (companyId) references dbo.Company(companyId)
, constraint PK_Person primary key (personId)
)
GO
insert Company
select 'ABC Company', '19808' union
select 'XYZ Company', '08534' union
select '123 Company', '10016'
insert Person
select 'Alan', 1 union
select 'Bobby', 1 union
select 'Chris', 1 union
select 'Xavier', 2 union
select 'Yoshi', 2 union
select 'Zambrano', 2 union
select 'Player 1', 3 union
select 'Player 2', 3 union
select 'Player 3', 3
/* using CROSS APPLY */
select *
from Person p
cross apply (
select *
from Company c
where p.companyid = c.companyId
) Czip
/* the equivalent query using INNER JOIN */
select *
from Person p
inner join Company c on p.companyid = c.companyId
I guess it should be readability ;)
CROSS APPLY will be somewhat unique for people reading to tell them that a UDF is being used which will be applied to each row from the table on the left.
Ofcourse, there are other limitations where a CROSS APPLY is better used than JOIN which other friends have posted above.
Cross apply works well with an XML field as well. If you wish to select node values in combination with other fields.
For example, if you have a table containing some xml
Using the query
Will return a result
See the article in my blog for detailed performance comparison:
INNER JOIN
vs.CROSS APPLY
CROSS APPLY
works better on things that have no simpleJOIN
condition.This one selects
3
last records fromt2
for each record fromt1
:It cannot be easily formulated with an
INNER JOIN
condition.You could probably do something like that using
CTE
's and window function:, but this is less readable and probably less efficient.
Update:
Just checked.
master
is a table of about20,000,000
records with aPRIMARY KEY
onid
.This query:
runs for almost
30
seconds, while this one:is instant.
cross apply
sometimes enables you to do things that you cannot do withinner join
.Example (a syntax error):
This is a syntax error, because, when used with
inner join
, table functions can only take variables or constants as parameters. (I.e., the table function parameter cannot depend on another table's column.)However:
This is legal.
Edit: Or alternatively, shorter syntax: (by ErikE)
Edit:
Note: Informix 12.10 xC2+ has Lateral Derived Tables and Postgresql (9.3+) has Lateral Subqueries which can be used to a similar effect.
It seems to me that CROSS APPLY can fill a certain gap when working with calculated fields in complex/nested queries, and make them simpler and more readable.
Simple example: you have a DoB and you want to present multiple age-related fields that will also rely on other data sources (such as employment), like Age, AgeGroup, AgeAtHiring, MinimumRetirementDate, etc. for use in your end-user application (Excel PivotTables, for example).
Options are limited and rarely elegant:
JOIN subqueries cannot introduce new values in the dataset based on data in the parent query (it must stand on its own).
UDFs are neat, but slow as they tend to prevent parallel operations. And being a separate entity can be a good (less code) or a bad (where is the code) thing.
Junction tables. Sometimes they can work, but soon enough you're joining subqueries with tons of UNIONs. Big mess.
Create yet another single-purpose view, assuming your calculations don't require data obtained mid-way through your main query.
Intermediary tables. Yes... that usually works, and often a good option as they can be indexed and fast, but performance can also drop due to to UPDATE statements not being parallel and not allowing to cascade formulas (reuse results) to update several fields within the same statement. And sometimes you'd just prefer to do things in one pass.
Nesting queries. Yes at any point you can put parenthesis on your entire query and use it as a subquery upon which you can manipulate source data and calculated fields alike. But you can only do this so much before it gets ugly. Very ugly.
Repeating code. What is the greatest value of 3 long (CASE...ELSE...END) statements? That's gonna be readable!
Did I miss something? Probably, so feel free to comment. But hey, CROSS APPLY is like a godsend in such situations: you just add a simple
CROSS APPLY (select tbl.value + 1 as someFormula) as crossTbl
and voilà! Your new field is now ready for use practically like it had always been there in your source data.Values introduced through CROSS APPLY can...
CROSS APPLY (select crossTbl.someFormula + 1 as someMoreFormula) as crossTbl2
Dang, there's nothing they can't do!
Consider you have two tables.
MASTER TABLE
DETAILS TABLE
There are many situations where we need to replace
INNER JOIN
withCROSS APPLY
.1. Join two tables based on
TOP n
resultsConsider if we need to select
Id
andName
fromMaster
and last two dates for eachId
fromDetails table
.The above query generates the following result.
See, it generated results for last two dates with last two date's
Id
and then joined these records only in the outer query onId
, which is wrong. To accomplish this, we need to useCROSS APPLY
.and forms the following result.
Here's how it works. The query inside
CROSS APPLY
can reference the outer table, whereINNER JOIN
cannot do this (it throws compile error). When finding the last two dates, joining is done insideCROSS APPLY
i.e.,WHERE M.ID=D.ID
.2. When we need
INNER JOIN
functionality using functions.CROSS APPLY
can be used as a replacement withINNER JOIN
when we need to get result fromMaster
table and afunction
.And here is the function
which generated the following result
ADDITIONAL ADVANTAGE OF CROSS APPLY
APPLY
can be used as a replacement forUNPIVOT
. EitherCROSS APPLY
orOUTER APPLY
can be used here, which are interchangeable.Consider you have the below table(named
MYTABLE
).The query is below.
which brings you the result