I have few questions on SQL..
How to analyze the performance of a
query? Any software, inbuilt
features of MSSQL server 2005/2008?
What should be used in place of in
in queries so that the performance is better?
Eg:
SELECT *
FROM enquiry_courses
WHERE
enquiry_id IN
( SELECT enquiry_id FROM enquiries WHERE session_id = '4cd3420a16dbd61c6af58f6199ac00f1' )
Which is better: JOINS
, EXISTS
or IN
in terms of performance?
Comments/Help appreciated...
MSSQL generally comes with a built in gui tool called Query Analyser which describes how the query will be executed.
For 2) you could rewrite as:
SELECT *
FROM enquiry_courses ec
WHERE EXISTS (select 1 FROM enquiries e
WHERE e.enquiry_id = ec.enquiry_id
and e.session_id ='4cd3420a16dbd61c6af58f6199ac00f1' )
but I can't believe it would make any performance difference in a modern RDBMS.
3: I would expect an IN or EXIST clause to be flattened to a JOIN by the database engine, so there shouldn't be a difference in performance. I don't know about SQL Server, but in Oracle you can verify this by checking the execution plan.
This question suggests that EXISTS is quicker which is what I had been taught IN () vs EXISTS () in SqlServer 2005 (or generally in any RDBMS)
One thing to note is that EXISTS and IN should be used in preference to NOT EXISTS and NOT IN
Bit of a tangent from performance but this is a good article on the subtle differences between IN and EXISTS http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx
I guess the join gives more free to the engine for choice the best query plan.
In your exactly case, probably have all solutions similar performances.
SELECT enquiry_courses.*
FROM enquiry_courses
INNER JOIN enquiries ON enquiries.enquiry_id=enquiry_courses
AND session_id = '4cd3420a16dbd61c6af58f6199ac00f1'
They each behave differently: it is not a performance choice
The only correct and reliable choice is EXISTS or NOT EXISTS that works all the time.
- JOIN may needs DISTINCT
- WHERE/LEFT JOIN would needs correct placement of the filter
- NOT IN fails on NULL
Example:
DECLARE @Parent TABLE (foo int NULL)
INSERT @Parent (foo) VALUES (1)
INSERT @Parent (foo) VALUES (2)
INSERT @Parent (foo) VALUES (3)
INSERT @Parent (foo) VALUES (4)
DECLARE @Child TABLE (bar int NULL, foo int NULL)
INSERT @Child (bar, foo) VALUES (100, 1)
INSERT @Child (bar, foo) VALUES (200, 2)
INSERT @Child (bar, foo) VALUES (201, 2)
INSERT @Child (bar, foo) VALUES (300, NULL)
INSERT @Child (bar, foo) VALUES (301, NULL)
INSERT @Child (bar, foo) VALUES (400, 4)
INSERT @Child (bar, foo) VALUES (500, NULL)
--"positive" checks
SELECT -- multiple "2" = FAIL without DISTINCT
P.*
FROM
@Parent P JOIN @Child C ON P.foo = C.foo
SELECT -- correct
P.*
FROM
@Parent P
WHERE
P.foo IN (SELECT c.foo FROM @Child C)
SELECT -- correct
P.*
FROM
@Parent P
WHERE
EXISTS (SELECT * FROM @Child C WHERE P.foo = C.foo)
--"negative" checks
SELECT -- correct
P.*
FROM
@Parent P LEFT JOIN @Child C ON P.foo = C.foo
WHERE
C.foo IS NULL
SELECT -- no rows = FAIL
P.*
FROM
@Parent P
WHERE
P.foo NOT IN (SELECT c.foo FROM @Child C)
SELECT -- correct
P.*
FROM
@Parent P
WHERE
NOT EXISTS (SELECT * FROM @Child C WHERE P.foo = C.foo)
Note: with EXISTS, the SELECT in the subquery is irrelevant as mentioned in ANSI 92 standard...
NOT EXISTS (SELECT * FROM @Child C WHERE P.foo = C.foo)
NOT EXISTS (SELECT NULL FROM @Child C WHERE P.foo = C.foo)
NOT EXISTS (SELECT 1 FROM @Child C WHERE P.foo = C.foo)
NOT EXISTS (SELECT 1/0 FROM @Child C WHERE P.foo = C.foo)
- check the Excution Plan
- You can optimise your query by:
- Make a "arguments search" rather than IN
- Put Index on session_id
SELECT * FROM enquiry_courses as Courses, enquiries as Enquiries
WHERE Enquiries.session_id = '4cd3420a16dbd61c6af58f6199ac00f1'
AND Courses.enquiry_id = Enquiries.enquiry_id
3.Exists is better for performance.
EDIT: Exists & IN are better than JOIN for performance issues.
EDIT: I re-wrote the query so that it's faster (I put the most restrictive condition first in the WHERE close)