I have a script that has to look throught over 2.5 million records to find if a member that has an unread email. I want to know what can be done to improve its speed. Currently it can take up to 8 seconds to run the script:
SELECT TOP(1) MemberMailID
FROM MemberMail
WHERE ToReadFlag = 0
AND ToMemberID = 102
AND ToDeletedFlag = 0
AND FromDeletedFlag = 0
AND OnHold = 0
AND ToArchivedFlag = 0
How could I make it faster using indexes?
This index will probably be helpful, but keep in mind that there is no free lunch (indexes have to be maintained, so this will affect your insert/update/delete workload):
CREATE NONCLUSTERED INDEX unread_emails
ON dbo.MemberMail(ToMemberID)
INCLUDE (MemberMailID)
WHERE ToReadFlag = 0
AND ToDeletedFlag = 0
AND FromDeletedFlag = 0
AND OnHold = 0
AND ToArchivedFlag = 0;
Now your query can say:
SELECT TOP (1) MemberMailID
FROM dbo.MemberMail -- dbo prefix
WITH (INDEX (unread_emails)) -- in case you need to force, though you should not
WHERE ToMemberID = 102
AND ToReadFlag = 0
AND ToDeletedFlag = 0
AND FromDeletedFlag = 0
AND OnHold = 0
AND ToArchivedFlag = 0
ORDER BY ToMemberID; -- ORDER BY is important!
If you change the values of some of these flags depending on the query, you may experiment with adding those columns to the key of the index instead of the filter, e.g. let's say sometimes you check for OnHold = 0
and sometimes OnHold = 1
:
CREATE NONCLUSTERED INDEX unread_emails
ON dbo.MemberMail(ToMemberID, OnHold)
INCLUDE (MemberMailID)
WHERE ToReadFlag = 0
AND ToDeletedFlag = 0
AND FromDeletedFlag = 0
AND ToArchivedFlag = 0;
You may also want to experiment with having MemberMailID
in the key instead of the INCLUDE
. e.g.:
CREATE NONCLUSTERED INDEX unread_emails
ON dbo.MemberMail(ToMemberID, MemberMailID)
WHERE ToReadFlag = 0
AND ToDeletedFlag = 0
AND FromDeletedFlag = 0
AND OnHold = 0
AND ToArchivedFlag = 0;
These differences may not matter for your data and usage patterns, but you'll be able to test differences easier than we'll be able to guess.
Looks like a good candidate for a filtered index.
A filtered index is an optimized nonclustered index, especially suited
to cover queries that select from a well-defined subset of data. It
uses a filter predicate to index a portion of rows in the table. A
well-designed filtered index can improve query performance, reduce
index maintenance costs, and reduce index storage costs compared with
full-table indexes.
Something along these lines:
CREATE NONCLUSTERED INDEX IX_MemberMail_ToMemberId_Unread
ON dbo.MemberMail (ToMemberId ASC)
WHERE ToReadFlag = 0
AND ToDeletedFlag = 0
AND FromDeletedFlag = 0
AND OnHold = 0
AND ToArchivedFlag = 0;
Get the Actual Query Plan by (CTRL+M) in SSMS. Or paste your query in SSMS, right click it, select Analyze query in database engine tuning advisor, your will see what index you need to add. Basically, you need an composite+include index.
As a thumb rule, every field on which you perform frequent filters (where
conditions) must be indexed.
Again, as a thumb rule, I follow these criteria:
- Every key fields (primary or foreign) must be indexed
- Every
date
field on which I have to perform frequent lookups must be indexed
- Although I avoid it, if I need to perform frequent searches on
char
or varchar
fields, I index them as well
Notice that it is easy to fall in the temptation of indexing everything. Don't do it. Be careful and design your indexes with the best cost - benefit relationship.
I'm a MySQL user, and I don't how to do it in SQL server, but there must be a way to show the execution plan of your query (in MySQL it is explain select...
). Try to show the execution plan, and then decide on that basis which are the fields you need to index.