I have a simple table that is not related to any other. It has a non-PK column that is a date, and I have created a non-clustered index on that column. If I run this query:
select * from table where datecolumn is not null <-- does not use the index and is really slow.
But if I remove the NOT, like this:
select * from table where datecolumn is null <-- uses the index and is really fast.
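For context, the setup is roughly this (table, column and index names are made up for the example):

CREATE TABLE dbo.MyTable
(
    Id INT IDENTITY(1,1) PRIMARY KEY,   -- clustered primary key
    DateColumn DATETIME NULL,           -- the date column in question
    OtherData VARCHAR(100) NULL
)

CREATE NONCLUSTERED INDEX IX_MyTable_DateColumn
    ON dbo.MyTable (DateColumn)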
There are many more non-null values than nulls.
Am I forgetting something? Could I use a filtered index here?
Thanks in advance.
This is normal. It won't use the index unless the predicate is selective enough to warrant it.
It sounds like the vast majority of records are not NULL, so instead of finding those via the non-clustered index and then doing loads of bookmark lookups and random I/O to retrieve the rest of the columns to return, it is quicker and more efficient to just scan the whole clustered index.
You can use FORCESEEK to force the behaviour that you say you want. You will likely find that the time taken and the I/O stats go through the roof compared to the clustered index scan.
SET STATISTICS IO ON

SELECT *
FROM YourTable WITH (FORCESEEK)
WHERE YourCol IS NOT NULL
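For comparison, you can run the same query without the hint under the same setting and look at the logical reads that STATISTICS IO reports (same placeholder names as above):

SET STATISTICS IO ON

SELECT *
FROM YourTable
WHERE YourCol IS NOT NULL

The plain clustered index scan will normally report far fewer reads than the forced seek plus all the lookups, which is exactly why the optimizer picks it.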
The key to understanding your issue is probably in this sentence: there are many more non-null values than nulls.
SQL Server (and any other relational database, for that matter) uses statistics to decide which query plan it is going to use. The statistics are probably telling the database that there are a lot of rows with non-null dates, so SQL Server concludes that using the index is not worth it and that a full scan (in practice, a clustered index scan) is a better plan for that specific query with the IS NOT NULL condition.
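If you want to see what the optimizer is working from, you can inspect the statistics on that index yourself. This is only a sketch and assumes the index is called IX_YourTable_DateColumn; the histogram at the bottom of the output gives a rough picture of how the values, including NULLs, are distributed:

DBCC SHOW_STATISTICS ('dbo.YourTable', IX_YourTable_DateColumn)

-- and if they look out of date you can refresh them
UPDATE STATISTICS dbo.YourTable IX_YourTable_DateColumn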
Two things I want to mention:
- A query using an index is not always faster than a query not using one.
- You can place an index hint on your query (see the sketch below), but I usually end up shooting myself in the foot when I do that kind of optimization.
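If you do want to try it anyway, the hint looks roughly like this (the index name is a placeholder for whatever yours is called); check the I/O statistics before and after so you can see whether it actually helped:

SELECT *
FROM dbo.YourTable WITH (INDEX (IX_YourTable_DateColumn))
WHERE DateColumn IS NOT NULL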