I have normalized tables in a database, and to denormalize them I created a view out of two tables. When I tried to create a clustered index on the view, SQL Server wouldn't let me, because the view was created with a left outer join. I used a left join because I want the null values to show up in the resulting view, much as was suggested in this earlier post:
Question on join where one column one side is null
The table structure and relationships are very similar to those described in the above link.
I seem to have hit a wall here. I can't convert my left join into an inner join, because that would exclude all records with null values in any of the joined columns. My questions are:
- Why is indexing not allowed on outer or self joins?
- Are there any performance hits on this kind of un-indexed view?
- Does anyone know a workaround to this problem?
I just finished a SQL Server course yesterday, so I don't know how to proceed. I'd appreciate any comments. Cheers.
There is a "workaround" here that involves checking for NULL in the join and having a value in the table that represents NULL.
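Here is a minimal sketch of that idea. The tables `a` and `b`, the column `b_id`, and the sentinel key `0` with its `'(none)'` row are all hypothetical. The sketch runs against SQLite through Python's `sqlite3` module only to show the logic, because SQLite has no indexed views. Mapping a NULL `b_id` onto the sentinel lets an INNER JOIN keep every row of `a`. Check the SQL Server indexed-view requirements to confirm that your join expression (here `COALESCE`) is permitted. If you store the sentinel directly in `a.b_id` instead of NULL, the join becomes a plain equality again.

```python
import sqlite3

def build_demo(conn):
    # Hypothetical schema: a references b through the nullable column b_id.
    conn.executescript("""
        CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE a (id INTEGER PRIMARY KEY, label TEXT NOT NULL, b_id INTEGER);
        INSERT INTO b (id, name) VALUES (1, 'first'), (2, 'second');
        INSERT INTO a (id, label, b_id) VALUES (10, 'x', 1), (11, 'y', NULL), (12, 'z', 2);
        -- Assumed sentinel row: id 0 stands for "no related B row".
        INSERT INTO b (id, name) VALUES (0, '(none)');
    """)

def inner_join_with_sentinel(conn):
    # COALESCE maps a NULL b_id onto the sentinel key, so the INNER JOIN
    # keeps every row of a instead of dropping the ones with a NULL b_id.
    sql = """
        SELECT a.id, a.label, b.name
        FROM a
        INNER JOIN b ON COALESCE(a.b_id, 0) = b.id
        ORDER BY a.id
    """
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_demo(conn)
    print(inner_join_with_sentinel(conn))
    # [(10, 'x', 'first'), (11, 'y', '(none)'), (12, 'z', 'second')]
```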
Logically you are making two separate queries. 'A LEFT JOIN B' is just shorthand for '(A JOIN B) UNION ALL (the rows of A with no match in B)'.
The first query is table A inner joined to table B. This is the one that gets an indexed view, since this is where all the heavy lifting is done.
The second query is just table A where any of the join columns are null. Make a view that produces the same output columns as the first query, padding the columns from B with nulls.
Just UNION ALL the two results before returning them. No need for a workaround.
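The following check compares the decomposition against a direct left join. It uses Python's `sqlite3` module and hypothetical tables `a` and `b`. SQLite has no indexed views, so both parts are plain queries here; in SQL Server, the inner-join part is the one you would materialize. The sketch assumes that every non-NULL `a.b_id` matches a row in `b`, as a foreign key would guarantee. Without that guarantee, the second part has to test for a missing match rather than for NULL.

```python
import sqlite3

SCHEMA = """
CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE a (id INTEGER PRIMARY KEY, label TEXT NOT NULL, b_id INTEGER REFERENCES b(id));
INSERT INTO b VALUES (1, 'first'), (2, 'second');
INSERT INTO a VALUES (10, 'x', 1), (11, 'y', NULL), (12, 'z', 2), (13, 'w', 1);
"""

LEFT_JOIN = """
SELECT a.id, a.label, b.name
FROM a LEFT JOIN b ON a.b_id = b.id
"""

# Part 1: the inner join (the candidate for an indexed view in SQL Server).
INNER_PART = """
SELECT a.id, a.label, b.name
FROM a INNER JOIN b ON a.b_id = b.id
"""

# Part 2: rows of a whose join column is NULL, with the b column padded
# with NULL so that both parts have the same output columns.
NULL_PART = """
SELECT a.id, a.label, NULL AS name
FROM a
WHERE a.b_id IS NULL
"""

def left_join_direct(conn):
    return sorted(conn.execute(LEFT_JOIN).fetchall())

def left_join_by_union(conn):
    # UNION ALL: under the foreign-key assumption the two parts are disjoint,
    # so a duplicate-removing UNION would only add work.
    sql = INNER_PART + " UNION ALL " + NULL_PART
    return sorted(conn.execute(sql).fetchall())

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    print(left_join_by_union(conn) == left_join_direct(conn))  # True
```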
I'll work on an answer to 1, but for now:
[2]. The view will be neither more nor less performant than the equivalent query on the underlying tables. All the usual advice about covering indexes applies, preferably with an index on the joined columns, etc.
[3]. There's no real workaround. Most of the restrictions on indexed views exist for very good reasons, once you dig into them.
Generally, I'd just create the view and do no more, unless there was a specific performance problem.
I'll try to add an answer for 1 once I've reconstructed it in my own mind.
I don't think there is a good workaround. What you can do is create a real table from the view and put indexes on that. A stored procedure that runs regularly when the data is updated can handle this.
This approach is only worthwhile if the data isn't updated every few seconds.
Here is an alternative. You want a materialized view of A not containing B. That isn't directly available... so instead, materialize two views. One of all A's and one of only A's with B's. Then, get only A's not having B's by taking A except B. This can be done efficiently:
Create two materialized views, mA and mAB (edit: mA could just be the base table). mA has no join between A and B, so it contains all A's, including those WITHOUT matches in B. mAB joins A and B, so it contains only A's with B's and excludes those WITHOUT matches in B.
To get all A's without matches in B, mask out those that match: take the ids in mA EXCEPT the ids in mAB, then look up the remaining rows in mA.
This should yield a left anti semi join against both of your clustered indexes to get the ids, followed by a clustered index seek into mA to get the data you are looking for.
Essentially what you are running into is the basic rule that SQL is much better at dealing with data that IS there than data that ISN'T. By materializing two sources, you gain some compelling set based options. You have to weigh the cost of these views against those gains yourself.