What to replace a left join with in a view so I can have an indexed view?

Posted 2020-02-26 04:40

I have normalized tables in a database, and to denormalize them I created a view out of two tables. When I tried to create a clustered index on the view, SQL Server wouldn't let me, because the view was created with a left outer join. I used a left join because I want the rows with null values to show up in the resulting view, much as was suggested in this earlier post.

Question on join where one column one side is null

The table structure and relationship is very much similar to what was described in the above link.

I seem to have hit a wall here, as I can't convert my left join into an inner join: that would exclude all records with null values in any of the joined columns. My questions are:

  1. Why is indexing not allowed on outer or self joins?
  2. Are there any performance hits on this kind of un-indexed view?
  3. Does anyone know of a workaround for this problem?

I only finished a SQL Server course yesterday, so I don't know how to proceed. I would appreciate any comments. Cheers.

5 answers
贪生不怕死
#2 · 2020-02-26 04:50

There is a "workaround" here: store a row in the parent table that stands in for NULL, and map NULL to that row's key in the join condition. The join can then be an inner join. The stand-in key must be a value that is never used as a real id.

The NULL stand-in row

INSERT INTO Father (Father_id, Father_name) VALUES (-255, 'No father');

The join

JOIN [dbo].[son] s ON ISNULL(s.father_id, -255) = f.father_id
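Put together, the idea above might look like the following sketch. The table and column definitions, the view name `dbo.vSonFather`, and the index name are assumptions made for illustration; only `Father`, `son`, `father_id`, `Father_name`, and the `-255` stand-in come from the answer. It also assumes the session uses the SET options that indexed views require (the SSMS defaults).

```sql
-- Hypothetical schema; adjust names and types to your own tables.
CREATE TABLE dbo.Father (
    Father_id   INT          NOT NULL PRIMARY KEY,
    Father_name VARCHAR(100) NOT NULL
);
CREATE TABLE dbo.Son (
    Son_id    INT          NOT NULL PRIMARY KEY,
    Son_name  VARCHAR(100) NOT NULL,
    Father_id INT          NULL REFERENCES dbo.Father (Father_id)
);
GO

-- Stand-in row that represents "no father".
INSERT INTO dbo.Father (Father_id, Father_name) VALUES (-255, 'No father');
GO

-- Inner join only, so the view is eligible for indexing.
-- SCHEMABINDING, two-part names, and an explicit column list are required.
CREATE VIEW dbo.vSonFather
WITH SCHEMABINDING
AS
SELECT s.Son_id, s.Son_name, f.Father_id, f.Father_name
FROM dbo.Son AS s
JOIN dbo.Father AS f
    ON ISNULL(s.Father_id, -255) = f.Father_id;
GO

-- The first index on a view must be unique and clustered.
CREATE UNIQUE CLUSTERED INDEX IX_vSonFather_Son_id
    ON dbo.vSonFather (Son_id);
```

Sons without a father then appear in the view with `Father_id = -255` and `Father_name = 'No father'` rather than with NULLs, so consumers of the view need to know about the stand-in value.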
做自己的国王
#3 · 2020-02-26 05:05

Logically you are making two separate queries. 'A LEFT JOIN B' is equivalent to '(A INNER JOIN B) UNION ALL (the rows of A with no match in B, padded with NULLs)'.

The first query is table A inner joined to table B. This gets an indexed view, since this is where all the heavy lifting is done.

The second query is just the rows of table A that have no match in B, which here means the rows where the join column is null. Make a view that produces the same output columns as the first query and pads the B columns with nulls.

Then combine the two results with UNION ALL in an ordinary (non-indexed) view or query before returning them, since an indexed view cannot itself contain a UNION. No need for a workaround.
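A minimal sketch of this split, intended for a scratch database. All object and column names here (`dbo.A`, `dbo.B`, `a_id`, `b_id`, `vA_B_Inner`, `vA_B_Left`) are hypothetical, and the sketch assumes a foreign key from A to B, so that the only unmatched rows of A are those whose join column is NULL.

```sql
-- Hypothetical tables: A optionally references B.
CREATE TABLE dbo.B (
    b_id   INT          NOT NULL PRIMARY KEY,
    b_name VARCHAR(100) NOT NULL
);
CREATE TABLE dbo.A (
    a_id   INT          NOT NULL PRIMARY KEY,
    a_name VARCHAR(100) NOT NULL,
    b_id   INT          NULL REFERENCES dbo.B (b_id)
);
GO

-- Part 1: the inner join, materialized as an indexed view.
CREATE VIEW dbo.vA_B_Inner
WITH SCHEMABINDING
AS
SELECT a.a_id, a.a_name, b.b_id, b.b_name
FROM dbo.A AS a
JOIN dbo.B AS b ON b.b_id = a.b_id;
GO
CREATE UNIQUE CLUSTERED INDEX IX_vA_B_Inner_a_id
    ON dbo.vA_B_Inner (a_id);
GO

-- Part 2: an ordinary view that appends the unmatched rows of A,
-- padding the B columns with typed NULLs.
CREATE VIEW dbo.vA_B_Left
AS
SELECT a_id, a_name, b_id, b_name
FROM dbo.vA_B_Inner WITH (NOEXPAND)   -- read the materialized rows directly
UNION ALL
SELECT a.a_id, a.a_name, CAST(NULL AS INT), CAST(NULL AS VARCHAR(100))
FROM dbo.A AS a
WHERE a.b_id IS NULL;
```

Querying `dbo.vA_B_Left` should then return the same rows as `A LEFT JOIN B` under the stated foreign-key assumption. Without that constraint, the second branch would need a `NOT EXISTS` check against B instead of `a.b_id IS NULL`.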

何必那么认真
#4 · 2020-02-26 05:06

I'll work on an answer to 1, but for now:

[2]. The view will be neither more nor less performant than the equivalent query on the underlying tables. All the usual advice applies about having covering indexes, preferably an index on the joined columns, etc.

[3]. There's no real workaround. Most of the restrictions on indexed views exist for very good reasons, once you dig into them.

I'd just create the view, generally, and do no more, unless there was a specific performance problem.

I'll try to add an answer for 1 once I've reconstructed it in my own mind.

Lonely孤独者°
#5 · 2020-02-26 05:07

I don't think there is a good workaround. What you can do about this is to create a real table from the view and set indexes on that. This can be done by a stored procedure that is called regularly when data is updated.

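-- SELECT ... INTO fails if the target table already exists, so drop it first
-- (DROP TABLE IF EXISTS needs SQL Server 2016 or later).
DROP TABLE IF EXISTS <REAL_TABLE>;

SELECT *
INTO <REAL_TABLE>
FROM <VIEW>;

CREATE CLUSTERED INDEX <INDEX_THE_FIELD> ON <REAL_TABLE> (<THE_FIELD>);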

But this is only a noteworthy approach if data isn't updated every few seconds.
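As a variation on the same idea, the refresh could be wrapped in a stored procedure that keeps the table and its index in place and only reloads the rows. This is a sketch and swaps in TRUNCATE plus INSERT where the answer uses SELECT ... INTO. The names `dbo.SonFatherSnapshot`, `dbo.RefreshSonFatherSnapshot`, and `dbo.vSonFatherLeft` (standing for the existing left-join view) are hypothetical, as are the columns; `CREATE OR ALTER` assumes SQL Server 2016 SP1 or later.

```sql
-- One-time setup: hypothetical snapshot table with its clustered index.
CREATE TABLE dbo.SonFatherSnapshot (
    Son_id      INT          NOT NULL,
    Son_name    VARCHAR(100) NOT NULL,
    Father_id   INT          NULL,
    Father_name VARCHAR(100) NULL
);
CREATE CLUSTERED INDEX IX_SonFatherSnapshot_Son_id
    ON dbo.SonFatherSnapshot (Son_id);
GO

-- Reloads the snapshot from the (assumed) left-join view dbo.vSonFatherLeft.
CREATE OR ALTER PROCEDURE dbo.RefreshSonFatherSnapshot
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;   -- roll back the whole refresh on any error

    BEGIN TRANSACTION;

    TRUNCATE TABLE dbo.SonFatherSnapshot;

    INSERT INTO dbo.SonFatherSnapshot (Son_id, Son_name, Father_id, Father_name)
    SELECT Son_id, Son_name, Father_id, Father_name
    FROM dbo.vSonFatherLeft;

    COMMIT TRANSACTION;
END;
GO

-- Run on a schedule (for example from a SQL Server Agent job) or after bulk loads.
EXEC dbo.RefreshSonFatherSnapshot;
```

Readers are blocked on the table while the refresh transaction runs, which is another reason this only suits data that changes infrequently.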

我只想做你的唯一
#6 · 2020-02-26 05:08

Here is an alternative. You want a materialized view of A not containing B. That isn't directly available... so instead, materialize two views. One of all A's and one of only A's with B's. Then, get only A's not having B's by taking A except B. This can be done efficiently:

Create two materialized views (mA and mAB) (edit: mA could just be the base table). mA lacks the join between A and B (thus containing all A's period [and therefore containing those records WITHOUT matches in B]). mAB joins between A and B (thus containing only A's with B's [and therefore excluding those records WITHOUT matches in B]).
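For concreteness, the two materialized views might be set up roughly as follows in a scratch database. The base tables `dbo.A` and `dbo.B`, their payload columns, and the one-to-one relationship on `matchId` are assumptions for illustration; `mA`, `mAB`, `matchId`, and the index name `pk_matchid` come from the answer.

```sql
-- Hypothetical base tables; at most one B row per A row is assumed,
-- so that matchId is unique in the joined view.
CREATE TABLE dbo.A (
    matchId   INT          NOT NULL CONSTRAINT PK_A PRIMARY KEY,
    a_payload VARCHAR(100) NOT NULL
);
CREATE TABLE dbo.B (
    matchId   INT          NOT NULL CONSTRAINT PK_B PRIMARY KEY,
    b_payload VARCHAR(100) NOT NULL
);
GO

-- mA: every A row, no join.
CREATE VIEW dbo.mA
WITH SCHEMABINDING
AS
SELECT a.matchId, a.a_payload
FROM dbo.A AS a;
GO
CREATE UNIQUE CLUSTERED INDEX pk_matchid ON dbo.mA (matchId);
GO

-- mAB: only the A rows that have a matching B row.
CREATE VIEW dbo.mAB
WITH SCHEMABINDING
AS
SELECT a.matchId, a.a_payload, b.b_payload
FROM dbo.A AS a
JOIN dbo.B AS b ON b.matchId = a.matchId;
GO
CREATE UNIQUE CLUSTERED INDEX pk_matchid ON dbo.mAB (matchId);
```

Index names only need to be unique per table or view, which is why both views can carry an index called `pk_matchid`.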

To get all A's without matches in B, mask out those that match:

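-- ids = every matchId in mA that has no counterpart in mAB
-- (drop the NOEXPAND hint on mA if mA is the base table rather than an indexed view)
with ids as (
  select matchId from mA with (index (pk_matchid), noexpand)
  except
  select matchId from mAB with (index (pk_matchid), noexpand)
)
-- fetch the full mA rows for those unmatched ids
select a.* from mA a join ids b on a.matchId = b.matchId;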

This should yield a left anti semi join against both your clustered indexes to get the ids and a clustered index seek to get the data out of mA you are looking for.

Essentially what you are running into is the basic rule that SQL is much better at dealing with data that IS there than data that ISN'T. By materializing two sources, you gain some compelling set based options. You have to weigh the cost of these views against those gains yourself.
