How is this script updating table when using LEFT

2019-01-19 06:05发布

问题:

The result in this script is correct, but I don't seem to understand why column BetaStatus remains 'NOK'

Regarding 'Beta', the first row (Beta = NOK) would update @Summary.BetaStatus to NOK. But then I thought that the last two rows in @testTable would update BetaStatus from NOK back to OK. I just want to be certain that it's not a coincidence that the 'NOK' is actually the last row to be processed, and hence the value.

declare @testTable table 
(
    id int,
    Pgroup varchar(10),
    Pstatus varchar(3)
)
insert into @testTable select 3, 'Alpha', 'OK'
insert into @testTable select 3, 'Beta',  'NOK'
insert into @testTable select 3, 'Gamma', 'OK'
insert into @testTable select 3, 'Beta',  'OK'
insert into @testTable select 3, 'Beta',  'OK'

declare @Summary table
(
    id int,
    AlphaStatus varchar(3),
    BetaStatus varchar(3),
    GammaStatus varchar(3)
)
insert into @Summary (id) select 3

update @Summary 
set
    AlphaStatus = ISNULL(rA.Pstatus, AlphaStatus),
    BetaStatus = ISNULL(rB.Pstatus,  BetaStatus),
    GammaStatus = ISNULL(rG.Pstatus, GammaStatus)
from @Summary t
left join @testTable rA on rA.id = t.ID AND rA.Pgroup = 'Alpha'
left join @testTable rB on rB.id = t.ID AND rB.Pgroup = 'Beta'
left join @testTable rG on rG.id = t.ID AND rG.Pgroup = 'Gamma'

select * from @summary

The reason I ask is that is that, for every id, AlphaStatus, BetaStatus, GammaStatus should not be changed back to 'OK' if it was previously 'NOK'. Once it's updated to 'NOK' it remains that way regardless of what comes next.

The alternative was to update @Summary with the 'OK' values, and then do another update with 'NOK'. That way I know that the 'NOK' will not be replaced. But if this works, then I'd rather use this.

And as a second question, why doesn't the UPDATE work correctly if I use INNER JOIN?

Thanks.

回答1:

Let's see what return select instead of update

select 
     AlphaStatus = ISNULL(rA.Pstatus, AlphaStatus),
     BetaStatus  = ISNULL(rB.Pstatus,  BetaStatus),
     GammaStatus = ISNULL(rG.Pstatus, GammaStatus)
from @Summary t
left join @testTable rA on rA.id = t.ID AND rA.Pgroup = 'Alpha'
left join @testTable rB on rB.id = t.ID AND rB.Pgroup = 'Beta'
left join @testTable rG on rG.id = t.ID AND rG.Pgroup = 'Gamma'

Result:

AlphaStatus     BetaStatus  GammaStatus
OK              NOK         OK
OK              OK          OK
OK              OK          OK

Now you try to do UPDATE

update @Summary 
set
    AlphaStatus = ISNULL(rA.Pstatus, AlphaStatus),
    BetaStatus  = ISNULL(rB.Pstatus,  BetaStatus),
    GammaStatus = ISNULL(rG.Pstatus, GammaStatus)
from @Summary t
left join @testTable rA on rA.id = t.ID AND rA.Pgroup = 'Alpha'
left join @testTable rB on rB.id = t.ID AND rB.Pgroup = 'Beta'
left join @testTable rG on rG.id = t.ID AND rG.Pgroup = 'Gamma'

After update table @Summary contains:

id  AlphaStatus     BetaStatus  GammaStatus
3   OK              NOK         OK

I suppose you wanted to get:

id  AlphaStatus     BetaStatus  GammaStatus
3   OK              OK      OK

But UPDATE doesn't work that way, when they are multiple matches the result may be inconsistent and it is partially based on the table ordering or actual execution plan.

See also: Let's deprecate UPDATE FROM!

Correctness? Bah, who cares?

Well, most do. That’s why we test.

If I mess up the join criteria in a SELECT query so that too many rows from the second table match, I’ll see it as soon as I test, because I get more rows back then expected. If I mess up the subquery criteria in an ANSI standard UPDATE query in a similar way, I see it even sooner, because SQL Server will return an error if the subquery returns more than a single value. But with the proprietary UPDATE FROM syntax, I can mess up the join and never notice – SQL Server will happily update the same row over and over again if it matches more than one row in the joined table, with only the result of the last of those updates sticking. And there is no way of knowing which row that will be, since that depends in the query execution plan that happens to be chosen. A worst case scenario would be one where the execution plan just happens to result in the expected outcome during all tests on the single-processor development server – and then, after deployment to the four-way dual-core production server, our precious data suddenly hits the fan…

Too see this inconsistency instead of table variable use table and create clustered indexes:

SqlFiddleDemo

CREATE TABLE testTable(id int,
    Pgroup varchar(10),
    Pstatus varchar(3));

CREATE CLUSTERED INDEX clx_name
ON  testTable(PStatus DESC);

/* or */
CREATE CLUSTERED INDEX clx_name
ON  testTable(PStatus ASC);  

If you use MERGE for instance:

;WITH cte as
(SELECT 
   ra.id
   ,AlphaStatus = rA.Pstatus
   ,BetaStatus = rB.Pstatus
   ,GammaStatus = rG.Pstatus
from @Summary t
left join @testTable rA on rA.id = t.ID AND rA.Pgroup = 'Alpha'
left join @testTable rB on rB.id = t.ID AND rB.Pgroup = 'Beta'
left join @testTable rG on rG.id = t.ID AND rG.Pgroup = 'Gamma'
)
MERGE @Summary AS TGT
USING (SELECT * FROM cte ) AS  SRC
   ON TGT.id = SRC.id
WHEN MATCHED THEN
   UPDATE
   SET 
    AlphaStatus = ISNULL(src.AlphaStatus, tgt.AlphaStatus),
    BetaStatus  = ISNULL(src.BetaStatus,  tgt.BetaStatus),
    GammaStatus = ISNULL(src.GammaStatus, tgt.GammaStatus);

You will get clear error message that this is not allowed:

The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.



标签: sql tsql