TSQL: Using replace function in select with join

2019-08-24 02:45发布

问题:

Background. I'm using SQL Server. I have two tables in database:

Vendors(Id, Name, Description)
Products(Id, VendorId, Name, Description)

Values in Id column are formatted with prefix 'ID_' in Vendor table.

Values in VendorId column are formatted with prefix 'VE_' in Products table.

E.g. 'VE_001245' in Products refers to 'ID_001245' in Vendors.

(Please, do not propose to change this concept, do not care about database scheme, do not suggest adding foreign key. All it is just for illustration.)

Question: which one of following queries is best in performance context and why?

  1. Use replace function in inner select:

    select v.* from Vendors v
    inner join
    (
        select distinct replace(VendorId, 'VE_', 'ID_') as Id
        from Products
    ) list
    on v.Id = list.Id
    
  2. Use replace function in on-statement:

    select v.* from Vendors v
    inner join
    (
        select distinct VendorId as Id
        from Products
    ) list
    on v.Id = replace(list.Id, 'VE_', 'ID_')
    

Edit. There is only clustered index in each table (by Id column). Each table can contains millions rows.

回答1:

Both the queries are almost same in terms of performance. In the first query sorting is done twice, once when you are selecting the distinct records and again when it is performing an inner join, and in the end a merge join is there to select the final result set. Whereas in second query sorting is done only once but Hash join is being performed which is more expensive then merge join. So both the queries are same performance wise in the scenario when you don't have any index on the table.