How to write a query using self join on a table wi

2019-09-08 07:31发布

问题:

With the following table structure:

Table 1: Customers

CustID(primary key)  |  CustName(indexed)
----------------------------------
C1                       Cust1
C2                       Cust2
.                        Cust3
.                        Cust.
.                        Cust.
C10000                   Cust10000

Table 2: CustomFields

FieldID (primary key) |  ID (indexed)                | FieldValue
---------------------------------------------------------------------
1                        C1                              Test 
2                        C2                              Test 
3                        C3                              Test 
4                        C4                              Test 
.                        .                               Test 
.                        .                               Test 
few millions             Z1                              Test   

"ID" column is indexed.

Trying to output the following;

CustID  |  Field 1  |  Field 2 | Field 3 | .... | Field N
----------------------------------------------------------

I tried writing a query like

Select 
    CustID, A1.FieldValue as [Field 1], A2.FieldValue as [Field 2]
from 
    Customers 
left outer join 
    CustomFields A1 on Customers.custID = A1.ID
left outer join 
    CustomFields A2 on Customers.custID = A2.ID
left outer join 
    CustomFields An on Customers.custID = An.ID
where 
    custName like 'C%'

Since the CustomFields table holds a few million records, the above query does not perform well. Now it takes about 10-12 seconds (for 500 customers and 6 fields)

I think the left outer joins are adding time here. Any thoughts to solve the problem would be really helpful?

Platform : SQL Server 2005

Updated :

CustomFields table is a generic table and it can contain fields of any other entities (vendors, items etc..).

回答1:

You can use a single join to get the customfields, and use aggregate functions to transpose them. That way, the join with table CustomFields is only done once.

Like this.

SELECT
  Customers.CustID,
  MAX(CASE WHEN CF.FieldID = 1 THEN CF.FieldValue ELSE NULL END) AS Field1,
  MAX(CASE WHEN CF.FieldID = 2 THEN CF.FieldValue ELSE NULL END) AS Field2,
  MAX(CASE WHEN CF.FieldID = 3 THEN CF.FieldValue ELSE NULL END) AS Field3,
  MAX(CASE WHEN CF.FieldID = 4 THEN CF.FieldValue ELSE NULL END) AS Field4
  -- Add more...

  FROM Customers 

  LEFT OUTER JOIN CustomFields CF
  ON CF.ID = Customers.CustID

  WHERE Customers.CustName like 'C%'

  GROUP BY Customers.CustID


回答2:

Have you tried analyzing the query with Database Engine Tuning Advisor? No guarantees, but it might provide a useful suggestion or two.



回答3:

You query is totally reasonable. However, the database structure is off. You should have a separate column for the customer id and for the field id. The way the query is written, you will get as many rows for the CustId as you have fields in the other table.

They shouldn't be combined into a single column. I would expect the join to look like:

left outer join
CustomerFields cfn
on cfn.CustId = Customers.CustId and
   cfn.FieldNum = <n>

Another possibility is that the performance hit is the first time you run the query. If you have enough memory, then it might run much faster the second time -- because the fields table is in memory.

For the record, I have a historical bias against "like", believe that it causes performance problems. This may not help, but you can try:

where left(CustName, 1) = 'C'


回答4:

    SELECT  Pvt.ID, 
            Customers.CustName, 
            Pvt.[1] AS Field1, 
            Pvt.[2] AS Field2, 
            Pvt.[3] AS Field3, 
            Pvt.[4] AS Field4, 
            Pvt.[5] AS Field5,
            ...
    FROM (
            SELECT ID, FieldID, FieldValue
            FROM CustomFields) AS p
            PIVOT (
                MIN (FieldValue)
                FOR FieldID IN ([1], [2], [3], [4], [5], ... )
        ) AS pvt
    inner join Customers ON Customers.CustID = pvt.ID