Optimize performance of sub-queries

2019-09-09 23:05发布

My first query from table takes about 40 seconds and creates over 80,000 rows. I want to get the counts of Windows 7 applications by Site, Sequence, Total and any OS version.

These sub-queries work, but of course they slow the process down considerably. It took 3.5 hours to run.

Is there a more efficient way to do this?

Output:

SoftwareName    Sequence    Site    Win7/site       Win7Installs/seq    TotWin7apps TotalInstalls
Adobe Acrobat       1       BKN         1                   5               626         7854
AutoCAD LT          1       BKN         1                   1               3           15
Adobe Acrobat       1       CTW         4                   5               626         7854
Adobe Captivate     1       CTW         1                   1               8           60

Query:

WITH PCapps AS (
SELECT DISTINCT
    Computer,
    Sequence,
    Site,
    SoftwareName,
    OS
FROM table
)

SELECT DISTINCT
    SoftwareName,
    Sequence,
    Site,
    (SELECT COUNT(p1.SoftwareName) FROM PCapps p1 WHERE p1.SoftwareName = pc.Softwarename AND OS = 'Windows 7 Enterprise' AND p1.Site = pc.Site) as 'Win7/site',
    (SELECT COUNT(p1.SoftwareName) FROM PCapps p1 WHERE p1.SoftwareName = pc.Softwarename AND OS = 'Windows 7 Enterprise' AND p1.Sequence = pc.Sequence) as 'Win7Installs/seq',
    (SELECT COUNT(p2.SoftwareName) FROM PCapps p2 WHERE p2.SoftwareName = pc.Softwarename AND OS = 'Windows 7 Enterprise') as TotWin7apps,
    (SELECT COUNT(p3.SoftwareName) FROM PCapps p3 WHERE p3.SoftwareName = pc.SoftwareName) as TotalInstalls 
    FROM PCapps pc

** UPDATE:

Using the tips from @Jason Carter, I created a few #temp tables and join them. The result runs very fast in less than a minute (not much longer than the initial query). This method is slightly different than @JasonCarter's solution as I was following his initial tip to create #temp tables. I created several #temp tables, each including the COUNT()s.

SELECT DISTINCT
    Computer,
    Sequence,
    Site,
    SoftwareName,
    OS
INTO #PCapps
      FROM TABLE    

SELECT
    SoftwareName,
    Site,
    COUNT(SoftwareName) AS [SiteInstalls]
INTO #SiteInstalls
FROM #PCapps 
    WHERE OS = 'Windows 7 Enterprise'
    GROUP BY Site, SoftwareName

SELECT
    SoftwareName,
    Sequence,
    COUNT(SoftwareName) AS [SeqInstalls]
INTO #SeqInstalls
FROM #PCapps
    WHERE OS = 'Windows 7 Enterprise'
    GROUP BY Sequence, SoftwareName

SELECT
    SoftwareName,
    COUNT(SoftwareName) AS [Win7Installs]
INTO #Win7Installs
FROM #PCapps
    WHERE OS = 'Windows 7 Enterprise'
    GROUP BY SoftwareName

SELECT
    SoftwareName,
    COUNT(SoftwareName) AS [AppInstalls]
INTO #AppInstalls
FROM #PCapps
    GROUP BY SoftwareName   

SELECT
    pc.SoftwareName,
    pc.Sequence,
    pc.Site,
    sit7.SiteInstalls,
    seq7.SeqInstalls,
    w7.Win7Installs,
    ai.AppInstalls
    FROM #PCapps pc
    LEFT OUTER JOIN #SiteInstalls sit7 ON sit7.SoftwareName = pc.SoftwareName AND sit7.Site = pc.Site
    LEFT OUTER JOIN #SeqInstalls  seq7 ON seq7.SoftwareName = pc.SoftwareName AND seq7.Sequence = pc.Sequence
    LEFT OUTER JOIN #Win7Installs w7   ON   w7.SoftwareName = pc.SoftwareName
    LEFT OUTER JOIN #AppInstalls ai    ON   ai.Softwarename = pc.SoftwareName

DROP TABLE #PCapps
DROP TABLE #SiteInstalls
DROP TABLE #SeqInstalls
DROP TABLE #Win7Installs
DROP TABLE #AppInstalls

2条回答
相关推荐>>
2楼-- · 2019-09-09 23:36

Really surprising to me... The optimizer takes advantage of parallelism to make the subqueries very efficient. I populated a dummy table with 120K records and tried the below. The third query is only slightly more efficient than the first (yours), but at the cost of more complexity. I would leave yours as-is. There probably is a better solution, but yours looks good enough to me. How long does yours take to process the 80K rows?

WITH PCapps2 AS (
SELECT DISTINCT
    Computer,
    Sequence,
    Site,
    SoftwareName,
    OS
FROM pcapps
)
SELECT DISTINCT
    SoftwareName,
    Sequence,
    Site,
    (SELECT COUNT(p1.SoftwareName) FROM PCapps2 p1 WHERE p1.SoftwareName = pc.Softwarename AND OS = 'Windows 7 Enterprise' AND p1.Site = pc.Site) as 'Win7/site',
    (SELECT COUNT(p1.SoftwareName) FROM PCapps2 p1 WHERE p1.SoftwareName = pc.Softwarename AND OS = 'Windows 7 Enterprise' AND p1.Sequence = pc.Sequence) as 'Win7Installs/seq',
    (SELECT COUNT(p2.SoftwareName) FROM PCapps2 p2 WHERE p2.SoftwareName = pc.Softwarename AND OS = 'Windows 7 Enterprise') as TotWin7apps,
    (SELECT COUNT(p3.SoftwareName) FROM PCapps2 p3 WHERE p3.SoftwareName = pc.SoftwareName) as TotalInstalls 
    FROM PCapps2 pc

;WITH PCapps2 AS (
SELECT DISTINCT
    Computer,
    Sequence,
    Site,
    SoftwareName,
    OS
FROM pcapps
)
SELECT DISTINCT
    SoftwareName,
    Sequence,
    Site,
    COUNT(case when os = 'Windows 7 Enterprise' then 1 end) over (partition by pc.Softwarename,pc.Site) as 'Win7/site',
    COUNT(case when os = 'Windows 7 Enterprise' then 1 end) over (partition by pc.Softwarename,pc.Sequence) as 'Win7Installs/seq',
    COUNT(case when os = 'Windows 7 Enterprise' then 1 end) over (partition by pc.Softwarename) as TotWin7apps,
    COUNT(*) over (partition by pc.Softwarename) as TotalInstalls
    FROM PCapps2 pc

;WITH PCapps2 AS (
SELECT DISTINCT
    Computer,
    Sequence,
    Site,
    SoftwareName,
    OS
FROM pcapps
), a as (
    select softwarename, site, COUNT(*) as 'Win7/site'
    from pcapps2
    where os = 'Windows 7 Enterprise'
    group by softwarename, site
), b as (
    select softwarename, sequence, COUNT(*) as 'Win7Installs/seq'
    from pcapps2
    where os = 'Windows 7 Enterprise'
    group by softwarename, sequence
), c as (
    select softwarename, COUNT(case when os = 'Windows 7 Enterprise' then 1 end) as TotWin7apps, COUNT(*) as TotalInstalls
    from pcapps2
    group by softwarename
), d as (
    select distinct softwarename, sequence, site
    from    pcapps2
)
select d.*, isnull(a.[Win7/site], 0) as [Win7/site], isnull(b.[Win7Installs/seq],0) as [Win7Installs/seq] 
        , isnull(c.TotWin7apps,0) as TotWin7apps, isnull(c.TotalInstalls, 0) as TotalInstalls  
from    d
        left join a on d.softwarename = a.softwarename and d.site = a.site 
        left join b on d.softwarename = b.softwarename and d.sequence = b.sequence 
        left join c on d.softwarename = c.softwarename 

First query: Table 'pcapps'. Scan count 15, logical reads 6630, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Second query: Table 'pcapps'. Scan count 3, logical reads 1326, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 18, logical reads 1983591, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Third query: Table 'pcapps'. Scan count 12, logical reads 5304, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

查看更多
Ridiculous、
3楼-- · 2019-09-09 23:48

I would try running the subqueries first into a temp table to gather your counts, then pull your total counts. With the way you have the query setup its going to run each of those subqueries once for every line of PCapps, which is why it is taking so long.

Try something like this:

declare @t table (Computer varchar(10), Sequence int, Site varchar(3), SoftwareName varchar(20), OS varchar(20) )

insert into @t values
 ('C1',1,'BKN','Adobe Acrobat','Win7')
,('C2',1,'BKN','Adobe Acrobat','Win7')
,('C3',1,'BKN','Adobe Acrobat','Win7')
,('C1',1,'BKN','AutoCAD LT ','Win7')
,('C3',1,'BKN','AutoCAD LT ','Win7')
,('B1',2,'CTW','Adobe Acrobat','Win7')
,('B2',2,'CTW','Adobe Acrobat','Win7')
,('B3',3,'CTW','Adobe LT','Win7')
,('B4',3,'CTW','Adobe Acrobat','Win7')
,('A1',2,'CTW','Adobe Acrobat','Win7')
,('A2',2,'CTW','Adobe LT','Win7')
,('A3',2,'CTW','Adobe Acrobat','Win7')
,('X4',3,'CTW','Adobe Acrobat','WinXP')
,('X1',2,'CTW','Adobe Acrobat','WinXP')
,('X2',2,'CTW','Adobe LT','WinXP')
,('X3',2,'CTW','Adobe Acrobat','WinXP')
,('A4',2,'CTW','Adobe Acrobat','Win7');

SELECT 
    Site,
    OS,
    SoftwareName, 
    Sequence,
    COUNT(SoftwareName)  as 'Count'
into #SiteSpecific
FROM @t
group by Site, OS, SoftwareName, Sequence;


WITH PCapps AS (
SELECT DISTINCT
    Computer,
    Sequence,
    Site,
    SoftwareName,
    OS
FROM @t
)
SELECT 
    DISTINCT
    PC.SoftwareName,
    PC.Sequence,
    PC.Site,
    PerSeq.[Count] as 'Win7Installs/seq',
    PerSite.[Count] as TotWin7apps,
    total.[Count] as TotalInstalls 
FROM PCapps pc
inner join (select SoftwareName, Sequence, OS, sum([Count]) [Count] from #SiteSpecific group by SoftwareName, Sequence, OS) PerSeq 
    on PerSeq.SoftwareName = pc.Softwarename 
    AND PerSeq.Sequence = pc.Sequence
    AND PerSeq.OS = pc.OS
inner join (select SoftwareName, Site, OS, sum([Count]) [Count] from #SiteSpecific group by SoftwareName, Site, OS) PerSite 
    on PerSite.SoftwareName = pc.Softwarename 
    AND PerSite.Site = pc.Site
    AND PerSite.OS = pc.OS
inner join (select SoftwareName, sum([Count]) [Count] from #SiteSpecific group by SoftwareName) Total 
    on Total.SoftwareName = pc.Softwarename 
where Pc.OS='Win7'
order by SoftwareName, Sequence, Site
查看更多
登录 后发表回答