The query is executing very slowly, is there any w

2020-06-13 13:02发布

I have the following query, and because of a lot of SUM function calls, my query is running too slow. I have a lot of records in my database and I would like to get a report from the current year and last year (Last 30 days, Last 90 days and last 365 days) for each one:

SELECT 
    b.id as [ID]
    ,d.[Title] as [Title]
    ,e.Class as [Class]

    ,Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-1,GETDATE()) THEN a.col1 ELSE 0 END) as [Current - Last 30 Days Col1]
    ,Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-1,GETDATE()) THEN a.col2 ELSE 0 END) as [Current - Last 30 Days Col2]

    ,Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-1,GETDATE()) THEN a.col1 ELSE 0 END) as [Current - Last 90 Days Col1]
    ,Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-1,GETDATE()) THEN a.col2 ELSE 0 END) as [Current - Last 90 Days Col2]

    ,Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-1,GETDATE()) THEN a.col1 ELSE 0 END) as [Current - Last 365 Days Col1]
    ,Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-1,GETDATE()) THEN a.col2 ELSE 0 END) as [Current - Last 365 Days Col2]

    ,Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-13,GETDATE()) and a.DateCol <= DATEADD(MONTH,-12,GETDATE()) THEN a.col1 ELSE 0 END) as [Last year - Last 30 Days Col1]
    ,Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-13,GETDATE()) and a.DateCol <= DATEADD(MONTH,-12,GETDATE()) THEN a.col2 ELSE 0 END) as [Last year - Last 30 Days Col2]

    ,Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-5,GETDATE()) and a.DateCol <= DATEADD(QUARTER,-4,GETDATE()) THEN a.col1 ELSE 0 END) as [Last year - Last 90 Days Col1]
    ,Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-5,GETDATE()) and a.DateCol <= DATEADD(QUARTER,-4,GETDATE()) THEN a.col2 ELSE 0 END) as [Last year - Last 90 Days Col2]

    ,Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-2,GETDATE()) and a.DateCol <= DATEADD(YEAR,-1,GETDATE()) THEN a.col1 ELSE 0 END) as [Last year - Last 365 Days Col1]
    ,Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-2,GETDATE()) and a.DateCol <= DATEADD(YEAR,-1,GETDATE()) THEN a.col2 ELSE 0 END) as [Last year - Last 365 Days Col2]


    FROM 
    tb1 a
INNER JOIN 
    tb2 b on a.id=b.fid and a.col3 = b.col4
INNER JOIN 
    tb3 c on b.fid = c.col5
INNER JOIN       
    tb4 d on c.id = d.col6
INNER JOIN 
    tb5 e on c.col7 = e.id
GROUP BY
    b.id, d.Title, e.Class

Does anyone have any idea how can I improve my query in order to run faster?

EDIT: I was encouraged to move the DATEADD function call to the where statement and load first two years first then filter them in columns, but I am not sure the suggested answer is executed and works, it could be found here: https://stackoverflow.com/a/59944426/12536284

If you agree with the above solution, please show me how can I apply it in my current query?

Just FYI, I am using this SP in C#, Entity Framework (DB-First), something like this:

var result = MyDBEntities.CalculatorSP();

8条回答
对你真心纯属浪费
2楼-- · 2020-06-13 13:19

I would use a lookup table "Dates" table to join my data to with an index on DatesId. I use the dates as a filter when I want to browse historical data. The join is fast and so it the filtering as the DatesId is clustered primary index (primary key). Add the date column (as included column) for your data table as well.

The dates table has the following columns:

DatesId, Date, Year, Quarter, YearQuarter, MonthNum, MonthNameShort, YearWeek, WeekNum, DayOfYear, DayOfMonth, DayNumOfWeek, DayName

Example data: 20310409 2031-04-09 2031 2 2031-Q2 4 April Apr 2031_15 15 99 9 3 Wednesday

You can PM me if you want a csv of this so that you can import it to the database, but I'm sure you can easily find something like this online and make your own.

I add an identity column as well so that you can get an integer for each date. This makes it a bit easier to work with, but not a requirement.

SELECT * FROM dbo.dates where dateIndex BETWEEN (getDateIndexDate(getDate())-30 AND getDateIndexDate(getDate())+0) --30 days ago

This allows me to easily jump back to a certain period. It's quite easy to create your own views on this. You can of course use the ROW_NUMBER() function to do this for years, weeks, etc. as well.

Once I have the daterange I want, I join to the data. Works very fast!

查看更多
我只想做你的唯一
3楼-- · 2020-06-13 13:35

For optimizing such calculations you man consider pre-calculating some of the values. The idea of pre-calculations is to reduce the number of rows that need to be read or proceed.

One way of achieving this is using an indexed view and leave the engine to do the calculations by itself. As this type of views have some limitations, you man end up creating a simple table and perform the calculations instead. Basically, it depends on the business needs.

So, in the example below I am creating a table with RowID and RowDatetime columns and inserting 1 million rows. I am using an indexed view to count the entities per days, so instead of querying 1 million rows per year I will query 365 rows per year to count these metrics.

DROP TABLE IF EXISTS [dbo].[DataSource];
GO

CREATE TABLE [dbo].[DataSource]
(
    [RowID] BIGINT IDENTITY(1,1) PRIMARY KEY
   ,[RowDateTime] DATETIME2
);

GO

DROP VIEW IF EXISTS [dbo].[vw_DataSource];
GO

CREATE VIEW [dbo].[vw_DataSource] WITH SCHEMABINDING
AS
SELECT YEAR([RowDateTime]) AS [Year]
      ,MONTH([RowDateTime]) AS [Month]
      ,DAY([RowDateTime]) AS [Day]
      ,COUNT_BIG(*) AS [Count]
FROM [dbo].[DataSource]
GROUP BY YEAR([RowDateTime])
        ,MONTH([RowDateTime])
        ,DAY([RowDateTime]);
GO

CREATE UNIQUE CLUSTERED INDEX [IX_vw_DataSource] ON [dbo].[vw_DataSource]
(
    [Year] ASC,
    [Month] ASC,
    [Day] ASC
);

GO

DECLARE @min bigint, @max bigint
SELECT @Min=1 ,@Max=1000000

INSERT INTO [dbo].[DataSource] ([RowDateTime])
SELECT TOP (@Max-@Min+1) DATEFROMPARTS(2019,  1.0 + floor(12 * RAND(convert(varbinary, newid()))), 1.0 + floor(28 * RAND(convert(varbinary, newid())))          )       
FROM master..spt_values t1 
CROSS JOIN master..spt_values t2

GO


SELECT *
FROM [dbo].[vw_DataSource]


SELECT SUM(CASE WHEN DATEFROMPARTS([Year], [Month], [Day]) >= DATEADD(MONTH,-1,GETDATE()) THEN [Count] ELSE 0 END) as [Current - Last 30 Days Col1]
      ,SUM(CASE WHEN DATEFROMPARTS([Year], [Month], [Day]) >= DATEADD(QUARTER,-1,GETDATE()) THEN [Count] ELSE 0 END) as [Current - Last 90 Days Col1]
      ,SUM(CASE WHEN DATEFROMPARTS([Year], [Month], [Day]) >= DATEADD(YEAR,-1,GETDATE()) THEN [Count] ELSE 0 END) as [Current - Last 365 Days Col1]
FROM [dbo].[vw_DataSource];

The success of such solution depends very much on how the data is distributed and how many rows you have. For example, if you have one entry per day for each day of the year, the view and the table will have same match of rows, so the I/O operations will not be reduced.

Also, the above is just an example of materializing the data and reading it. In your case you may need to add more columns the view definition.

查看更多
Luminary・发光体
4楼-- · 2020-06-13 13:35

Since you are always grouping values based on a whole number of months, I would first group by month in a subquery in the from clause. This is similar to using a temporary table. Not certain if this would actually speed up your query.

SELECT f.id, f.[Title], f.Class,
    SUM(CASE WHEN f.MonthDiff = 1 THEN col1 ELSE 0 END) as [Current - Last 30 Days Col1],
    -- etc
FROM (
    SELECT 
        b.id,
        d.[Title],
        e.Class,
        DateDiff(Month, a.DateCol, GETDATE()) as MonthDiff,
        Sum(a.col1) as col1,
        Sum(a.col2) as col2
    FROM  tb1 a
    INNER JOIN tb2 b on a.id = b.fid and a.col3 = b.col4
    INNER JOIN tb3 c on b.fid = c.col5
    INNER JOIN tb4 d on c.id = d.col6
    INNER JOIN tb5 e on c.col7 = e.id
    WHERE a.DateCol between DATEADD(YEAR,-2,GETDATE() and GETDATE()
    GROUP BY b.id, d.Title, e.Class, DateDiff(Month,  a.DateCol, GETDATE())
) f
group by f.id, f.[Title], f.Class
查看更多
家丑人穷心不美
5楼-- · 2020-06-13 13:35

To improve the speed of SQL query, you must add indexes. For each joined table, you have to add one index.

Like this code example for oracle:

CREATE INDEX supplier_idx
ON supplier (supplier_name);
查看更多
beautiful°
6楼-- · 2020-06-13 13:36

I assume tb1 is a large table (relative to tb2, tb3, tb4 and tb5).

If so, it makes sense here to restrict the selection of that table (with a WHERE clause).

If only a small part of tb1 is used, for example because the joins with tb2, tb3, tb4 and tb5 reduce the needed rows to just a few percent, then you should check if the tables are indexed on the columns you use in the joins.

If a large part of tb1 is used, then it can make sense to group its results before joining it to tb2, tb3, tb4 and tb5. Below is an example of that.

SELECT 
    b.id as [ID]
    ,d.[Title] as [Title]
    ,e.Class as [Class]
    ,SUM(a.[Current - Last 30 Days Col1]) AS [Current - Last 30 Days Col1]
    ,SUM(a.[Current - Last 30 Days Col2]) AS [Current - Last 30 Days Col2]
    ,SUM(a.[Current - Last 90 Days Col1]) AS [Current - Last 90 Days Col1]
    -- etc.
    FROM (
      SELECT a.id, a.col3

      ,Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-1,GETDATE()) THEN a.col1 ELSE 0 END) as [Current - Last 30 Days Col1]
      ,Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-1,GETDATE()) THEN a.col2 ELSE 0 END) as [Current - Last 30 Days Col2]

      ,Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-1,GETDATE()) THEN a.col1 ELSE 0 END) as [Current - Last 90 Days Col1]
      ,Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-1,GETDATE()) THEN a.col2 ELSE 0 END) as [Current - Last 90 Days Col2]

      ,Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-1,GETDATE()) THEN a.col1 ELSE 0 END) as [Current - Last 365 Days Col1]
      ,Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-1,GETDATE()) THEN a.col2 ELSE 0 END) as [Current - Last 365 Days Col2]

      ,Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-13,GETDATE()) and a.DateCol <= DATEADD(MONTH,-12,GETDATE()) THEN a.col1 ELSE 0 END) as [Last year - Last 30 Days Col1]
      ,Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-13,GETDATE()) and a.DateCol <= DATEADD(MONTH,-12,GETDATE()) THEN a.col2 ELSE 0 END) as [Last year - Last 30 Days Col2]

      ,Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-5,GETDATE()) and a.DateCol <= DATEADD(QUARTER,-4,GETDATE()) THEN a.col1 ELSE 0 END) as [Last year - Last 90 Days Col1]
      ,Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-5,GETDATE()) and a.DateCol <= DATEADD(QUARTER,-4,GETDATE()) THEN a.col2 ELSE 0 END) as [Last year - Last 90 Days Col2]

      ,Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-2,GETDATE()) and a.DateCol <= DATEADD(YEAR,-1,GETDATE()) THEN a.col1 ELSE 0 END) as [Last year - Last 365 Days Col1]
      ,Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-2,GETDATE()) and a.DateCol <= DATEADD(YEAR,-1,GETDATE()) THEN a.col2 ELSE 0 END) as [Last year - Last 365 Days Col2]

      FROM  tb1 a
      WHERE a.DateCol >= DATEADD(YEAR,-2,GETDATE())
      GROUP BY a.id, a.col3
    ) AS a
INNER JOIN 
    tb2 b on a.id=b.fid and a.col3 = b.col4
INNER JOIN 
    tb3 c on b.fid = c.col5
INNER JOIN       
    tb4 d on c.id = d.col6
INNER JOIN 
    tb5 e on c.col7 = e.id
GROUP BY
    b.id, d.Title, e.Class
查看更多
女痞
7楼-- · 2020-06-13 13:39

As it has been mentioned already, the execution plan will be really helpful in this case. Based on what you've shown it seems you have extracted 12 columns of 15 total columns from tb1 (a), so you can try to run your query without any join and just against the tb1 to see whether your query is working as expected. Since I can see nothing wrong with your SUM function calls, my best guess is you have an issue with your joins, I would suggest to do the following. You can start by excluding the last join for instance, INNER JOIN tb5 e on c.col7 = e.id and any related usage of it like e.Class as [Class] and e.Class in your group by statement. We are not going to exclude it completely, this is just a test to make sure whether the problem is with that or not, if your query runs better and as expected you can try to use a temp table as a workaround instead of the last join, something like this:

SELECT *
INTO #Temp
FROM
  (
     select * from tb5
  ) As tempTable;

SELECT 
    b.id as [ID]
    ,d.[Title] as [Title]
    ,e.Class as [Class]

    -- SUM Functions

FROM 
    tb1 a
INNER JOIN 
    tb2 b on a.id=b.fid and a.col3 = b.col4
INNER JOIN 
    tb3 c on b.fid = c.col5
INNER JOIN       
    tb4 d on c.id = d.col6
INNER JOIN 
    #Temp e on c.col7 = e.id
GROUP BY
    b.id, d.Title, e.Class

Actually, Temporary tables are tables that exist temporarily on the SQL Server. The temporary tables are useful for storing the immediate result sets that are accessed multiple times. You can read more about it here https://www.sqlservertutorial.net/sql-server-basics/sql-server-temporary-tables/ And here https://codingsight.com/introduction-to-temporary-tables-in-sql-server/

Also I would strongly recommend, if you are using the Stored Procedure, set the NOCOUNT to ON, it can also provide a significant performance boost, because network traffic is greatly reduced:

SET NOCOUNT ON
SELECT *
INTO #Temp
-- The rest of code

Based on this:

SET NOCOUNT ON is a set statement which prevents the message which shows the number of rows affected by T-SQL query statements. This is used within stored procedures and triggers to avoid showing the affected rows message. Using SET NOCOUNT ON within a stored procedure can improve the performance of the stored procedure by a significant margin.

查看更多
登录 后发表回答