I have the following query, and because of a lot of SUM
function calls, my query is running too slow. I have a lot of records in my database and I would like to get a report from the current year and last year (Last 30 days, Last 90 days and last 365 days) for each one:
SELECT
b.id as [ID]
,d.[Title] as [Title]
,e.Class as [Class]
,Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-1,GETDATE()) THEN a.col1 ELSE 0 END) as [Current - Last 30 Days Col1]
,Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-1,GETDATE()) THEN a.col2 ELSE 0 END) as [Current - Last 30 Days Col2]
,Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-1,GETDATE()) THEN a.col1 ELSE 0 END) as [Current - Last 90 Days Col1]
,Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-1,GETDATE()) THEN a.col2 ELSE 0 END) as [Current - Last 90 Days Col2]
,Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-1,GETDATE()) THEN a.col1 ELSE 0 END) as [Current - Last 365 Days Col1]
,Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-1,GETDATE()) THEN a.col2 ELSE 0 END) as [Current - Last 365 Days Col2]
,Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-13,GETDATE()) and a.DateCol <= DATEADD(MONTH,-12,GETDATE()) THEN a.col1 ELSE 0 END) as [Last year - Last 30 Days Col1]
,Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-13,GETDATE()) and a.DateCol <= DATEADD(MONTH,-12,GETDATE()) THEN a.col2 ELSE 0 END) as [Last year - Last 30 Days Col2]
,Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-5,GETDATE()) and a.DateCol <= DATEADD(QUARTER,-4,GETDATE()) THEN a.col1 ELSE 0 END) as [Last year - Last 90 Days Col1]
,Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-5,GETDATE()) and a.DateCol <= DATEADD(QUARTER,-4,GETDATE()) THEN a.col2 ELSE 0 END) as [Last year - Last 90 Days Col2]
,Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-2,GETDATE()) and a.DateCol <= DATEADD(YEAR,-1,GETDATE()) THEN a.col1 ELSE 0 END) as [Last year - Last 365 Days Col1]
,Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-2,GETDATE()) and a.DateCol <= DATEADD(YEAR,-1,GETDATE()) THEN a.col2 ELSE 0 END) as [Last year - Last 365 Days Col2]
FROM
tb1 a
INNER JOIN
tb2 b on a.id=b.fid and a.col3 = b.col4
INNER JOIN
tb3 c on b.fid = c.col5
INNER JOIN
tb4 d on c.id = d.col6
INNER JOIN
tb5 e on c.col7 = e.id
GROUP BY
b.id, d.Title, e.Class
Does anyone have any idea how can I improve my query in order to run faster?
EDIT: I was encouraged to move the DATEADD
function call to the where
statement and load first two years first then filter them in columns, but I am not sure the suggested answer is executed and works, it could be found here: https://stackoverflow.com/a/59944426/12536284
If you agree with the above solution, please show me how can I apply it in my current query?
Just FYI, I am using this SP in C#, Entity Framework (DB-First), something like this:
var result = MyDBEntities.CalculatorSP();
I would use a lookup table "Dates" table to join my data to with an index on DatesId. I use the dates as a filter when I want to browse historical data. The join is fast and so it the filtering as the DatesId is clustered primary index (primary key). Add the date column (as included column) for your data table as well.
The dates table has the following columns:
DatesId, Date, Year, Quarter, YearQuarter, MonthNum, MonthNameShort, YearWeek, WeekNum, DayOfYear, DayOfMonth, DayNumOfWeek, DayName
Example data: 20310409 2031-04-09 2031 2 2031-Q2 4 April Apr 2031_15 15 99 9 3 Wednesday
You can PM me if you want a csv of this so that you can import it to the database, but I'm sure you can easily find something like this online and make your own.
I add an identity column as well so that you can get an integer for each date. This makes it a bit easier to work with, but not a requirement.
This allows me to easily jump back to a certain period. It's quite easy to create your own views on this. You can of course use the ROW_NUMBER() function to do this for years, weeks, etc. as well.
Once I have the daterange I want, I join to the data. Works very fast!
For optimizing such calculations you man consider pre-calculating some of the values. The idea of pre-calculations is to reduce the number of rows that need to be read or proceed.
One way of achieving this is using an indexed view and leave the engine to do the calculations by itself. As this type of views have some limitations, you man end up creating a simple table and perform the calculations instead. Basically, it depends on the business needs.
So, in the example below I am creating a table with
RowID
andRowDatetime
columns and inserting 1 million rows. I am using an indexed view to count the entities per days, so instead of querying 1 million rows per year I will query 365 rows per year to count these metrics.The success of such solution depends very much on how the data is distributed and how many rows you have. For example, if you have one entry per day for each day of the year, the view and the table will have same match of rows, so the I/O operations will not be reduced.
Also, the above is just an example of materializing the data and reading it. In your case you may need to add more columns the view definition.
Since you are always grouping values based on a whole number of months, I would first group by month in a subquery in the from clause. This is similar to using a temporary table. Not certain if this would actually speed up your query.
To improve the speed of SQL query, you must add indexes. For each joined table, you have to add one index.
Like this code example for oracle:
I assume tb1 is a large table (relative to tb2, tb3, tb4 and tb5).
If so, it makes sense here to restrict the selection of that table (with a WHERE clause).
If only a small part of tb1 is used, for example because the joins with tb2, tb3, tb4 and tb5 reduce the needed rows to just a few percent, then you should check if the tables are indexed on the columns you use in the joins.
If a large part of tb1 is used, then it can make sense to group its results before joining it to tb2, tb3, tb4 and tb5. Below is an example of that.
As it has been mentioned already, the execution plan will be really helpful in this case. Based on what you've shown it seems you have extracted 12 columns of 15 total columns from
tb1 (a)
, so you can try to run your query without any join and just against thetb1
to see whether your query is working as expected. Since I can see nothing wrong with your SUM function calls, my best guess is you have an issue with your joins, I would suggest to do the following. You can start by excluding the last join for instance,INNER JOIN tb5 e on c.col7 = e.id
and any related usage of it likee.Class as [Class]
ande.Class
in your group by statement. We are not going to exclude it completely, this is just a test to make sure whether the problem is with that or not, if your query runs better and as expected you can try to use a temp table as a workaround instead of the last join, something like this:Actually, Temporary tables are tables that exist temporarily on the SQL Server. The temporary tables are useful for storing the immediate result sets that are accessed multiple times. You can read more about it here https://www.sqlservertutorial.net/sql-server-basics/sql-server-temporary-tables/ And here https://codingsight.com/introduction-to-temporary-tables-in-sql-server/
Also I would strongly recommend, if you are using the Stored Procedure, set the
NOCOUNT
toON
, it can also provide a significant performance boost, because network traffic is greatly reduced:Based on this: