I need help also about paging and using UNION ALL
for multiple tables:
How do i implement an optimized paging when joining multiple tables using UNION ALL
and returning only specific number of rows...
declare @startRow int
declare @PageCount int
set @startRow = 0
set @PageCount = 20
set rowcount @PageCount
select Row_Number() OVER(Order by col1) as RowNumber, col1, col2
from
(
select col1, col2 from table1 where datetimeCol between (@dateFrom and @dateTo)
union all
select col1, col2 from table2 where datetimeCol between (@dateFrom and @dateTo)
union all
select col1, col2 from table3 where datetimeCol between (@dateFrom and @dateTo)
union all
select col1, col2 from table4 where datetimeCol between (@dateFrom and @dateTo)
union all
select col1, col2 from table5 where datetimeCol between (@dateFrom and @dateTo)
) as tmpTable
where RowNumber > @startRow
table 3, 4, & 5 have huge number of row (millions of rows) where table 1 & 2 may only have few thousand rows.
If startRow is "0", I only expect data from Row 1 to 20 (from Table1). I'm getting the correct result but has a high overhead on the remaining table while sql server tries to all all the data and filter it....
the longer the interval of the @dateFrom and @dateTo makes my query significantly slower while trying to retrieve only few rows from the overall result set
Please help how i can implement a simple but better approach with a similar logic. :(
Instead of applying classic
OFFSET
based paging (note that SQL Server 2012 now natively supports it), I think your particular use case could greatly profit from a method often referred to as the "seek method" as described in this blog post here. Your query would then look like this.The
@lastValueForCol1
and@lastValueForCol2
values are the respective values of the last record from the previous page. This allows you to fetch the "next" page. If theORDER BY
direction isDESC
, simply use<
instead. If(col1, col2)
is not globally unique across yourtmpTable
, you may need to add another column to the query and to theWHERE
andORDER BY
clause to avoid losing records between pages.With the above method, you cannot immediately jump to page 3 without having first fetched the previous 40 records. But often, you do not want to jump that far anyway. Instead, you get a much faster query that might be able to fetch data in constant time, depending on your indexing. Plus, your pages remain "stable", no matter if the underlying data changes (e.g. on page 1, while you're on page 4).
Note, the "seek method" is also called keyset paging.
Indexing
While paging with the "seek method" is always faster than when using
OFFSET
, you should still make sure that(col1, col2)
is indexed in each one of your tables!There maybe an issue with your database design since you have 5 similar tables. But besides this, you could materialize your UNION ALL query into a permanent table or a temp #-table with appropriate indexes on it and finally paginate over materialized data set with ROW_NUMBER() clause.
is essentially as efficient as a normal table provided there is an index on the sorting key that you use for paging. This usually results in a query plan where all tables are merge concatenated. Merge concatenating is a streaming operation. It's cost is proportional to the number of rows drawn, not the number of rows in the tables.
Paging by row number in SQL Server always works by enumerating rows start to end until the desired window is reached. Whether the rows are drawn from a table or from multiple merged tables does not make a fundamental difference.
So a good chance to make this fast is to create a covering index keyed on
col1
. Unfortunately, it's not possible to index forbetween (@dateFrom and @dateTo)
at the same time. So you have to try both indexing strategies and pick what works best.Since the tables are ordered in the result set for the paging (Union ALL does not sort), there is no reason to select from all 5 tables. You should change the code to :
Managing the offset count according to the number of records queried each time. This way, you only query the tables you need.
You can even optimize by selecting the number of records in a table according to the filter to know if you need to query any data from it. So if you want records 30-50, and table1 only has 20 matching records in it, you can skip it altogether.
Consider using OFFSET FETCH clause (works starting with MSSQL 2012):
I also want to mention here, why this query always takes O(n*log(n)) time.
To execute this query, database needs to:
As you can see, you need to union and sort all data to get the expected result (number 3).
If the performance of this query is still poor and you want to increase in, try to: