Efficient way of getting @@rowcount from a query using row_number

Published 2019-01-12 18:37

Question:

I have an expensive query using the row_number over() functionality in SQL Server 2005. I return only a sublist of those records because the query is paginated. However, I would also like to return the total number of records, not just the paginated subset. Effectively running the query twice to get the count is out of the question.

Selecting count(*) is also out of the question as the performance is absolutely terrible when I've tried this.

What I'd really love is @@ROW_NUMBERROWCOUNT :-)

Answer 1:

Check out the COUNT(*) aggregate when used with OVER(PARTITION BY ..), like so:

    SELECT
     ROW_NUMBER() OVER(ORDER BY object_id, column_id) as RowNum
    , COUNT(*) OVER(PARTITION BY 1) as TotalRows
    , * 
    FROM master.sys.columns

This is, IMHO, the best way to do it without having to run two queries.



Answer 2:

Over the years a pile of developer sweat has gone into efficiently paging result sets. Yet there is no one answer--it depends on your use case. Part of the use case is getting your page efficiently; part is figuring out how many rows are in the complete result set. So sorry if I stray a little into paging, but the two are pretty tightly coupled in my mind.

There are a lot of strategies, most of which break down if you have any real data volume or don't fit the use case. This isn't a complete list, but here are some of the options.

Run Separate Count(*)

  • run a separate query that does a simple "select count(*) from MyTable"
  • simple and easy for a small table
  • good on an unfiltered large table that is either narrow or has a compact non-clustered index you can use
  • breaks down when you have a complicated WHERE/JOIN criteria because running the WHERE/JOIN twice is expensive.
  • breaks down on a wide index because the number of reads goes up.
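As a concrete sketch of this strategy, here is what the two round trips typically look like. The table and column names (dbo.Orders, OrderDate, etc.) are illustrative, not from the original question:

```sql
-- Hypothetical example: fetch page 3 (20 rows per page) plus a separate
-- total count. Works on SQL Server 2005.
DECLARE @PageNum int, @PageSize int;
SET @PageNum = 3;
SET @PageSize = 20;

SELECT OrderID, CustomerID, OrderDate
FROM (
    SELECT OrderID, CustomerID, OrderDate,
           ROW_NUMBER() OVER (ORDER BY OrderDate, OrderID) AS RowNum
    FROM dbo.Orders
    WHERE OrderDate >= '2019-01-01'
) AS Paged
WHERE RowNum BETWEEN (@PageNum - 1) * @PageSize + 1
                 AND @PageNum * @PageSize;

-- Second round trip just for the count; cheap if a narrow index
-- covers the WHERE clause, expensive if the filter/joins are complex.
SELECT COUNT(*) AS TotalRows
FROM dbo.Orders
WHERE OrderDate >= '2019-01-01';
```

The duplicated WHERE clause is exactly the weakness called out above: whatever it costs to evaluate, you pay it twice.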

Combine ROW_Number() OVER() and COUNT(1) OVER(PARTITION By 1)

  • This was suggested by @RBarryYoung. It has the benefit of being simple to implement and very flexible.
  • The downside is that there are a lot of reasons this can become extremely expensive quickly.
  • For example, in a DB I'm currently working in there is a Media table with about 6000 rows. It's not particularly wide, and has an integer clustered PK as well as a compact unique index. Yet a simple COUNT(*) OVER(PARTITION BY 1) as TotalRows results in ~12,000 reads. Compare that to a simple SELECT COUNT(*) FROM Media -- 12 reads. Wowzers.

UPDATE -- the reads issue I mentioned is a bit of a red herring. It turns out that with windowed functions, the unit used to measure reads is kind of mixed. The net result is what appears to be a massive number of reads. You can see more on the issue here: Why are logical reads for windowed aggregate functions so high?

Temp Tables / Table Variables

  • There are lots of strategies that take a result set and insert relevant keys or segments of results into temp tables / table variables.
  • For small/medium sized result sets this can provide great results.
  • This type of strategy works across almost any platform/version of SQL.
  • Operating on a result set multiple times (quite often a requirement) is also easy.
  • The downside is when working with large result sets ... inserting a few million rows into a temp table has a cost.
  • Compounding the issue, in a high volume system pressure on TempDB can be quite a factor, and temp tables are effectively working in TempDB.
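One common shape of this strategy is to materialise only the keys once, then count and page from the temp table. A minimal sketch, again with hypothetical table/column names:

```sql
-- Number the keys once into a temp table; the IDENTITY assignment
-- follows the ORDER BY of the INSERT...SELECT.
CREATE TABLE #Keys (RowNum int IDENTITY(1,1) PRIMARY KEY, OrderID int);

INSERT INTO #Keys (OrderID)
SELECT OrderID
FROM dbo.Orders
WHERE OrderDate >= '2019-01-01'
ORDER BY OrderDate, OrderID;

DECLARE @TotalRows int;
SET @TotalRows = @@ROWCOUNT;   -- the total count falls out of the insert for free

-- Page 3 of 20: join back to the base table for just those 20 keys.
SELECT o.OrderID, o.CustomerID, o.OrderDate, @TotalRows AS TotalRows
FROM #Keys AS k
JOIN dbo.Orders AS o ON o.OrderID = k.OrderID
WHERE k.RowNum BETWEEN 41 AND 60;

DROP TABLE #Keys;
```

The expensive WHERE/JOIN runs once, re-reading a page is cheap, and the count is a side effect -- at the cost of the insert and the TempDB pressure noted above.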

Gaussian Sum / Double Row Number

  • This idea relies on a subset of something the mathematician Gauss figured out (how to sum a series of numbers). The subset is how to get a row count from any point in the table.
  • From a series of numbers (Row_Number()) the row count for 1 to N is (N + 1) - 1. More explanation in the links.
  • The formula seems like it would net out to just N, but if you stick with it an interesting thing happens: you can figure out the row count from a page in the middle of the table.
  • The net result is you do ROW_NUMBER() OVER(ORDER BY ID) and ROW_NUMBER() OVER(ORDER BY ID DESC), then sum the two numbers and subtract 1.
  • Using my Media table as an example my reads dropped from 12,000 to about 75.
  • In a larger page you end up repeating data many, many times, but the savings in reads may be worth it.
  • I haven't tested this on too many scenarios, so it may fall apart in other scenarios.
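The trick described above can be sketched like this: for any row, ascending position + descending position - 1 = total row count, so every row of the page carries the count. Table and column names are illustrative:

```sql
-- Double ROW_NUMBER(): RowAsc counts from the top, RowDesc from the
-- bottom; their sum minus 1 is the total row count, available on
-- every row of the requested page.
SELECT OrderID, OrderDate,
       RowAsc + RowDesc - 1 AS TotalRows
FROM (
    SELECT OrderID, OrderDate,
           ROW_NUMBER() OVER (ORDER BY OrderDate, OrderID)           AS RowAsc,
           ROW_NUMBER() OVER (ORDER BY OrderDate DESC, OrderID DESC) AS RowDesc
    FROM dbo.Orders
) AS Numbered
WHERE RowAsc BETWEEN 41 AND 60;   -- page 3 of 20
```

Note the two window functions must order by the same keys in opposite directions, and the ordering must be deterministic (hence the tie-breaking OrderID), or the two numberings won't line up.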

Top (@n) / SET ROWCOUNT

  • These aren't specific strategies per se, but optimizations based on what we know about the query optimizer.
  • Creatively using TOP(@n) [TOP can take a variable as of SQL Server 2005] or SET ROWCOUNT can reduce your working set ... even if you're pulling a middle page of a result set you can still narrow the work.
  • These ideas work because of query optimizer behavior ... a service pack/hotfix could change that behavior (although probably not).
  • In certain instances SET ROWCOUNT can be a bit inaccurate.
  • This strategy doesn't account for getting the full row count; it just makes paging more efficient.
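A sketch of the TOP(@n) idea: to fetch page 3 of 20, only the first 60 rows ever need to be numbered, so cap the inner query there. Names are illustrative:

```sql
DECLARE @PageNum int, @PageSize int;
SET @PageNum = 3;
SET @PageSize = 20;

-- TOP with an expression works from SQL Server 2005 on. The optimizer
-- can stop scanning/sorting once it has the first @PageNum * @PageSize rows.
SELECT OrderID, OrderDate
FROM (
    SELECT TOP (@PageNum * @PageSize)
           OrderID, OrderDate,
           ROW_NUMBER() OVER (ORDER BY OrderDate, OrderID) AS RowNum
    FROM dbo.Orders
    ORDER BY OrderDate, OrderID
) AS FirstPages
WHERE RowNum > (@PageNum - 1) * @PageSize;
```

As the bullet above says, this narrows the page fetch but tells you nothing about the total row count.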

So what's a developer to do?

Read, my good man, read. Here are some articles that I've leaned on...

  • A More Efficient Method for Paging Through Large Result Sets
  • Optimising Server-Side Paging - Part I
  • Optimising Server-Side Paging - Part II
  • Explanation of the Gaussian Sum
  • Returning Ranked Results with Microsoft SQL Server 2005
  • ROW_NUMBER() OVER Not Fast Enough With Large Result Set
  • Retrieving the First N Records from a SQL Query
  • Server Side Paging using SQL Server 2005
  • Why are logical reads for windowed aggregate functions so high?

Hope that helps.



Answer 3:

If count(*) is slow you really need to address that issue first by carefully examining your indexes and making sure your statistics are up to date.

In my experience, there is nothing better than doing two separate queries: one to get the data page, and one to get the total count. Using a temporary table to get total counts is a losing strategy as your number of rows increases. E.g., the cost of inserting 10,000,000 rows into a temp table simply to count them is obviously going to be excessive.



Answer 4:

I do this by putting the whole result set, with the row_number, into a temp table, then using @@ROWCOUNT from that insert and querying the temp table to return the page of data I need.
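One reading of this approach, sketched with hypothetical names (dbo.Orders etc. are not from the original answer):

```sql
-- Materialise the full numbered result set once, capture @@ROWCOUNT
-- immediately (any intervening statement would overwrite it), then
-- page from the temp table.
SELECT ROW_NUMBER() OVER (ORDER BY OrderDate, OrderID) AS RowNum,
       OrderID, CustomerID, OrderDate
INTO #Results
FROM dbo.Orders
WHERE OrderDate >= '2019-01-01';

DECLARE @TotalRows int;
SET @TotalRows = @@ROWCOUNT;

SELECT RowNum, OrderID, CustomerID, OrderDate, @TotalRows AS TotalRows
FROM #Results
WHERE RowNum BETWEEN 41 AND 60;   -- page 3 of 20

DROP TABLE #Results;
```

This is the full-result-set variant of the temp-table strategy from Answer 2, so the same caveat applies: it works well until the result set gets large enough that the insert itself dominates.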