We have a table with 250 million records (unique 15-digit numbers; clustered unique index column) that is queried by at least 0.7 to 0.9 million requests per day on average.
We have multiple applications accessing this table. Each application tries to compare 500,000 rows against these 260 million records.
We also have an application that adds more data to this large table, which is slowing down the queries run by the other applications.
How can we improve the performance of these queries? How should we maintain this table? Should we partition it?

Environment: Windows Server 2008 R2, SQL Server 2008 R2, 64 GB RAM, dual processor, 8 cores.
1. Use temporary tables
Create a temporary table on the subset (rows and columns) of data you are interested in. A temporary table should be much smaller than the original source table and can be indexed easily (if needed).
To create the temporary table you can use code (not tested; table and column names below are placeholders) like:
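```sql
-- Hypothetical names: dbo.big_table and its columns stand in for your schema.
-- Copy only the rows/columns you need into a temp table.
SELECT id, col1, col2
INTO #subset
FROM dbo.big_table
WHERE col1 = 'APP1';

-- Index the temp table if you will join or filter on it repeatedly.
CREATE UNIQUE CLUSTERED INDEX IX_subset_id ON #subset (id);
```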
Pros:
- Easy to do for any subset of data.
- Easy to manage: it's temporary and it's a table.
- Doesn't affect overall system performance the way a view can.
- A temporary table can be indexed.

Cons:

- It's a snapshot of the data, but that is probably good enough for ad-hoc queries.
2. Create views
Similar to the above, but create views instead of temporary tables.

You can create views or indexed views on the subset of data you are interested in and run queries against the view, which should contain only the interesting subset of data, much smaller than the whole table.
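A minimal sketch of an indexed view, assuming a hypothetical app_code column that identifies each application's subset (all names are placeholders):

```sql
-- Hypothetical schema: dbo.big_table(id, app_code, payload).
CREATE VIEW dbo.v_app1_subset
WITH SCHEMABINDING
AS
SELECT id, payload
FROM dbo.big_table
WHERE app_code = 'APP1';
GO

-- A unique clustered index materializes the view, making it an indexed view.
CREATE UNIQUE CLUSTERED INDEX IX_v_app1_subset ON dbo.v_app1_subset (id);
```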
Pros:
- Easy to do.
- It stays up to date with the source data.

Cons:

- Possible only for a predefined subset of data.
- Could be inefficient for large tables with a high rate of updates.
- Not as easy to manage.
- Can affect overall system performance.

3. Selecting all columns

Running a star query (SELECT * FROM ...) on a big table is not a good thing.
If you have large columns (like long strings) it takes a lot of time to read them from disk and send them over the network.

I would try to replace * with the column names you really need.
Or, if you need all columns, try to rewrite the query to something like (not tested; table and column names are placeholders):
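```sql
-- Filter on the narrow key column first, then join back to the big table
-- to fetch the remaining columns only for the matching rows.
SELECT t.*
FROM (
    SELECT id
    FROM dbo.big_table
    WHERE id BETWEEN 100000000000000 AND 100000000999999  -- your filter here
) AS k
JOIN dbo.big_table AS t ON t.id = k.id;
```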
4. Dirty reads
The last thing that could speed up the query is allowing dirty reads with the table hint WITH (NOLOCK), for example (table name below is a placeholder):
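```sql
-- NOLOCK lets the read skip shared locks, so it is not blocked by writers,
-- but it may return uncommitted ("dirty") rows.
SELECT id
FROM dbo.big_table WITH (NOLOCK)
WHERE id = 123456789012345;
```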
Instead of the hint you can set the transaction isolation level to read uncommitted:
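```sql
-- Session-level alternative to the per-table NOLOCK hint.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
```

Keep in mind that dirty reads can return rows that are later rolled back, so this is only appropriate when approximate results are acceptable.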
If the multiple applications are only trying to compare data, then I believe they are not writing to the table, so caching records should help as well. There is also a technique called sharding, which unfortunately SQL Server doesn't provide out of the box. But there is a library on CodePlex that provides such a feature for SQL Server; it basically tries to balance the load across databases.

I haven't tested it, but it should be worth a try. If you want, you can see it here: http://enzosqlshard.codeplex.com/