We have a table with 250 million records (unique 15-digit numbers; clustered unique index column) that is queried by at least 0.7 to 0.9 million requests per day on average.
We have multiple applications accessing this table. Each application tries to compare 500,000 rows against these 260 million records.
We also have an application that adds more data to this large table, which is slowing down the queries run by the other applications.
How can we improve the performance of these queries? How should we maintain this table? Should we partition it?

Environment: Windows Server 2008 R2, SQL Server 2008 R2, 64 GB RAM, dual processor, 8 cores.
1. Use temporary tables
Create a temporary table on the subset (rows and columns) of data you are interested in. A temporary table should be much smaller than the original source table and can be indexed easily (if needed).
To create the temporary table you can use code (not tested; table and column names below are placeholders) like:
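```sql
-- Hypothetical names: dbo.big_table and its columns stand in for your schema.
-- Copy only the rows/columns you need into a temp table.
SELECT id, col1, col2
INTO #subset
FROM dbo.big_table
WHERE col1 = 'APP1';

-- Index the temp table if you will join or filter on it repeatedly.
CREATE UNIQUE CLUSTERED INDEX IX_subset_id ON #subset (id);
```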
Pros:
- Easy to do for any subset of data.
- Easy to manage: it's temporary and it's a table.
- Doesn't affect overall system performance the way a view can.
- A temporary table can be indexed.

Cons:

- It's a snapshot of the data, but that is probably good enough for ad-hoc queries.
2. Create views
Similar to the above, but create views instead of temporary tables.

You can create views or indexed views on the subset of data you are interested in and run queries against the view, which should contain only the interesting subset of data, much smaller than the whole table.
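A minimal sketch of an indexed view, assuming a hypothetical app_code column that identifies each application's subset (all names are placeholders):

```sql
-- Hypothetical schema: dbo.big_table(id, app_code, payload).
CREATE VIEW dbo.v_app1_subset
WITH SCHEMABINDING
AS
SELECT id, payload
FROM dbo.big_table
WHERE app_code = 'APP1';
GO

-- A unique clustered index materializes the view, making it an indexed view.
CREATE UNIQUE CLUSTERED INDEX IX_v_app1_subset ON dbo.v_app1_subset (id);
```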
Pros:
- Easy to do.
- It stays up to date with the source data.

Cons:

- Possible only for a predefined subset of data.
- Could be inefficient for large tables with a high rate of updates.
- Not as easy to manage.
- Can affect overall system performance.

3. Selecting all columns

Running a star query (SELECT * FROM ...) on a big table is not a good thing.
If you have large columns (like long strings) it takes a lot of time to read them from disk and send them over the network.

I would try to replace * with the column names you really need.
Or, if you need all columns, try to rewrite the query to something like (not tested; table and column names are placeholders):
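```sql
-- Filter on the narrow key column first, then join back to the big table
-- to fetch the remaining columns only for the matching rows.
SELECT t.*
FROM (
    SELECT id
    FROM dbo.big_table
    WHERE id BETWEEN 100000000000000 AND 100000000999999  -- your filter here
) AS k
JOIN dbo.big_table AS t ON t.id = k.id;
```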
4. Dirty reads
The last thing that could speed up the query is allowing dirty reads with the table hint WITH (NOLOCK), for example (table name below is a placeholder):
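```sql
-- NOLOCK lets the read skip shared locks, so it is not blocked by writers,
-- but it may return uncommitted ("dirty") rows.
SELECT id
FROM dbo.big_table WITH (NOLOCK)
WHERE id = 123456789012345;
```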
Instead of the hint you can set the transaction isolation level to read uncommitted:
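```sql
-- Session-level alternative to the per-table NOLOCK hint.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
```

Keep in mind that dirty reads can return rows that are later rolled back, so this is only appropriate when approximate results are acceptable.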
If the multiple applications are only trying to compare data, then I believe they are not writing to the table, so caching records should help as well. There is also a technique called sharding, which unfortunately SQL Server doesn't provide out of the box. But there is a library on CodePlex that provides such a feature for SQL Server; it basically tries to balance the load across databases.

I haven't tested it, but it should be worth a try. If you want, you can see it here: http://enzosqlshard.codeplex.com/