I have a query doing something like:
SELECT FieldX, FieldY FROM A
WHERE FieldW IN (108, 109, 113, 138, 146, 160,
307, 314, 370, 371, 441, 454, 457, 458, 479, 480,
485, 488, 490, 492, 519, 523, 525, 534, 539, 543,
546, 547, 550, 564, 573, 629, 642, 643, 649, 650,
651, 694, 698, 699, 761, 762, 768, 772, 773, 774,
775, 778, 784, 843, 844, 848, 851, 852, 853, 854,
855, 856, 857, 858, 859, 860, 861, 862, 863, 864,
865, 868, 869, 871, 872, 873, 891)
Is an IN clause with this many values bad for query performance? I'm seeing a lot of timeouts in my application, and I believe this query could be one source of the problem. Can I optimize the query without removing the numbers, perhaps with a SQL hint?
EDIT:
@KM these are keys in a different table. This is a forum application; briefly: the C# code gets all forums from the database and stores them in the application cache. Before calling the procedure that gets the threads for these forums and this user, the C# code filters the "all forums" collection based on permissions and some business logic. The timeout happens in the database, not in the application itself. Doing all of this logic in the query would require a lot of inner joins, and I'm not 100% sure I can do it all inside the procedure.
I'm using SQL Server 2000
Depending on your data distribution, additional predicates in your WHERE clause may improve performance. For example, if the set of ids is small relative to the total number of rows in the table, and you know the ids are relatively close together (perhaps they will usually be recent additions, and therefore clustered at the high end of the range), you could try including the predicate "AND FieldW BETWEEN 108 AND 891" (after determining the min and max id of the set in the C# code). It may be that a range scan on that column (if indexed) is faster than whatever plan is currently being used.
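For instance, a sketch with the extra range predicate added (the 108/891 bounds being the min and max you would compute in C# from the actual id set):

SELECT FieldX, FieldY
FROM A
WHERE FieldW BETWEEN 108 AND 891            -- range computed from the id set in C#
  AND FieldW IN (108, 109, 113, 138, 146)   -- ...plus the rest of the original list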
There are better ways to code it, but I doubt it's the cause of your timeouts, especially if it's only a SELECT. You should be able to determine that by looking at your query traces though. But recoding this would be optimization by guessing, and an unlikely guess at that.
Let's start with a query plan for the query that is actually timing out. Do you know for sure which query it is?
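If you're not sure, on SQL Server 2000 you can get a textual estimated plan for the suspect statement with SET SHOWPLAN_TEXT (or capture the actual plan with Profiler), for example:

SET SHOWPLAN_TEXT ON
GO
-- the statement below is not executed; its estimated plan is returned instead
SELECT FieldX, FieldY FROM A WHERE FieldW IN (108, 109, 113, 891)
GO
SET SHOWPLAN_TEXT OFF
GO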
Performance can only be judged in the context of what you are trying to do. In this case you are requesting the retrieval of around 70 rows (assuming they are unique values), so you can expect something like 70 times the duration of retrieving a single value. It might be less due to caching, of course.
However, the query optimiser may need or choose to perform a full table scan in order to retrieve the values, in which case performance will be little different from retrieving a single value via the same access plan.
You might try something like:
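A sketch of one approach, building the value list as a derived table _a (using UNION ALL, which works on SQL Server 2000 as well) and joining it back to A:

SELECT A.FieldX, A.FieldY
FROM A
INNER JOIN (
    SELECT 108 AS FieldW UNION ALL
    SELECT 109 UNION ALL
    SELECT 113 UNION ALL
    -- one SELECT ... UNION ALL per id in the list
    SELECT 891
) _a ON _a.FieldW = A.FieldW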
This may be appropriate for your situation, for example when you want to generate a single SQL statement dynamically. On my machine (SQL Server 2008 Express), testing with a small number (5) of FieldW values and a large number (100,000) of rows in A, this uses an index seek on A with a nested loops join between A and _a, which is probably what you're looking for.
An IN clause with many values can hurt performance, but what counts as "bad" depends on the application, the data, the database size, and so on. You need to test your own app to see what works best.
Here is your answer...
http://www.4guysfromrolla.com/webtech/031004-1.shtml
Basically, you create a function that splits a delimited string and populates a temp table (or returns a table) with the split values. You can then join to that table and manipulate your data. The article above explains it pretty well; I use this technique a lot.
In your specific case, use a join to that table instead of the IN clause; it should be much faster. A sketch of the idea follows.
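A minimal sketch for SQL Server 2000; the function name dbo.SplitIds and the sample id list are illustrative, not taken from the linked article:

CREATE FUNCTION dbo.SplitIds (@List varchar(8000))
RETURNS @Ids TABLE (Id int NOT NULL)
AS
BEGIN
    -- walk the comma-delimited list and store each piece as an int
    DECLARE @Pos int
    SET @List = @List + ','
    SET @Pos = CHARINDEX(',', @List)
    WHILE @Pos > 0
    BEGIN
        IF @Pos > 1
            INSERT INTO @Ids (Id) VALUES (CAST(LEFT(@List, @Pos - 1) AS int))
        SET @List = SUBSTRING(@List, @Pos + 1, LEN(@List))
        SET @Pos = CHARINDEX(',', @List)
    END
    RETURN
END
GO

-- build the id list in C#, pass it as a single varchar parameter, and join:
SELECT A.FieldX, A.FieldY
FROM A
INNER JOIN dbo.SplitIds('108,109,113,138,146') ids ON ids.Id = A.FieldW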