Is SQL IN bad for performance?

Posted 2019-01-04 22:48

I have a query doing something like:

SELECT FieldX, FieldY FROM A
WHERE FieldW IN (108, 109, 113, 138, 146, 160,
307, 314, 370, 371, 441, 454, 457, 458, 479, 480,
485, 488, 490, 492, 519, 523, 525, 534, 539, 543,
546, 547, 550, 564, 573, 629, 642, 643, 649, 650,
651, 694, 698, 699, 761, 762, 768, 772, 773, 774,
775, 778, 784, 843, 844, 848, 851, 852, 853, 854,
855, 856, 857, 858, 859, 860, 861, 862, 863, 864,
865, 868, 869, 871, 872, 873, 891) 

Is an IN clause with this many values bad for query performance? My application is hitting a lot of timeouts, and I suspect this query could be the cause. Is there a good SQL hint that would let me optimize the query without removing the numbers?

EDIT:

@KM These are keys in a different table. Briefly, this is a forum application: the C# code fetches all forums from the database and stores them in the application cache. Before calling a stored procedure that gets the threads for these forums and for this user, the C# code filters the "all forums" collection according to permissions and some business logic. The timeout happens in the database, not in the application itself. Doing all of this logic in the query would require a lot of inner joins, and I'm not 100% sure I can do it all inside the procedure.

I'm using SQL Server 2000.

14 answers
可以哭但决不认输i
Reply #2 · 2019-01-04 23:11

Depending on your data distribution, additional predicates in your WHERE clause may improve performance. For example, if the set of ids is small relative to the total number of rows in the table, and you know the ids are relatively close together (perhaps they will usually be recent additions and therefore clustered at the high end of the range), you could try including the predicate "AND FieldW BETWEEN 108 AND 891" (after determining the min and max id of your set in the C# code). A range scan on that column (if it is indexed) may be faster than the plan currently being used.
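
A minimal sketch of the extra range predicate, assuming SQLite through Python's sqlite3 module as a stand-in for SQL Server 2000 (the plans the two engines choose will differ). Table A and its columns come from the question; the index name ix_A_FieldW, the sample rows and the helper names make_demo_db and fetch_with_range are hypothetical. Because the bounds are computed from the id list itself, the BETWEEN never changes the result set, only the access paths the optimizer can consider.

```python
import sqlite3

def make_demo_db():
    """Hypothetical in-memory table shaped like the question's table A."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (FieldW INTEGER, FieldX TEXT, FieldY TEXT)")
    conn.execute("CREATE INDEX ix_A_FieldW ON A (FieldW)")
    conn.executemany(
        "INSERT INTO A VALUES (?, ?, ?)",
        [(w, f"x{w}", f"y{w}") for w in range(1, 1001)],
    )
    return conn

def fetch_with_range(conn, ids):
    """IN query plus a redundant BETWEEN on the min/max of the id list."""
    ids = sorted(set(ids))
    if not ids:
        return []
    lo, hi = ids[0], ids[-1]  # bounds computed in the application, as in the answer
    placeholders = ", ".join("?" for _ in ids)
    sql = (
        "SELECT FieldX, FieldY FROM A "
        f"WHERE FieldW IN ({placeholders}) "
        "AND FieldW BETWEEN ? AND ? "
        "ORDER BY FieldW"
    )
    return conn.execute(sql, [*ids, lo, hi]).fetchall()

conn = make_demo_db()
print(fetch_with_range(conn, [891, 108, 113, 109]))
# [('x108', 'y108'), ('x109', 'y109'), ('x113', 'y113'), ('x891', 'y891')]
```

Whether the extra predicate helps depends on the engine and the data, so compare the plans with and without it before keeping it.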

祖国的老花朵
Reply #3 · 2019-01-04 23:12

There are better ways to code it, but I doubt it's the cause of your timeouts, especially if it's only a SELECT. You should be able to confirm that by looking at your query traces, though. Recoding this now would be optimization by guessing, and an unlikely guess at that.

Let's start with a query plan for the query that is actually timing out. Do you know for sure which query it is?
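
A minimal sketch of reading a plan, assuming SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as a stand-in. On SQL Server 2000 you would use SET SHOWPLAN_TEXT ON or the graphical plan in Query Analyzer instead. The shortened IN list, the index name ix_A_FieldW and the helpers plan_for and demo_plans are hypothetical. The sketch shows the same IN query planned as a full scan when FieldW has no index and as an index search once it does.

```python
import sqlite3

# Hypothetical query: a shortened version of the question's IN list.
IN_QUERY = "SELECT FieldX, FieldY FROM A WHERE FieldW IN (108, 109, 113, 891)"

def plan_for(conn, sql):
    """Return the human-readable detail lines of EXPLAIN QUERY PLAN.

    The detail text is the last column in every SQLite version, but its
    exact wording (e.g. "SCAN A" vs "SCAN TABLE A") varies between versions.
    """
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

def demo_plans():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (FieldW INTEGER, FieldX TEXT, FieldY TEXT)")
    conn.executemany(
        "INSERT INTO A VALUES (?, ?, ?)",
        [(w, f"x{w}", f"y{w}") for w in range(1, 1001)],
    )
    without_index = plan_for(conn, IN_QUERY)  # no index on FieldW: full scan of A
    conn.execute("CREATE INDEX ix_A_FieldW ON A (FieldW)")
    with_index = plan_for(conn, IN_QUERY)  # index available: search via ix_A_FieldW
    conn.close()
    return without_index, with_index

without_index, with_index = demo_plans()
print("without index:", without_index)
print("with index:   ", with_index)
```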

你好瞎i
Reply #4 · 2019-01-04 23:13

Performance can only be judged in the context of what you are trying to do. In this case you are requesting around 70 rows (assuming they are unique values), so you can expect something like 70 times the duration of retrieving a single value. It might be less due to caching, of course.

However, the query optimiser may need or choose to perform a full table scan to retrieve the values, in which case performance will be little different from retrieving a single value via the same access plan.

虎瘦雄心在
Reply #5 · 2019-01-04 23:17

You might try something like:

select a.FieldX, a.FieldY
from (
    select FieldW = 108 union
    select FieldW = 109 union
    select FieldW = 113 union
    ...
    select FieldW = 891
) _a
join A a on a.FieldW = _a.FieldW

This may suit your situation, for example when you want to generate a single SQL statement dynamically. On my machine (SQL Server 2008 Express), testing with a small number (5) of FieldW values and a large number (100,000) of rows in A, this uses an index seek on A with a nested loops join between A and _a, which is probably what you're looking for.
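
A minimal sketch of generating that statement dynamically, assuming SQLite through Python's sqlite3 module as a stand-in. SQLite spells the column alias as "SELECT ? AS FieldW" rather than T-SQL's "SELECT FieldW = 108", and the ids are bound as parameters instead of being pasted into the string. The helper names and sample data are hypothetical.

```python
import sqlite3

def make_demo_db():
    """Hypothetical in-memory table shaped like the question's table A."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (FieldW INTEGER, FieldX TEXT, FieldY TEXT)")
    conn.execute("CREATE INDEX ix_A_FieldW ON A (FieldW)")
    conn.executemany(
        "INSERT INTO A VALUES (?, ?, ?)",
        [(w, f"x{w}", f"y{w}") for w in range(1, 1001)],
    )
    return conn

def build_union_join_sql(n):
    """Build the derived-table query for n ids, one '?' placeholder per id.

    UNION (not UNION ALL) removes duplicate ids, so each matching row of A
    is returned once, just as with IN.
    """
    if n < 1:
        raise ValueError("need at least one id")
    derived = " UNION ".join(["SELECT ? AS FieldW"] * n)
    return (
        "SELECT a.FieldX, a.FieldY "
        f"FROM ({derived}) AS _a "
        "JOIN A AS a ON a.FieldW = _a.FieldW "
        "ORDER BY a.FieldW"
    )

def fetch_via_union_join(conn, ids):
    ids = list(ids)
    return conn.execute(build_union_join_sql(len(ids)), ids).fetchall()

conn = make_demo_db()
print(fetch_via_union_join(conn, [108, 109, 113, 891]))
# [('x108', 'y108'), ('x109', 'y109'), ('x113', 'y113'), ('x891', 'y891')]
```

SQLite limits a compound SELECT to 500 terms by default, which is plenty for the question's list but worth keeping in mind for much longer ones.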

劫难
Reply #6 · 2019-01-04 23:18

Typically the IN clause is harmful to performance, but what is "bad" depends on the application, data, database size, etc. You need to test your own app to see what is best.

甜甜的少女心
Reply #7 · 2019-01-04 23:20

Here is your answer...

http://www.4guysfromrolla.com/webtech/031004-1.shtml

Basically, you create a function that splits a string and populates a temp table with the split contents. Then you can join to that temp table and work with your data. The article explains this pretty well. I use this technique a lot.

In your specific case, join to the temp table instead of using an IN clause; it is much faster.
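
A minimal sketch of the split-then-join idea, assuming SQLite through Python's sqlite3 module. It is not the linked article's code: a recursive common table expression stands in for the SQL Server 2000 split function and peels one comma-separated token off the string per step. The temp table name forum_ids and the helper names are hypothetical.

```python
import sqlite3

def make_demo_db():
    """Hypothetical in-memory table shaped like the question's table A."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (FieldW INTEGER, FieldX TEXT, FieldY TEXT)")
    conn.execute("CREATE INDEX ix_A_FieldW ON A (FieldW)")
    conn.executemany(
        "INSERT INTO A VALUES (?, ?, ?)",
        [(w, f"x{w}", f"y{w}") for w in range(1, 1001)],
    )
    return conn

# Recursive CTE: each step takes the text before the next comma as `id`
# and keeps the remainder in `rest`; the tokens go into temp table forum_ids.
SPLIT_INTO_TEMP = """
WITH RECURSIVE split(id, rest) AS (
    SELECT '', :ids || ','
    UNION ALL
    SELECT substr(rest, 1, instr(rest, ',') - 1),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
INSERT OR IGNORE INTO forum_ids (id)
SELECT CAST(trim(id) AS INTEGER) FROM split WHERE trim(id) <> ''
"""

def fetch_for_id_string(conn, id_string):
    """Split id_string (e.g. "108,109,113") into forum_ids, then join to A."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS forum_ids (id INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM forum_ids")  # reuse the temp table between calls
    conn.execute(SPLIT_INTO_TEMP, {"ids": id_string})
    return conn.execute(
        "SELECT a.FieldX, a.FieldY FROM A AS a "
        "JOIN forum_ids AS f ON a.FieldW = f.id "
        "ORDER BY a.FieldW"
    ).fetchall()

conn = make_demo_db()
print(fetch_for_id_string(conn, "108, 109,113,891"))
# [('x108', 'y108'), ('x109', 'y109'), ('x113', 'y113'), ('x891', 'y891')]
```

The sketch does not validate tokens: a non-numeric token casts to 0 in SQLite, so real code should check the string before passing it in.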
