Parameterize an SQL IN clause

2018-12-30 22:09发布

How do I parameterize a query containing an IN clause with a variable number of arguments, like this one?

SELECT * FROM Tags 
WHERE Name IN ('ruby','rails','scruffy','rubyonrails')
ORDER BY Count DESC

In this query, the number of arguments could be anywhere from 1 to 5.

I would prefer not to use a dedicated stored procedure for this (or XML), but if there is some elegant way specific to SQL Server 2008, I am open to that.

30条回答
孤独寂梦人
2楼-- · 2018-12-30 22:32

If we have strings stored inside the IN clause with the comma(,) delimited, we can use the charindex function to get the values. If you use .NET, then you can map with SqlParameters.

DDL Script:

CREATE TABLE Tags
    ([ID] int, [Name] varchar(20))
;

INSERT INTO Tags
    ([ID], [Name])
VALUES
    (1, 'ruby'),
    (2, 'rails'),
    (3, 'scruffy'),
    (4, 'rubyonrails')
;

T-SQL:

DECLARE @Param nvarchar(max)

SET @Param = 'ruby,rails,scruffy,rubyonrails'

SELECT * FROM Tags
WHERE CharIndex(Name,@Param)>0

You can use the above statement in your .NET code and map the parameter with SqlParameter.

Fiddler demo

EDIT: Create the table called SelectedTags using the following script.

DDL Script:

Create table SelectedTags
(Name nvarchar(20));

INSERT INTO SelectedTags values ('ruby'),('rails')

T-SQL:

DECLARE @list nvarchar(max)
SELECT @list=coalesce(@list+',','')+st.Name FROM SelectedTags st

SELECT * FROM Tags
WHERE CharIndex(Name,@Param)>0
查看更多
笑指拈花
3楼-- · 2018-12-30 22:33

If you've got SQL Server 2008 or later I'd use a Table Valued Parameter.

If you're unlucky enough to be stuck on SQL Server 2005 you could add a CLR function like this,

[SqlFunction(
    DataAccessKind.None,
    IsDeterministic = true,
    SystemDataAccess = SystemDataAccessKind.None,
    IsPrecise = true,
    FillRowMethodName = "SplitFillRow",
    TableDefinintion = "s NVARCHAR(MAX)"]
public static IEnumerable Split(SqlChars seperator, SqlString s)
{
    if (s.IsNull)
        return new string[0];

    return s.ToString().Split(seperator.Buffer);
}

public static void SplitFillRow(object row, out SqlString s)
{
    s = new SqlString(row.ToString());
}

Which you could use like this,

declare @desiredTags nvarchar(MAX);
set @desiredTags = 'ruby,rails,scruffy,rubyonrails';

select * from Tags
where Name in [dbo].[Split] (',', @desiredTags)
order by Count desc
查看更多
无与为乐者.
4楼-- · 2018-12-30 22:34

Another possible solution is instead of passing a variable number of arguments to a stored procedure, pass a single string containing the names you're after, but make them unique by surrounding them with '<>'. Then use PATINDEX to find the names:

SELECT * 
FROM Tags 
WHERE PATINDEX('%<' + Name + '>%','<jo>,<john>,<scruffy>,<rubyonrails>') > 0
查看更多
几人难应
5楼-- · 2018-12-30 22:35

I use a more concise version of the top voted answer:

List<SqlParameter> parameters = tags.Select((s, i) => new SqlParameter("@tag" + i.ToString(), SqlDbType.NVarChar(50)) { Value = s}).ToList();

var whereCondition = string.Format("tags in ({0})", String.Join(",",parameters.Select(s => s.ParameterName)));

It does loop through the tag parameters twice; but that doesn't matter most of the time (it won't be your bottleneck; if it is, unroll the loop).

If you're really interested in performance and don't want to iterate through the loop twice, here's a less beautiful version:

var parameters = new List<SqlParameter>();
var paramNames = new List<string>();
for (var i = 0; i < tags.Length; i++)  
{
    var paramName = "@tag" + i;

    //Include size and set value explicitly (not AddWithValue)
    //Because SQL Server may use an implicit conversion if it doesn't know
    //the actual size.
    var p = new SqlParameter(paramName, SqlDbType.NVarChar(50) { Value = tags[i]; } 
    paramNames.Add(paramName);
    parameters.Add(p);
}

var inClause = string.Join(",", paramNames);
查看更多
路过你的时光
6楼-- · 2018-12-30 22:37

The original question was "How do I parameterize a query ..."

Let me state right here, that this is not an answer to the original question. There are already some demonstrations of that in other good answers.

With that said, go ahead and flag this answer, downvote it, mark it as not an answer... do whatever you believe is right.

See the answer from Mark Brackett for the preferred answer that I (and 231 others) upvoted. The approach given in his answer allows 1) for effective use of bind variables, and 2) for predicates that are sargable.

Selected answer

What I want to address here is the approach given in Joel Spolsky's answer, the answer "selected" as the right answer.

Joel Spolsky's approach is clever. And it works reasonably, it's going to exhibit predictable behavior and predictable performance, given "normal" values, and with the normative edge cases, such as NULL and the empty string. And it may be sufficient for a particular application.

But in terms generalizing this approach, let's also consider the more obscure corner cases, like when the Name column contains a wildcard character (as recognized by the LIKE predicate.) The wildcard character I see most commonly used is % (a percent sign.). So let's deal with that here now, and later go on to other cases.

Some problems with % character

Consider a Name value of 'pe%ter'. (For the examples here, I use a literal string value in place of the column name.) A row with a Name value of `'pe%ter' would be returned by a query of the form:

select ...
 where '|peanut|butter|' like '%|' + 'pe%ter' + '|%'

But that same row will not be returned if the order of the search terms is reversed:

select ...
 where '|butter|peanut|' like '%|' + 'pe%ter' + '|%'

The behavior we observe is kind of odd. Changing the order of the search terms in the list changes the result set.

It almost goes without saying that we might not want pe%ter to match peanut butter, no matter how much he likes it.

Obscure corner case

(Yes, I will agree that this is an obscure case. Probably one that is not likely to be tested. We wouldn't expect a wildcard in a column value. We may assume that the application prevents such a value from being stored. But in my experience, I've rarely seen a database constraint that specifically disallowed characters or patterns that would be considered wildcards on the right side of a LIKE comparison operator.

Patching a hole

One approach to patching this hole is to escape the % wildcard character. (For anyone not familiar with the escape clause on the operator, here's a link to the SQL Server documentation.

select ...
 where '|peanut|butter|'
  like '%|' + 'pe\%ter' + '|%' escape '\'

Now we can match the literal %. Of course, when we have a column name, we're going to need to dynamically escape the wildcard. We can use the REPLACE function to find occurrences of the %character and insert a backslash character in front of each one, like this:

select ...
 where '|pe%ter|'
  like '%|' + REPLACE( 'pe%ter' ,'%','\%') + '|%' escape '\'

So that solves the problem with the % wildcard. Almost.

Escape the escape

We recognize that our solution has introduced another problem. The escape character. We see that we're also going to need to escape any occurrences of escape character itself. This time, we use the ! as the escape character:

select ...
 where '|pe%t!r|'
  like '%|' + REPLACE(REPLACE( 'pe%t!r' ,'!','!!'),'%','!%') + '|%' escape '!'

The underscore too

Now that we're on a roll, we can add another REPLACE handle the underscore wildcard. And just for fun, this time, we'll use $ as the escape character.

select ...
 where '|p_%t!r|'
  like '%|' + REPLACE(REPLACE(REPLACE( 'p_%t!r' ,'$','$$'),'%','$%'),'_','$_') + '|%' escape '$'

I prefer this approach to escaping because it works in Oracle and MySQL as well as SQL Server. (I usually use the \ backslash as the escape character, since that's the character we use in regular expressions. But why be constrained by convention!

Those pesky brackets

SQL Server also allows for wildcard characters to be treated as literals by enclosing them in brackets []. So we're not done fixing yet, at least for SQL Server. Since pairs of brackets have special meaning, we'll need to escape those as well. If we manage to properly escape the brackets, then at least we won't have to bother with the hyphen - and the carat ^ within the brackets. And we can leave any %and _ characters inside the brackets escaped, since we'll have basically disabled the special meaning of the brackets.

Finding matching pairs of brackets shouldn't be that hard. It's a little more difficult than handling the occurrences of singleton % and _. (Note that it's not sufficient to just escape all occurrences of brackets, because a singleton bracket is considered to be a literal, and doesn't need to be escaped. The logic is getting a little fuzzier than I can handle without running more test cases.)

Inline expression gets messy

That inline expression in the SQL is getting longer and uglier. We can probably make it work, but heaven help the poor soul that comes behind and has to decipher it. As much of a fan I am for inline expressions, I'm inclined not use one here, mainly because I don't want to have to leave a comment explaining the reason for the mess, and apologizing for it.

A function where ?

Okay, so, if we don't handle that as an inline expression in the SQL, the closest alternative we have is a user defined function. And we know that won't speed things up any (unless we can define an index on it, like we could with Oracle.) If we've got to create a function, we might better do that in the code that calls the SQL statement.

And that function may have some differences in behavior, dependent on the DBMS and version. (A shout out to all you Java developers so keen on being able to use any database engine interchangeably.)

Domain knowledge

We may have specialized knowledge of the domain for the column, (that is, the set of allowable values enforced for the column. We may know a priori that the values stored in the column will never contain a percent sign, an underscore, or bracket pairs. In that case, we just include a quick comment that those cases are covered.

The values stored in the column may allow for % or _ characters, but a constraint may require those values to be escaped, perhaps using a defined character, such that the values are LIKE comparison "safe". Again, a quick comment about the allowed set of values, and in particular which character is used as an escape character, and go with Joel Spolsky's approach.

But, absent the specialized knowledge and a guarantee, it's important for us to at least consider handling those obscure corner cases, and consider whether the behavior is reasonable and "per the specification".


Other issues recapitulated

I believe others have already sufficiently pointed out some of the other commonly considered areas of concern:

  • SQL injection (taking what would appear to be user supplied information, and including that in the SQL text rather than supplying them through bind variables. Using bind variables isn't required, it's just one convenient approach to thwart with SQL injection. There are other ways to deal with it:

  • optimizer plan using index scan rather than index seeks, possible need for an expression or function for escaping wildcards (possible index on expression or function)

  • using literal values in place of bind variables impacts scalability


Conclusion

I like Joel Spolsky's approach. It's clever. And it works.

But as soon as I saw it, I immediately saw a potential problem with it, and it's not my nature to let it slide. I don't mean to be critical of the efforts of others. I know many developers take their work very personally, because they invest so much into it and they care so much about it. So please understand, this is not a personal attack. What I'm identifying here is the type of problem that crops up in production rather than testing.

Yes, I've gone far afield from the original question. But where else to leave this note concerning what I consider to be an important issue with the "selected" answer for a question?

查看更多
只靠听说
7楼-- · 2018-12-30 22:37

I think this is a case when a static query is just not the way to go. Dynamically build the list for your in clause, escape your single quotes, and dynamically build SQL. In this case you probably won't see much of a difference with any method due to the small list, but the most efficient method really is to send the SQL exactly as it is written in your post. I think it is a good habit to write it the most efficient way, rather than to do what makes the prettiest code, or consider it bad practice to dynamically build SQL.

I have seen the split functions take longer to execute than the query themselves in many cases where the parameters get large. A stored procedure with table valued parameters in SQL 2008 is the only other option I would consider, although this will probably be slower in your case. TVP will probably only be faster for large lists if you are searching on the primary key of the TVP, because SQL will build a temporary table for the list anyway (if the list is large). You won't know for sure unless you test it.

I have also seen stored procedures that had 500 parameters with default values of null, and having WHERE Column1 IN (@Param1, @Param2, @Param3, ..., @Param500). This caused SQL to build a temp table, do a sort/distinct, and then do a table scan instead of an index seek. That is essentially what you would be doing by parameterizing that query, although on a small enough scale that it won't make a noticeable difference. I highly recommend against having NULL in your IN lists, as if that gets changed to a NOT IN it will not act as intended. You could dynamically build the parameter list, but the only obvious thing that you would gain is that the objects would escape the single quotes for you. That approach is also slightly slower on the application end since the objects have to parse the query to find the parameters. It may or may not be faster on SQL, as parameterized queries call sp_prepare, sp_execute for as many times you execute the query, followed by sp_unprepare.

The reuse of execution plans for stored procedures or parameterized queries may give you a performance gain, but it will lock you in to one execution plan determined by the first query that is executed. That may be less than ideal for subsequent queries in many cases. In your case, reuse of execution plans will probably be a plus, but it might not make any difference at all as the example is a really simple query.

Cliffs notes:

For your case anything you do, be it parameterization with a fixed number of items in the list (null if not used), dynamically building the query with or without parameters, or using stored procedures with table valued parameters will not make much of a difference. However, my general recommendations are as follows:

Your case/simple queries with few parameters:

Dynamic SQL, maybe with parameters if testing shows better performance.

Queries with reusable execution plans, called multiple times by simply changing the parameters or if the query is complicated:

SQL with dynamic parameters.

Queries with large lists:

Stored procedure with table valued parameters. If the list can vary by a large amount use WITH RECOMPILE on the stored procedure, or simply use dynamic SQL without parameters to generate a new execution plan for each query.

查看更多
登录 后发表回答