I have a StackOverflow-like tagging system for a database I'm working on. And I'm writing a stored procedure that looks for results based on an undetermined number of tags in a WHERE clause. There could be anywhere between 0 and 10 tags to filter results. So for example the user could be searching for items tagged with 'apple', 'orange', and 'banana' and each result must include all 3 tags. My query is made even more complicated because I'm also dealing with a cross reference table for the tagging, but for the purposes of this question I won't go into that.
I know I can do some string manipulation and feed the exec() function a query to take care of this but I'd rather not for performance problems associated with dynamic SQL. I figure it's best if SQL caches a query plan for the stored proc.
What are some techniques you've used to avoid dynamic SQL in this type of scenario?
By popular demand, here's the query I'm working with:
SELECT ft.[RANK], s.shader_id, s.page_name, s.name, s.description, s.download_count, s.rating, s.price FROM shader s
INNER JOIN FREETEXTTABLE(shader, *, @search_term) AS ft ON s.shader_id = ft.[KEY]
WHERE EXISTS(SELECT tsx.shader_id FROM tag_shader_xref tsx INNER JOIN tag t ON tsx.tag_id = t.tag_id WHERE tsx.shader_id = s.shader_id AND t.tag_name = 'color')
AND EXISTS(SELECT tsx.shader_id FROM tag_shader_xref tsx INNER JOIN tag t ON tsx.tag_id = t.tag_id WHERE tsx.shader_id = s.shader_id AND t.tag_name = 'saturation')
ORDER BY ft.[RANK] DESC
This is functional but hard-coded. You'll see that I have it set to look for the 'color' and 'saturation' tags.
I upvoted Henrik's answer, but another alternative I can think of is getting the search tags into a temporary table or table variable and then doing a JOIN on it or using an IN clause with a sub-SELECT. Since you want results with all the searched tags, you'd want to count the number of query tags first and then find results where the number of matched tags equals that number.
How to put the values into a table? If the tags are being passed to your stored procedure, and if you're using SQL Server 2008, then you could use the new table-valued parameters feature and pass a table variable directly to your stored procedure.
Otherwise, if you receive the tags in a single string then you could use a stored function which return a table, such as the SplitString function shown here. The you can do something like:
String the tags together with a comma separating them 'apple','orange' and then pass it in to one parameter that uses the IN clause on it in your stored procedure.
Of course if you have the values (key) from the lookup table for these tags I would use those.
EDIT:
Since you need all tags in the result....
Unfortunately, I think no matter what you do, the SP will be in jeopardy of the plan being regenerated.
You could use optional parameters and use CASE and ISNULL to build up the arguments.
I still think this means your SP has lost most of it's cached goodness but it's better than straight exec 'string' I believe.
So this was easier than I expected. After implementing a rather simple query to take care of this, I instantly had far better performance than I thought I would. So I'm not sure it's necessary to implement and test the other solutions.
I currently have my database filled with around 200 shaders and 500 tags. I ran what I think is a somewhat realistic test where I performed 35 different search queries against my stored proc with a varying number of tags, with and without a search term. I put all of this in a single SQL statement and then I benchmarked the results in ASP.NET. It consistently ran these 35 searches in under 200 milliseconds. If I reduced it to just 5 searches then the time goes down to 10 ms. That kind of performance is awesome. It helps that my database size is small. But I think it also helps that the query utilizes indexes well.
One thing I changed in my query was the way I was looking up tags. I'm now looking up the tags by their id instead of the name. By doing this I can get away with doing 1 less join, and have the benefit of using an index for the search. And then I also added "dbo." to the front of the table names after learning that SQL caches queries on a per-user basis.
In case anyone is interested, here's my finished stored proc:
Even though I didn't exhaust every option, this was still a good exercise because I have proven to myself that my database design is working very well for this task. And I also learned a lot from posting this question. I knew exec() was bad because it doesn't cache the query plan. But I didn't know that sp_executesql caches query plans, and that's very cool. I also didn't know about Common Table Expressions. And the link Henrik Opel posted is packed full of good tips for this type of task.
Of course I still may revisit this a year from now if the database grows drastically. Until then, thanks everyone for the help.
UPDATE:
So I have a working example of this search engine online at http://www.silverlightxap.com/controls if anyone is interested in seeing this in action.
This may not be the fastest method but could you just generate a query string for each tag and then join them with " INTERSECT "?
Edit: Didn't see the sproc tag so I don't know if this would be possible.
Your query is perfect for using a Common Table Expression (CTE) because of the duplicated correlated subquery in the EXISTS clauses:
By using the CTE, I also converted the EXISTS into JOINs.
Speaking to your original question regarding the use of dynamic SQL - the only alternative is to check the incoming parameter for an escape criteria before applying it. IE:
If @param1 contains a NULL value, the later portion of the SQL in brackets is not executed. I prefer the dynamic SQL approach for sake that otherwise you're making JOINs/etc that might not be used - that's a waste of resources.
What performance problems do you believe to exist with dynamic SQL? Using
sp_executesql
does cache the query plan. Frankly I find it odd that a query plan wouldn't be cached if the query is validated for syntax/etc (usingexec
orsp_executesql
) - the validation would happen prior to the query plan, why the step afterwards be skipped?For an extensive overview concerning this and similar problems see: http://www.sommarskog.se/dyn-search-2005.html
Specific to your question is the part here: http://www.sommarskog.se/dyn-search-2005.html#AND_ISNOTNULL
Also take into account that a (straight) dynamic Solution is not necessarily slower than a (possibly convoluted) static one, as query plans can still get cached: see http://www.sommarskog.se/dyn-search-2005.html#dynsql
So you'll have to carefully test/measure your options against realistic amounts of data, taking into account realistic queries (e.g. searches with one or two parameters might be way more common than searches with ten, etc.)
EDIT: Questioner gave a good reason to optimize this in the comments, hence moving the 'premature' warning a bit out of the way:
The (standard ;) word of warning applies, though: This smells a lot like premature optimization! - Are you sure this sproc will get called that often that using dynamic SQL will be significantly slower (that is, compared to other stuff going on in your app)?