I have a StackOverflow-like tagging system for a database I'm working on. And I'm writing a stored procedure that looks for results based on an undetermined number of tags in a WHERE clause. There could be anywhere between 0 and 10 tags to filter results. So for example the user could be searching for items tagged with 'apple', 'orange', and 'banana' and each result must include all 3 tags. My query is made even more complicated because I'm also dealing with a cross reference table for the tagging, but for the purposes of this question I won't go into that.
I know I can do some string manipulation and feed the exec() function a query to take care of this but I'd rather not for performance problems associated with dynamic SQL. I figure it's best if SQL caches a query plan for the stored proc.
What are some techniques you've used to avoid dynamic SQL in this type of scenario?
By popular demand, here's the query I'm working with:
SELECT ft.[RANK], s.shader_id, s.page_name, s.name, s.description, s.download_count, s.rating, s.price FROM shader s
INNER JOIN FREETEXTTABLE(shader, *, @search_term) AS ft ON s.shader_id = ft.[KEY]
WHERE EXISTS(SELECT tsx.shader_id FROM tag_shader_xref tsx INNER JOIN tag t ON tsx.tag_id = t.tag_id WHERE tsx.shader_id = s.shader_id AND t.tag_name = 'color')
AND EXISTS(SELECT tsx.shader_id FROM tag_shader_xref tsx INNER JOIN tag t ON tsx.tag_id = t.tag_id WHERE tsx.shader_id = s.shader_id AND t.tag_name = 'saturation')
ORDER BY ft.[RANK] DESC
This is functional but hard-coded. You'll see that I have it set to look for the 'color' and 'saturation' tags.
You may dynamically generate the appropriate parameterized (prepared) SQL templates instead.
Build and prepare the statement template when the parameters present themselves for the first time, caching the prepared statements for re-use when the same number of parameters appears again.
This could be done in the application or a sufficiently sophisticated stored procedure.
I much prefer this approach to, say, a procedure that takes at most 10 tags and has grody logic to deal with any of them being NULL.
Bill Karwin's
GROUP BY
answer in this question is probably the easiest template to construct -- you're simply concatenating placeholders for theIN
predicate and updating theCOUNT
clause. Other solutions involving joins-per-tag would require incrementing table aliases (e.g.,xref1
,xref2
, and so on) as you go.I've seen two types of solutions to this problem:
The first is to join the
shader
table totags
(via the xref as needed) once for each tag that you're looking for. The result of the inner join includes only shaders that have a match for all tags.The second solution is to join to that tags once, restricting tags to the three you need, and then
GROUP BY
theshader_id
so you can count the matches. The count will be three only if all tags were found (assuming uniqueness in the xref table).Which should you use? Depends on how well your brand of database optimizes one method or the other. I usually use MySQL, which doesn't do as well with
GROUP BY
, so it's better to use the former method. In Microsoft SQL Server, the latter solution might do better.