I'm exploring ways of improving the performance of an application which I can only affect on the database level to a limited degree. The SQL Server version is 2012 SP2 and the table and view structure in question is (I cannot really affect this + note that the xml document may have several hundred elements in total):
CREATE TABLE Orders(
id nvarchar(64) NOT NULL,
xmldoc xml NULL,
CONSTRAINT PK_Order_id PRIMARY KEY CLUSTERED (id)
);
CREATE VIEW V_Orders as
SELECT
a.id, a.xmldoc
,a.xmldoc.value('data(/row/c1)[1]', 'nvarchar(max)') "Stuff"
,a.xmldoc.value('data(/row/c2)[1]', 'nvarchar(max)') "OrderType"
etc..... many columns
from Orders a;
A typical query (and the one being used for testing below):
SELECT id FROM V_Orders WHERE OrderType = '30791'
All the queries are performed against the view and I can affect neither the queries nor the table/view structure.
I thought adding a selective XML index to the table would be my saviour:
CREATE SELECTIVE XML INDEX I_Orders_OrderType ON Orders(xmldoc)
FOR(
pathOrderType = '/row/c2' as SQL [nvarchar](20)
)
But even after updating the statistics the execution plan is looking weird. Couldn't post a pic as new account so the relevant details as text:
- Clustered index seek from selectiveXml (Cost: 2% of total). Expected number of rows 1 but expected number of execution times 1269 (number of rows in the table)
- -> Top N sort (Cost: 95% of total)
-> Compute scalar (Cost 0)
Separate branch: Clustered index scan PK_Order_id (Cost: 3% of total). Expected number of rows 1269
- -> Merged to the Computer scalar results with Nested loops (Left outer join)
- -> Filter
- -> Final result (Expected number of rows 1269)
In actuality with my test data the query doesn't even return any results but whether it returns one or few doesn't make any difference. Execution times support the query really taking as long as could be deduced from the execution plan and have read counts in the thousands.
So my question is why is the selective xml index not being used properly by the optimizer? Or have I got something wrong? How would I optimize this specific query's performance with selective xml indexing (or perhaps persisted column)?
Edit: I did additional testing with larger sample data (~274k rows in the table with XML documents close to average production sizes) and compared the selective XML index to a promoted column. The results are from Profiler trace, concentrating on CPU usage and read counts. The execution plan for selective xml indexing is basically identical to what is described above.
Selective XML index and 274k rows (executing the query above): CPU: 6454, reads: 938521
After I updated the values in the searched field to be unique (total records still 274k) I got the following results:
Selective XML index and 274k rows (executing the query above): CPU: 10077, reads: 1006466
Then using a promoted (i.e. persisted) separately indexed column and using it directly in the view: CPU: 0, reads: 23
Selective XML index performance seems to be closer to full table scan than proper SQL indexed column fetch. I read somewhere that using schema for the table might help drop the TOP N step from execution plan (assuming we're searching for a non-repeating field) but I'm not sure whether that's a realistic possibility in this case.