The application I work on generates an SQL query somewhat like this:
SELECT
    VISIT_VIEW.VISIT_ID, VISIT_VIEW.PATIENT_ID, VISIT_VIEW.MRN_ID,
    VISIT_VIEW.BILL_NO, INSURANCE.INS_PAYOR
FROM
    VISIT_VIEW
LEFT JOIN
    INSURANCE ON VISIT_VIEW.visit_id = INSURANCE._fk_visit
WHERE
    VISIT_VIEW.VISIT_ID IN (1002, 1003, 1005, 1006, 1007, 1008, 1010, 1011, <...>, 1193, 1194, 1195, 1196, 1197, 1198, 1199)
The <...> represents a long list of IDs. The size of the list depends on the results of a previous query, and in turn on the parameters selected to generate that query. The list of IDs can be anywhere from 100 items to over 2000.
The INSURANCE table is large, over 9 million rows. The visit table is also large, but not quite as large.
As the number of IDs goes up, the duration increases fairly sharply from under a second to over 15 minutes. The increase starts somewhere around 175 IDs.
If the parameters used to generate the query are changed so that the INS_PAYOR column is not selected, and thus there is no left join, the query runs in less than a second, even with over 2000 items in the list of IDs.
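For comparison, the fast case is essentially the same generated statement with the INSURANCE column and join stripped out, roughly:

SELECT
    VISIT_VIEW.VISIT_ID, VISIT_VIEW.PATIENT_ID, VISIT_VIEW.MRN_ID,
    VISIT_VIEW.BILL_NO
FROM
    VISIT_VIEW
WHERE
    VISIT_VIEW.VISIT_ID IN (1002, 1003, <...>, 1198, 1199)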
With the left join included, the execution plan shows that 97% of the query time is devoted to a clustered index seek on the INSURANCE table.
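In case it helps anyone reproduce the comparison, and assuming this is SQL Server, the timing and I/O breakdown can be captured with something like the following sketch (the generated statement itself is unchanged):

SET STATISTICS TIME ON;  -- report parse/compile and execution times
SET STATISTICS IO ON;    -- report logical reads per table
-- run the generated query here, with "Include Actual Execution Plan" enabled
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;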
How can I rework this query to get the same results with a less horrific delay?
Do remember that the SQL is being generated by code, not by hand. It is generated from a list of fields (with knowledge of which field belongs to which table) and a list of IDs in the primary table to check. I do have access to the code that does the query generation, and can change it provided that the ultimate results of the query are exactly the same.
Thank you