I have very huge table Shelve(approximately 100 millions) which has Shelve info for books.
Shelve
ShevleID RangeStart RangeEnd
----------------------------------------
1 1 100
2 200 500
3 501 1000
Each book has unique number BookID given to it. Lets say you have a book with BookID 50.
Then Book must be kept in Shelve 1 because 50 lies between 1 and 100.
Books
BookID BookName
---------------------------
1 Book1
2 Book2
.
.
50 Book3
My queries are like this-
SELECT
BookID,
BookName,
ShelveID
FROM
Book B
LEFT JOIN
Shelve S
ON B.BookID
BETWEEN
S.RenageStart
AND
S.RangeEND
This query is super slow because query is able to use index only one of the columns either RangeStart or RangeEnd at a time.
I have already tried these 5 options-
Create Index on StartIP
Create Index on EndIP
Create included Index on StartIP (included column EndIP)
Create included Index on EndIP (included column StartIP)
Create Index on StartIP,EndIP
Can someone please suggest me some approach to achieve this?
If you want one shelve value for each book, you can try:
SELECT b.*,
(SELECT TOP 1 s.ShelveId
FROM Shelve S
WHERE b.BookId >= s.RangeStart
ORDER BY s.RangeStart DESC
) as ShelveId
FROM Book B;
This should make effective use of an index on Shelve(RangeStart, ShelveId)
.
This assumes that you want one ShelveId
and the book ranges are not overlapping.
I am curious what your real application is. No libraries (to the best of my knowledge) have hundreds of millions of books.
EDIT:
You can handle the missing ShelveId
with a case
statement:
SELECT b.*,
(SELECT TOP 1 (case when b.BookId between s.RangeStart and s.RangeEnd then s.ShelveId end)
FROM Shelve S
WHERE b.BookId >= s.RangeStart
ORDER BY s.RangeStart DESC
) as ShelveId
FROM Book B;
If the other assumptions are true, this may solve your problem.
EDIT II:
If you want other attributes, then try cross apply
. It should have similar performance characteristics:
SELECT b.*,
s.*
FROM Book B CROSS APPLY
(SELECT TOP 1 (case when b.BookId between s.RangeStart and s.RangeEnd then s.ShelveId end) as RangeStart, . . .
FROM Shelve S
WHERE b.BookId >= s.RangeStart
ORDER BY s.RangeStart DESC
) s
Now, comes a bit of experimentation. I would like to write:
SELECT b.*,
s.*
FROM Book B CROSS APPLY
(SELECT TOP 1 s.*
FROM Shelve S
WHERE b.BookId >= s.RangeStart and b.BookId <= s.RangeEnd
ORDER BY s.RangeStart DESC
) s
But, this might confuse the optimization engine and prevent the use of the index. If it works, great. If it doesn't work, I would suggest either using the first version with a case
for every variable. Or, use the correlated subquery version and join back to the Shelve
table on a primary key.