Range queries on 2 columns

2019-08-13 10:17发布

问题:

I have very huge table Shelve(approximately 100 millions) which has Shelve info for books.

Shelve

ShevleID     RangeStart      RangeEnd  
----------------------------------------
   1               1           100
   2             200           500
   3             501           1000

Each book has unique number BookID given to it. Lets say you have a book with BookID 50. Then Book must be kept in Shelve 1 because 50 lies between 1 and 100.

Books

BookID     BookName
---------------------------
   1       Book1
   2       Book2
   .
   .
  50       Book3

My queries are like this-

SELECT 
    BookID, 
    BookName, 
    ShelveID
FROM 
    Book B
LEFT JOIN  
    Shelve S 
      ON B.BookID 
                BETWEEN 
                       S.RenageStart 
                       AND
                       S.RangeEND

This query is super slow because query is able to use index only one of the columns either RangeStart or RangeEnd at a time.

I have already tried these 5 options-

  1. Create Index on StartIP

  2. Create Index on EndIP

  3. Create included Index on StartIP (included column EndIP)

  4. Create included Index on EndIP (included column StartIP)

  5. Create Index on StartIP,EndIP

Can someone please suggest me some approach to achieve this?

回答1:

If you want one shelve value for each book, you can try:

SELECT b.*,
       (SELECT TOP 1 s.ShelveId
        FROM Shelve S
        WHERE b.BookId >= s.RangeStart
        ORDER BY s.RangeStart DESC
       ) as ShelveId
FROM Book B;

This should make effective use of an index on Shelve(RangeStart, ShelveId).

This assumes that you want one ShelveId and the book ranges are not overlapping.

I am curious what your real application is. No libraries (to the best of my knowledge) have hundreds of millions of books.

EDIT:

You can handle the missing ShelveId with a case statement:

SELECT b.*,
       (SELECT TOP 1 (case when b.BookId between s.RangeStart and s.RangeEnd then s.ShelveId end)
        FROM Shelve S
        WHERE b.BookId >= s.RangeStart
        ORDER BY s.RangeStart DESC
       ) as ShelveId
FROM Book B;

If the other assumptions are true, this may solve your problem.

EDIT II:

If you want other attributes, then try cross apply. It should have similar performance characteristics:

SELECT b.*,
       s.*
FROM Book B CROSS APPLY
     (SELECT TOP 1 (case when b.BookId between s.RangeStart and s.RangeEnd then s.ShelveId end) as RangeStart, . . .
        FROM Shelve S
        WHERE b.BookId >= s.RangeStart
        ORDER BY s.RangeStart DESC
       ) s

Now, comes a bit of experimentation. I would like to write:

SELECT b.*,
       s.*
FROM Book B CROSS APPLY
     (SELECT TOP 1 s.*
        FROM Shelve S
        WHERE b.BookId >= s.RangeStart and b.BookId <= s.RangeEnd
        ORDER BY s.RangeStart DESC
       ) s

But, this might confuse the optimization engine and prevent the use of the index. If it works, great. If it doesn't work, I would suggest either using the first version with a case for every variable. Or, use the correlated subquery version and join back to the Shelve table on a primary key.