Will a compound index with a second column of low

2019-09-18 14:41发布

问题:

Setup

I'm using Single Table Inheritance (STI) in Rails simplified to the following:

class Vehicle
  belongs_to :user
end

class Car < Vehicle
end

class Plane < Vehicle
end

Each record in the vehicles table will have a type column set to either 'Car' or 'Plane' in addition to the user_id foreign key. It could also have additional values if more vehicle types are added, however, type will always have a much lower cardinality than user_id. Just as in real life, I expect this table to contain many more Cars.

There is a compound index on [:user_id, :type] (in that order) and these records are looked up by their subclasses.

Question(s)

I believe that in the worst case of no Planes, the index will be used since user_id is first and the second part will essentially be ignored. In this case a single index would have a super small benefit in that it's not maintaining the compound second column.

What happens in the case where there's an equal split?

  • Will the index cut the records in half and thus have a decent effect?
  • Will the cost of the database maintaining a compound index over a single one (i.e. just user_id) exceed or negate any savings?

Example SQL

An example ActiveRecord call would be Car.where(user_id: 10) which generates the following SQL:

SELECT `vehicles`.* FROM `vehicles` WHERE `vehicles`.`type` IN ('Car')
  AND `vehicles`.`user_id` = 10

回答1:

The cost of maintaining an index (single-column or multi-column) is almost always outweighed by the performance improvement when that index is used. It is a small increment on each INSERT/DELETE, plus a cost if changing the value of an indexed field via UPDATE. (The UPDATE case is rare.) So, don't worry about the cost of "maintaining a compound index".

WHERE `vehicles`.`type` IN ('Car')
  AND `vehicles`.`user_id` = 10

needs INDEX(user_id, type).

The optimizer will

  1. discover that that index is a possible candidate,
  2. check some statistics, then
  3. either use the index, or decide the cardinality is poor and simply scan the table.

Include the index; don't worry about it.

I ordered the fields that way, not (type, user_id) based on your IN, which implies that you might sometimes have multiple values for type.

If all rows in the table have type = 'Car', no problem. Everything I have said still applies. The waste of including the unnecessary type is insignificant.

It is better to have all "=" column(s) first in an index, then at most one other field. Further discussion here.