- When should I use a composite index in a database?
- What are the performance ramification by using a composite index)?
- Why should I use use a composite index?
For example, I have a homes
table:
CREATE TABLE IF NOT EXISTS `homes` (
`home_id` int(10) unsigned NOT NULL auto_increment,
`sqft` smallint(5) unsigned NOT NULL,
`year_built` smallint(5) unsigned NOT NULL,
`geolat` decimal(10,6) default NULL,
`geolng` decimal(10,6) default NULL,
PRIMARY KEY (`home_id`),
KEY `geolat` (`geolat`),
KEY `geolng` (`geolng`),
) ENGINE=InnoDB ;
Does it make sense for me to use a composite index for both geolat
and geolng
, such that:
I replace:
KEY `geolat` (`geolat`),
KEY `geolng` (`geolng`),
with:
KEY `geolat_geolng` (`geolat`, `geolng`)
If so:
- Why?
- What is the performance ramification by using a composite index)?
UPDATE:
Since many people have stated it entirely dependent upon the queries I perform, below is the most common query performed:
SELECT * FROM homes
WHERE geolat BETWEEN ??? AND ???
AND geolng BETWEEN ??? AND ???
UPDATE 2:
With the following database schema:
CREATE TABLE IF NOT EXISTS `homes` (
`home_id` int(10) unsigned NOT NULL auto_increment,
`primary_photo_group_id` int(10) unsigned NOT NULL default '0',
`customer_id` bigint(20) unsigned NOT NULL,
`account_type_id` int(11) NOT NULL,
`address` varchar(128) collate utf8_unicode_ci NOT NULL,
`city` varchar(64) collate utf8_unicode_ci NOT NULL,
`state` varchar(2) collate utf8_unicode_ci NOT NULL,
`zip` mediumint(8) unsigned NOT NULL,
`price` mediumint(8) unsigned NOT NULL,
`sqft` smallint(5) unsigned NOT NULL,
`year_built` smallint(5) unsigned NOT NULL,
`num_of_beds` tinyint(3) unsigned NOT NULL,
`num_of_baths` decimal(3,1) unsigned NOT NULL,
`num_of_floors` tinyint(3) unsigned NOT NULL,
`description` text collate utf8_unicode_ci,
`geolat` decimal(10,6) default NULL,
`geolng` decimal(10,6) default NULL,
`display_status` tinyint(1) NOT NULL,
`date_listed` timestamp NOT NULL default CURRENT_TIMESTAMP,
`contact_email` varchar(100) collate utf8_unicode_ci NOT NULL,
`contact_phone_number` varchar(15) collate utf8_unicode_ci NOT NULL,
PRIMARY KEY (`home_id`),
KEY `customer_id` (`customer_id`),
KEY `city` (`city`),
KEY `num_of_beds` (`num_of_beds`),
KEY `num_of_baths` (`num_of_baths`),
KEY `geolat` (`geolat`),
KEY `geolng` (`geolng`),
KEY `account_type_id` (`account_type_id`),
KEY `display_status` (`display_status`),
KEY `sqft` (`sqft`),
KEY `price` (`price`),
KEY `primary_photo_group_id` (`primary_photo_group_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=8 ;
Using the following SQL:
EXPLAIN SELECT homes.home_id,
address,
city,
state,
zip,
price,
sqft,
year_built,
account_type_id,
num_of_beds,
num_of_baths,
geolat,
geolng,
photo_id,
photo_url_dir
FROM homes
LEFT OUTER JOIN home_photos ON homes.home_id = home_photos.home_id
AND homes.primary_photo_group_id = home_photos.home_photo_group_id
AND home_photos.home_photo_type_id = 2
WHERE homes.display_status = true
AND homes.geolat BETWEEN -100 AND 100
AND homes.geolng BETWEEN -100 AND 100
EXPLAIN returns:
id select_type table type possible_keys key key_len ref rows Extra
----------------------------------------------------------------------------------------------------------
1 SIMPLE homes ref geolat,geolng,display_status display_status 1 const 2 Using where
1 SIMPLE home_photos ref home_id,home_photo_type_id,home_photo_group_id home_photo_group_id 4 homes.primary_photo_group_id 4
I don't quite understand how to read the EXPLAIN command. Does this look good or bad. Right now, I am NOT using a composite index for geolat and geolng. Should I be?
Composite index can be useful when you want to optimise
group by
clause (check this article http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html). Please pay attention:Imagine you have the following three queries:
Query I:
Query II:
Query III:
If you have seperate index per column, all three queries use indexes. In MySQL, if you have composite index (
geolat
,geolng
), only query I and query II (which is using the first part of the composit index) uses indexes. In this case, query III requires full table search.On Multiple-Column Indexes section of manual, it is clearly explained how multiple column indexes work, so I don't want to retype manual.
From the MySQL Reference Manual page:
If you use seperated index for geolat and geolng columns, you have two different index in your table which you can search independent.
If you use composite index you have only one index for both columns:
RRN is relative record number (to simplify, you can say ID). The first two index generated seperate and the third index is composite. As you can see you can search based on geolng on composite one since it is indexed by geolat, however it's possible to search by geolat or "geolat AND geolng" (since geolng is second level index).
Also, have a look at How MySQL Uses Indexes manual section.
Composite indexes are useful for
A composite index cannot handle two ranges. I discuss this further in my index cookbook.
Find nearest -- If the question is really about optimizing
then no index can really handle both dimensions.
Instead, one must 'think out of the box'. If one dimension is implemented via partitioning and the other is implemented by carefully picking the
PRIMARY KEY
, one can get significantly better efficiency for very large tables of lat/lng lookup. My latlng blog goes into the details of how to implement "find nearest" on the globe. It includes code.The
PARTITIONs
are stripes of latitude ranges. ThePRIMARY KEY
deliberately starts with longitude so that the useful rows are likely to be in the same block. A Stored Routine orchestrates the messy code for doingorder by... limit...
and for growing the 'square' around the target until you have enough coffee shops (or whatever). It also takes care of the great-circle calculations and handling the dateline and poles.Composite indexes are very powerful as they:
ENFORCE STRUCTURE INTEGRITY
Composite indexes are not just another type of index; they can provide NECESSARY structure to a table by enforcing integrity as the Primary Key.
Mysql's Innodb supports clustering and the following example illustrates why a composite index may be necessary.
To create a friends' tables (i.e. for a social network) we need 2 columns:
user_id, friend_id
.Table Strcture
By virtue, a Primary Key (PK) is unique and by creating a composite PK, Innodb will automatically check that no duplicates on
user_id, friend_id
exists when a new record is added. This is the expected behavior as no user should have more than 1 record (relationship link) withfriend_id = 2
for instance.Without a composite PK, we can create this schema using a surrogate key:
Now, whenever a new record is added we will have to check that a prior record with the combination
user_id, friend_id
does not already exist.As such, a composite index can enforce structure integrity.
ENABLE SORTING ON A FILTERED ID
It is very common to sort a set of records by the post's time (timestamp or datetime). Usually, this means posting on a given id. Here is an example
Table User_Wall_Posts (think if Facebook's wall posts)
We want to query and find all posts for
user_id = 10
and sort the comment posts bytimestamp
(date).SQL QUERY
The composite PK enables Mysql to filter and sort the results using the index; Mysql will not have to use a temporary file or filesort to fetch the results. Without a composite key, this would not be possible and would cause a very inefficient query.
As such, composite keys are very powerful and suit more than the simple problem of "I want to search for
column_a, column_b
so I will use composite keys. For my current database schema, I have just as many composite keys as single keys. Don't overlook a composite key's use!You should use a composite index when you are using queries that benefit from it. A composite index that looks like this:
will benefit a query that uses those fields for joining, filtering, and sometimes selecting. It will also benefit queries that use left-most subsets of columns in that composite. So the above index will also satisfy queries that need
But it will not (at least not directly, maybe it can help partially if there are no better indices) help for queries that need
Notice how column_B is missing.
In your original example, a composite index for two dimensions will mostly benefit queries that query on both dimensions or the leftmost dimension by itself, but not the rightmost dimension by itself. If you're always querying two dimensions, a composite index is the way to go, doesn't really matter which is first (most probably).
To do spacial searches, you need an R-Tree algorithm, which allows searching geographical areas very quickly. Exactly what you need for this job.
Some databases have spacial indexes built in. A quick Google search shows MySQL 5 has them (which looking at your SQL I'm guessing you're using MySQL).