I'm at a loss. I have a table with about 100K rows. When querying this table results are usually snappy, about 2ms or so. But whenever I use an ORDER BY performance drops like a rock to about 120ms. I read the MySQL ORDER BY Optimization page but I can't say I understand everything. Especially the indexes are unclear to me.
Ultimately I would like to run the following query:
SELECT *
FROM `affiliate_new_contracts`
WHERE phone_brand IN ('Apple','Blackberry','HTC','LG','Motorola','Nokia',
'Samsung','Sony Ericsson')
AND contract_length IN ('12','24')
AND (addon IS NULL OR addon IN('Telfort Sms 300','Surf & Mail'))
AND (plan_name = 'Telfort 100'
AND
credible_shop = 1
)
ORDER BY average_price_per_month ASC, phone_price_guestimate DESC,
contract_length ASC;
But I would be happy if I understood the underlying principles.
Removing the ORDER BY clause in the previous query makes it run in 20ms in stead of 120ms. I have an index on the average_price_per_month
field but simplifying the ORDER BY clause to ORDER BY average_price_per_month
yielded no performance increase. That I don't understand. I'm also in the dark about the so called multi column indexes which should be able to help me with the ultimate query.
Any help would be appreciated. How do I make this bad boy perform? Or is that quest utopian?
The CREATE TABLE
syntax is as follows:
$ show create table affiliate_new_contracts;
CREATE TABLE `affiliate_new_contracts` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`plan_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`contract_length` int(11) DEFAULT NULL,
`phone_brand` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`price` float DEFAULT NULL,
`average_price_per_month` float DEFAULT NULL,
`phone_price_guestimate` float DEFAULT NULL,
`credible_shop` tinyint(1) DEFAULT '0',
`addon` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`addon_price` float DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_affiliate_new_contracts_on_plan_name` (`plan_name`),
KEY `index_affiliate_new_contracts_on_average_price_per_month` (`average_price_per_month`),
KEY `index_affiliate_new_contracts_on_price` (`price`)
) ENGINE=InnoDB AUTO_INCREMENT=2472311 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
BTW This table is recreated weekly and is not updated in the meanwhile.
There is a limit to how much optimization you can do on ORDER BY clauses. The primary one that sometimes helps is having an index on the correct set of columns in the correct order. So, for your example, a (single, composite) index on:
average_price_per_month ASC, phone_price_guestimate DESC, contract_length ASC
might help, but the optimizer might still decide that it is better to use some other index to deal with the filter terms in the query and then it will sort the data thus selected itself. Note that unless the index provides the data in exactly the correct sorted order and using the index speeds up the query overall, then the optimizer won't use it. An index on only one of the columns to be sorted is a limited benefit to the optimizer, and it normally won't use such an index.
One question to consider:
- How fast does the query perform without the ORDER BY clause.
That gives you a very direct measurement of the cost of sorting. You mention 20 ms without ordering and 120 ms with ordering, so the ORDER BY is moderately expensive. The next question might be "Can you outperform its sort in your application?". You might be able to do that, but the sort package in a DBMS is usually fairly well optimized and you're likely to have to work hard to beat it.
I suspect that your index isn't doing you any good because it's not the primary key and your query selection logic (where clause) doesn't use it. Because you're not using the index to choose which rows, you end up having to sort the results after selection. The fact that it's not your primary key means that the results are not already ordered by the average price per month, which would reduce or eliminate the sort time since they would already be ordered.
One solution would be to use a compound index that includes the most selective column (plan name) and the ordering column (average_price_per_month). It will still need to do the sort after the selection, but the results will already be ordered by the primary ordering column reducing the time spent.
CREATE TABLE `affiliate_new_contracts` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`plan_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`contract_length` int(11) DEFAULT NULL,
`phone_brand` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`price` float DEFAULT NULL,
`average_price_per_month` float DEFAULT NULL,
`phone_price_guestimate` float DEFAULT NULL,
`credible_shop` tinyint(1) DEFAULT '0',
`addon` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`addon_price` float DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_affiliate_new_contracts_on_plan_name` (`plan_name`,`average_price_per_month`),
KEY `index_affiliate_new_contracts_on_price` (`price`)
) ENGINE=InnoDB AUTO_INCREMENT=2472311 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
You might also want to use EXPLAIN to understand how the query is being performed (if my intuition isn't correct) and adjust the indices accordingly.