I'm working on the following query but not sure how to proceed with further optimizations:
SELECT u.id AS userId, firstName, profilePhotoId, preferredActivityId, preferredSubActivityId, availabilityType,
3959 * ACOS(COS(radians(requestingUserLat)) * COS(radians(u.latitude)) * COS(radians(u.longitude) - radians(requestingUserLon)) + SIN(radians(requestingUserLat)) * SIN(radians(u.latitude))) AS distanceInMiles
FROM users u
WHERE u.id IN (
SELECT uu.id
FROM users uu
WHERE uu.latitude between lat1 and lat2 -- MySQL 5.7 supports Point data type, but it is not indexed in innoDB. We store latitude and longitude as DOUBLE for now
AND uu.longitude between lon1 and lon2
AND uu.dateOfBirth between maxAge and minAge -- dates are in millis, therefore maxAge will have a smaller value than minAge and so it needs to go first
)
AND IF(gender is null, TRUE, u.gender = gender)
AND IF(activityType is null, TRUE, u.preferredActivityType = activityType)
AND u.accountState = 'A'
AND u.id != userId
HAVING distanceInMiles < searchRadius ORDER BY distanceInMiles LIMIT pagingStart, pagingLength;
CREATE INDEX `findMatches` ON `users` (`latitude` ASC, `longitude` ASC, `dateOfBirth` ASC) USING BTREE;
The idea here is to have an inner query that identifies qualifying rows based on user location and age, using the covering index specified above. In a table with a few million rows, that narrows them down to a few thousand without requiring a full table scan. Then the resulting rows are tested against more fine-grain conditions, such as gender, availability, etc. - this time a full scan of the reduced resultset is inevitable.
This runs "almost" as expected, EXPLAIN
reveals that the inner query does indeed make use of the full key length (3 columns) of the covering index, and the outer query then looks up the rows returned, by PK.
The problem:
The performance is satisfactory while the search range is within a few hundred miles, but when I get to a thousand miles, it starts degrading because the number of users within the specified boundary increases. The problem would also become apparent if the search range remained the same, but the number of users increased by a few orders. Here are the problems that I have identified so far:
- MySQL does not currently support
LIMIT
in inner queries, therefore the inner query will return ALL qualifyinguserIDs
(i.e., thousands), even if the outer query will then limit them to just a dozen or so. - Enabling
optimizer_trace
and looking at the work behind the scenes suggests that only columnlatitude
of my covering index is used as arange
. I'm not sure why that is, especially sinceEXPLAIN
suggests that the full index key length is used.
The question:
How do I solve (1) and (2) above? Before somebody suggests using spatial data types for lat and long, please note that the latest InnoDB engine (MySQL v5.7) does not support spatial indexes, just spatial data types.