I'm looking to optimize my SQL.
My database schema is:
HOMES
- home_id
- address
- city
- state
- zip
- primary_photo_group_id
HOME_PHOTOS
- photo_id (primary key)
- home_id (home primary key)
- photo_group_id (a photo group is the same image, resize from thumbnail to large size)
- home_photo_type_id (the size of the image be it a thumbnail or a large size)
- photo_url_dir (the filesystem location where the photo is stored)
Problem
It's very possible that a 'home' does not have a photo associated with the home. In that case, the primary_photo_group_id
= 0. Otherwise,
primary_photo_group_id` equals the group_id of the photo to use as the primary photo.
Slow SQL (b/c of UNION)
SELECT homes.home_id,
address,
city,
state,
zip,
photo_id,
photo_url_dir
FROM homes, home_photos
WHERE homes.home_id = home_photos.home_id
AND primary_photo_group_id = home_photo_group_id
AND home_photo_type_id = 2
UNION
SELECT homes.home_id,
address,
city,
state,
zip,
null,
null
FROM homes
WHERE primary_photo_group_id = 0
What I would like to do
I would like to get rid of the UNION since I'm having to search back over the entire table 2x. How can I get rid of the UNION since I need to check for the case where primary_photo_group_id = 0 and if it's not equal to 0, then query the home_photos
table
Here is the pseudo code that needs to happen
SELECT homes.home_id,
address,
city,
state,
zip,
photo_id, (include only if primary_photo_group_id != 0)
photo_url_dir (include only if primary_photo_group_id != 0)
FROM homes,
home_photos (include only if primary_photo_group_id != 0)
WHERE
primary_photo_group_id = 0
ELSE
homes.home_id = home_photos.home_id
AND primary_photo_group_id = home_photo_group_id
AND home_photo_type_id = 2
Read up about [Joins](http://en.wikipedia.org/wiki/Join_(SQL)
is a cross join - which you limit into an inner join with the where clauses. Turn your queries into inner joins, then read up about outer joins.
Edit: I didn't just give you the answer, because I thought it was homework, but I still won't just give you the answer because outer joins are important enough to know about even if you're just writing SQL for a website. you'll be a better person for learning it, and using this as an example.
If the second query is slower than the first, it's because you have an index on
{home_id, primary_photo_group_id}
(or perhaps just{home_id}
), but not on{primary_photo_group_id}
alone. You need an index on this column if you want to improve the performance of looking up on that column.Maybe you do not know about left outer join? Try:
Having a composite index on
home_photos (home_id, home_photo_group_id, home_photo_type_id)
will greatly improve this query.Note that using
CASE
is slightly more efficient than left joining on0
, even if there are no records withhome_photo_group_id = 0
inhome_photos
.When
MySQL
sees aJOIN
onNULL
(which can yield nothing by definition), it won't even look into the joined table. When it joins on0
, it still has to check the index and make sure no value exists.This is not very much of a performance impact, but still can improve your query time by several percents, especially if you have a lot of
0
's inhomes
.See this entry in my blog for performance detail:
Also note that your tables are not in
2NF
.Your
group_id
depends onhome_id
, and including it intohome_photos
is2NF
violation.It's not always bad, but it may be harder to manage.