SQL: Help me optimize my SQL

2019-08-07 14:41发布

I'm looking to optimize my SQL.

My database schema is:

HOMES

  • home_id
  • address
  • city
  • state
  • zip
  • primary_photo_group_id

HOME_PHOTOS

  • photo_id (primary key)
  • home_id (home primary key)
  • photo_group_id (a photo group is the same image, resize from thumbnail to large size)
  • home_photo_type_id (the size of the image be it a thumbnail or a large size)
  • photo_url_dir (the filesystem location where the photo is stored)

Problem

It's very possible that a 'home' does not have a photo associated with the home. In that case, the primary_photo_group_id = 0. Otherwise,primary_photo_group_id` equals the group_id of the photo to use as the primary photo.

Slow SQL (b/c of UNION)

SELECT  homes.home_id,
    address,
    city,
    state,
    zip,
    photo_id,
    photo_url_dir
FROM homes, home_photos
WHERE homes.home_id = home_photos.home_id
AND primary_photo_group_id = home_photo_group_id
AND home_photo_type_id = 2

UNION

SELECT  homes.home_id,
    address,
    city,
    state,
    zip,
    null,
    null
FROM homes
WHERE primary_photo_group_id = 0

What I would like to do

I would like to get rid of the UNION since I'm having to search back over the entire table 2x. How can I get rid of the UNION since I need to check for the case where primary_photo_group_id = 0 and if it's not equal to 0, then query the home_photos table

Here is the pseudo code that needs to happen

SELECT  homes.home_id,
    address,
    city,
    state,
    zip,
    photo_id,  (include only if primary_photo_group_id != 0)
    photo_url_dir  (include only if primary_photo_group_id != 0)
FROM homes,
    home_photos (include only if primary_photo_group_id != 0)
WHERE 
    primary_photo_group_id = 0
ELSE
    homes.home_id = home_photos.home_id
        AND primary_photo_group_id = home_photo_group_id
        AND home_photo_type_id = 2

4条回答
别忘想泡老子
2楼-- · 2019-08-07 15:01

Read up about [Joins](http://en.wikipedia.org/wiki/Join_(SQL)

 Select * from table_a, table_b

is a cross join - which you limit into an inner join with the where clauses. Turn your queries into inner joins, then read up about outer joins.

Edit: I didn't just give you the answer, because I thought it was homework, but I still won't just give you the answer because outer joins are important enough to know about even if you're just writing SQL for a website. you'll be a better person for learning it, and using this as an example.

查看更多
狗以群分
3楼-- · 2019-08-07 15:06

If the second query is slower than the first, it's because you have an index on {home_id, primary_photo_group_id} (or perhaps just {home_id}), but not on {primary_photo_group_id} alone. You need an index on this column if you want to improve the performance of looking up on that column.

查看更多
贼婆χ
4楼-- · 2019-08-07 15:14

Maybe you do not know about left outer join? Try:

SELECT  homes.home_id,
    address,
    city,
    state,
    zip,
    photo_id
    photo_url_dir 
FROM homes h
left outer join home_photos hp on h.home_id = hp.home_id
    AND primary_photo_group_id = home_photo_group_id
    AND home_photo_type_id = 2
查看更多
仙女界的扛把子
5楼-- · 2019-08-07 15:14
SELECT  homes.home_id,
        address,
        city,
        state,
        zip,
        photo_id,
        photo_url_dir
FROM    homes
LEFT JOIN
        home_photos
ON      home_photos.home_id = homes.home_id
        AND home_photo_group_id = CASE WHEN primary_photo_group_id = 0 THEN NULL ELSE primary_photo_group_id END
        AND home_photo_type_id = 2

Having a composite index on home_photos (home_id, home_photo_group_id, home_photo_type_id) will greatly improve this query.

Note that using CASE is slightly more efficient than left joining on 0, even if there are no records with home_photo_group_id = 0 in home_photos.

When MySQL sees a JOIN on NULL (which can yield nothing by definition), it won't even look into the joined table. When it joins on 0, it still has to check the index and make sure no value exists.

This is not very much of a performance impact, but still can improve your query time by several percents, especially if you have a lot of 0's in homes.

See this entry in my blog for performance detail:

Also note that your tables are not in 2NF.

Your group_id depends on home_id, and including it into home_photos is 2NF violation.

It's not always bad, but it may be harder to manage.

查看更多
登录 后发表回答