Efficient MySQL table structure for rating system

2019-03-31 11:05发布

问题:

This post is a follow-up of this answered question: Best method for storing a list of user IDs.

I took cletus and Mehrdad Afshari's epic advice of using a normalized database approach. Are the following tables properly set up for proper optimization? I'm kind of new to MySQL efficiency, so I want to make sure this is effective.

Also, when it comes to finding the average rating for a game and the total number of votes should I use the following two queries, respectively?

SELECT avg(vote) FROM votes WHERE uid = $uid AND gid = $gid;    
SELECT count(uid) FROM votes WHERE uid = $uid AND gid = $gid;

CREATE TABLE IF NOT EXISTS `games` (
  `id` int(8) NOT NULL auto_increment,
  `title` varchar(50) NOT NULL,
  PRIMARY KEY  (`id`)
) AUTO_INCREMENT=1 ;

CREATE TABLE IF NOT EXISTS `users` (
  `id` int(8) NOT NULL auto_increment,
  `username` varchar(20) NOT NULL,
  PRIMARY KEY  (`id`)
) AUTO_INCREMENT=1 ;


CREATE TABLE IF NOT EXISTS `votes` (
  `uid` int(8) NOT NULL,
  `gid` int(8) NOT NULL,
  `vote` int(1) NOT NULL,
  KEY `uid` (`uid`,`gid`)
) ;

回答1:

average votes for a game: SELECT avg(vote) FROM votes WHERE gid = $gid;

number of votes for a game: SELECT count(uid) FROM votes WHERE gid = $gid;

As you will not have any user or game ids smaller then 0 you could make them unsigned integers (int(8) unsigned NOT NULL).

If you want to enforce that a user can only make a single vote for a game, then create a primary key over uid and gid in the votes table instead of just a normal index.

CREATE TABLE IF NOT EXISTS `votes` (
  `uid` int(8) unsigned NOT NULL,
  `gid` int(8) unsigned NOT NULL,
  `vote` int(1) NOT NULL,
  PRIMARY KEY (`gid`, `uid`)
) ;

The order of the primary key's fields (first gid, then uid) is important so the index is sorted by gid first. That makes the index especially useful for selects with a given gid. If you want to select all the votes a given user has made then add another index with just uid.

I would recommend InnoDB for storage engine because especially in high load settings the table locks will kill your performance. For read performance you can implement a caching system using APC, Memcached or others.



回答2:

Looks good.

I would have used users_id & games_id instead of gid and uid which sounds like global id and unique id



回答3:

Whatever you end up doing, make sure you test it with a large data-set (even if you don't plan on having a huge number of users)

Write a script that generates 100,000 games, 50,000 users and a million votes. May be slightly excessive, but if your queries don't take hours with that number of items, it'll never be an issue



回答4:

Looks good so far. Don't forget indices and foreign keys. In my experience most issues don't arise from not-so-well-thought-out designs but from the lack of indices and foreign keys.

Also, regarding the storage engine selection I have yet to see a reason (in a reasonably complex/sized app) for not using innodb, not just because of transactional semantics.



回答5:

you might want to add a voted_on (DATETIME) column too. That way, you could, say, see a game's trend in a certain timespan, or just in case someday a vote spam happened, you could delete unwanted votes accurately.