Select top N rows out of M groups

2020-07-22 17:38发布

问题:

I have this table:

CREATE TABLE IF NOT EXISTS `catalog_sites` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `cat_id` int(10) unsigned NOT NULL,
  `date` datetime NOT NULL,
  `url` varchar(255) NOT NULL,
  `title` varchar(255) NOT NULL,
  `description` varchar(255) NOT NULL,
  `keywords` varchar(255) NOT NULL,
  `visited` int(10) unsigned NOT NULL,
  `shown` int(10) unsigned NOT NULL,
  `meta_try` int(1) NOT NULL,
  PRIMARY KEY  (`id`),
  UNIQUE KEY `url` (`url`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

I think my problem is simple, but cant seem to find an appropriate solution..

So, this is a table with web-sites, I would like to get 6 sites in 6 different categories (cat_id, total: 36 rows) with the highest rating for each category. The rating is calculated as visited / shown.

I should get 36 rows containing 6 top categories (we can find them by sorting with AVG(visited / shown) ), and 6 top sites in each of these 6 categories.

If you have any ideas how this might happen differently, please tell me.

回答1:

This should get you what you want using MySQL Variables, the inner query will pre-calculate the rank of visited / shown, and using an order by by the condition you want... Per Category, the highest ranks... and then using @vars will keep the @RankSeq sequentially 1-? per category. From that Prequery (aliased PQ), the OUTER query just queries the PreQuery where the URL's Rank Sequence is <= 6

To further ensure you are only getting the top 6 categories, the inner PreQuery also has a pre-query / limit for the "TopCategories" (alias)

select
      PQ.URL,
      PQ.Cat_ID,
      PQ.Rank,
      PQ.URLRankSeq
   from 
      ( select 
              CS.cat_id,
              (CS.visited / CS.shown ) as Rank,
              CS.url,
              @RankSeq := if( @LastCat = CS.Cat_ID, @RankSeq +1, 1 ) URLRankSeq,
              @LastCat := CS.Cat_ID as ignoreIt
           from
              ( select cat_id, 
                       avg( visited / shown )
                   from catalog_sites
                   group by 1
                   order by 2 desc
                   limit 6 ) TopCategories

              JOIN catalog_sites CS
                 on TopCategories.Cat_ID = CS.Cat_ID,

              (select @RankSeq := 0, @LastCat = 0 ) SQLVars   
           order by 
              CS.cat_id,
              Rank ) PQ
    where
      PQ.URLRankSeq <= 6


回答2:

I've tried your example, but it doesn't really work for me, or I just don't know how to adapt it to my case. Anyway, I'm still a noob as far as SQL goes, so I couldn't understand your query.

I have managed to solve my problem however. It's complicated and probably the worst possible approach. It is slow too, but I'll cache the results, so that shouldn't be a problem.

Here is my solution:

SET @site_limit = 2;
SET @cat_limit = 6;

SET @row = 0;
SET @limiter = 0;
SET @last_cat = 0;

SELECT `cat_id`, `url`, `visited` / `shown` AS `rating`, @limiter := IF(@last_cat = `cat_id`, IF(@limiter >= @site_limit - 1, @limiter, @limiter + 1), 0) AS `limiter`, @last_cat := `cat_id` AS `last_cat`
FROM `catalog_sites`
WHERE `cat_id`
IN (
    SELECT `cat_id`
    FROM (
        SELECT `cat_id`, @row := @row + 1 AS `row`
        FROM (
            SELECT `cat_id`
            FROM `catalog_sites`
            GROUP BY `cat_id`
            ORDER BY AVG(`visited` / `shown`) DESC
        ) AS derived1
    ) AS derived2
    WHERE `row` <= @cat_limit
)
GROUP BY `cat_id`, `limiter`
ORDER BY `cat_id`, `rating` DESC