How to get Ids of 4 distinct values ( combination

2019-03-04 17:37发布

问题:

I got table like below

mysql> select * from tb_dts;
+----+------+------+
| Id | key1 | key2 |
+----+------+------+
|  1 |    1 |    1 |
|  2 |    1 |    1 |
|  3 |    1 |    1 |
|  4 |    2 |    1 |
|  5 |    2 |    1 |
|  6 |    2 |    1 |
|  7 |    2 |    1 |
|  8 |    1 |    2 |
|  9 |    1 |    2 |
| 10 |    1 |    2 |
| 11 |    1 |    2 |
| 12 |    1 |    2 |
| 13 |    3 |    1 |
| 14 |    3 |    1 |
| 15 |    3 |    1 |
| 16 |    3 |    1 |
| 17 |    2 |    2 |
| 18 |    2 |    2 |
| 19 |    2 |    2 |
| 20 |    2 |    3 |
| 21 |    2 |    3 |
| 22 |    2 |    3 |
| 23 |    3 |    2 |
| 24 |    3 |    2 |
| 25 |    3 |    2 |
| 26 |    3 |    2 |
+----+------+------+
26 rows in set (0.00 sec)

I take distinct values like this, used in some application pagination

mysql> select distinct key1,key2 from tb_dts limit 0,4;
+------+------+
| key1 | key2 |
+------+------+
|    1 |    1 |
|    1 |    2 |
|    2 |    1 |
|    2 |    2 |
+------+------+
4 rows in set (0.00 sec)

mysql> select distinct key1,key2 from tb_dts limit 4,4;
+------+------+
| key1 | key2 |
+------+------+
|    2 |    3 |
|    3 |    1 |
|    3 |    2 |
+------+------+
3 rows in set (0.00 sec)

Through group_concat I get Ids as well but, I want to use this Ids in WHERE Field IN clause like where somefield IN ( ..here my Ids goes...)

mysql> select key1,key2,group_concat(Id) from tb_dts group by key1,key2 limit 0,4;
+------+------+------------------+
| key1 | key2 | group_concat(Id) |
+------+------+------------------+
|    1 |    1 | 1,2,3            |
|    1 |    2 | 8,9,10,11,12     |
|    2 |    1 | 4,5,6,7          |
|    2 |    2 | 17,18,19         |
+------+------+------------------+
4 rows in set (0.00 sec)

mysql> select key1,key2,group_concat(Id) from tb_dts group by key1,key2 limit 4,4;
+------+------+------------------+
| key1 | key2 | group_concat(Id) |
+------+------+------------------+
|    2 |    3 | 20,21,22         |
|    3 |    1 | 13,14,15,16      |
|    3 |    2 | 23,24,25,26      |
+------+------+------------------+
3 rows in set (0.00 sec)

But How do I put this in WHERE Fieldname IN clause ?

I need something like this, as my tb_dts contains more that 30 million reocrds, and 15 fields in real, I can't use ID BETWEEN min_id and max_id

For processing first 4 unique combination values I need

select * from tb_dts where Id IN (1,2,3,8,9,10,11,12,4,5,6,7,17,18,19  )

For processing next 4 unique combination values I need Ids in my application, so in short I want to have below mentioned Ids in my where Field IN clause

select * from tb_dts where Id IN (20,21,22,13,14,15,16,23,24,25,26 )

Here is structure of my table

DROP TABLE IF EXISTS `tb_dts`;
CREATE TABLE `tb_dts` (
  `Id` int(11) NOT NULL AUTO_INCREMENT,
  `key1` int(11) DEFAULT '-99',
  `key2` int(11) DEFAULT '-99',
  PRIMARY KEY (`Id`),
  KEY `main` (`key1`,`key2`)
) ENGINE=MyISAM AUTO_INCREMENT=27 DEFAULT CHARSET=latin1;

LOCK TABLES `tb_dts` WRITE;
INSERT INTO `tb_dts` VALUES (1,1,1),(2,1,1),(3,1,1),(4,2,1),(5,2,1),(6,2,1),(7,2,1),(8,1,2),(9,1,2),(10,1,2),(11,1,2),(12,1,2),(13,3,1),(14,3,1),(15,3,1),(16,3,1),(17,2,2),(18,2,2),(19,2,2),(20,2,3),(21,2,3),(22,2,3),(23,3,2),(24,3,2),(25,3,2),(26,3,2);
UNLOCK TABLES;

As you can see here, it gives first found Id for each distinct combination value

mysql> select Id from tb_dts group by key1,key2 limit 0,4;
+----+
| Id |
+----+
|  1 |
|  8 |
|  4 |
| 17 |
+----+
4 rows in set (0.00 sec)

But I expect all Ids which falls within the criteria given, that is nothing but what all IDs falls for below 4 unique value

mysql> select key1,key2 from tb_dts group by key1,key2 limit 0,4;
+------+------+
| key1 | key2 |
+------+------+
|    1 |    1 |  --- 1,2,3
|    1 |    2 |  --- 8,9,10,11,12
|    2 |    1 |  --- 4,5,6,7 
|    2 |    2 |  --- 17,18,19
+------+------+
4 rows in set (0.00 sec)

Expected o/p

I expect to get Id like this for group by key1,key2 limit 0,4, so that this can be used in my WHERE IN clause.

1
2
3
8
9
10
11
12
4
5
6
7
17
18
19

回答1:

For your immediate problem, you can use find_in_set like this:

select t.*
from your_table t
where exists (
    select 1 from (
        select group_concat(Id)
        from tb_dts
        group by key1, key2
        order by key1, key2   -- very important when using limit
        limit 0, 4
    ) t2 where find_in_set(t.fieldname, t2.ids) > 0
);

Though I am not sure if this is the best way to do what you're doing. Creating strings using group by and then searching in that string will be too slow.

Also, you want to have an index on key1, key2, id columns.

create index idx_tb_dts on tb_dts (key1, key2, id);

Can try this:

select t.*
from your_table t
where exists (
    select 1
    from tb_dts t1
    inner join (
        select distinct key1, key2
        from tb_dts
        order by key1, key2
        limit 0, 4
    ) t2 on t1.key1 = t2.key1
    and t1.key2 = t2.key2
    where t1.id = t.fieldname
);

You should understand that the group by or distinct part can be heavy on performance. It will be much better if there was a separate table containing unique key1, key2 with a unique index on them.

create table the_keys_table(
    key1 int not null,
    key2 int not null,
    primary key (key1, key2)
);

Then you could replace the tb_dts in below with that table like this:

select key1, key2       -- no distinct or group by needed.
from the_keys_table
order by key1, key2
limit 0, 4

Your final query becomes:

select t.*
from your_table t
where exists (
    select 1
    from tb_dts t2
    where (key1, key2) in (
        select key1, key2
        from the_keys_table
        order by key1, key2
        limit 0, 4
    ) and t1.id = t.fieldname
);


回答2:

you question seems a little confusing but from my understand if what you want is to use the ids in a WHERE field IN clause you can make use of subqueries. So in your case where you had select key1,key2,group_concat(Id) from tb_dts group by key1,key2 limit 4,4; that can simply become

select field 
from table_name 
where id IN 
  (select group_concat(Id) 
   from tb_dts group by key1,key2 limit 4,4)

You can find google subqueries to find out more. Let me know what you think about this approach. Hopefully this could be a step.



回答3:

you can try this

select  *
    from  tb_dts
    where  Id IN (
        SELECT  r2.Ids from
                  ( SELECT  group_concat(Id) Ids
                    from  tb_dts as r1
                    group by  r1.key1,r1.key2
                    limit  4,4
                  ) r2 
                 );


回答4:

Step 1: Abandon the pagination scheme you envision. Instead, paginate on key, not "4 rows per page".

Step 2: "Remember where you left off". If the first page has all the pairs for key1 = 1 or 2, then the second page starts with the next value after key1 = 2.

Now the query becomes much more efficient since it is over a "range" of key1 values. Currently, it must build the entire output before it can paginate!

More discussion of pagination without using OFFSET.