a count for each join - optimisation

2019-01-08 02:54发布

问题:

RESULTS : I've used three methods :

  1. Three sub queries, 1 join in each (mine)
  2. Three sub queries, no join, filtering with where (SlimsGhost)
  3. Triple join (Solarflare)

I've made some stats with "explain" and "profiling" which explains the work each query must do and the following results weren't surprising : stats

Relative results :

  1. 100%
  2. 79%
  3. 1715%

ORIGINAL POST

The idea is to join 4 tables, using the same PK each time and then count how many rows each join would separately gives.

The obvious answer is to do each join... separately with sub queries.

But is it possible to do it with one query? Would it be more efficient?

select "LES CIGARES DU PHARAON" as "Titre",
          (select count( payalb.idPays)
          from album alb
                     left join pays_album payalb using ( idAlb )
          where alb.titreAlb = "LES CIGARES DU PHARAON") as "Pays",
          (select count( peralb.idPers)
          from album alb
                     left join pers_album peralb using ( idAlb )
          where alb.titreAlb = "LES CIGARES DU PHARAON") as "Personnages",
          (select count( juralb.idJur)
          from album alb
                     left join juron_album juralb using ( idAlb )
          where alb.titreAlb = "LES CIGARES DU PHARAON") as "Jurons"
; 
+------------------------+------+-------------+--------+
| Titre                  | Pays | Personnages | Jurons |
+------------------------+------+-------------+--------+
| LES CIGARES DU PHARAON |    3 |          13 |     50 |
+------------------------+------+-------------+--------+

table album rows : 22

table pays_album rows : 45

table personnage_album rows : 100

table juron_album rows : 1704

Here is what I tried :

select alb.titreAlb as "Titre",
         sum(case when alb.idAlb=payalb.idAlb then 1 else 0 end) "Pays",
         sum(case when alb.idAlb=peralb.idAlb then 1 else 0 end) "Personnages",
         sum(case when alb.idAlb=juralb.idAlb then 1 else 0 end) "Jurons"
from album alb
          left join pays_album payalb using ( idAlb )
          left join pers_album peralb using ( idAlb )
          left join juron_album juralb using ( idAlb )
where alb.titreAlb = "LES CIGARES DU PHARAON"
group by alb.titreAlb
;
+------------------------+------+-------------+--------+
| Titre                  | Pays | Personnages | Jurons |
+------------------------+------+-------------+--------+
| LES CIGARES DU PHARAON | 1950 |        1950 |   1950 |
+------------------------+------+-------------+--------+

but it counts the total number of rows of the full joined table, ... (1950 = 3 * 13 * 50)

schema : https://github.com/LittleNooby/gbd2015-2016/blob/master/tintin_schema.png

tables content : https://github.com/LittleNooby/gbd2015-2016/blob/master/tintin_description

If you want to play to play with it :

db_init : https://github.com/LittleNooby/gbd2015-2016/blob/master/tintin_ok.mysql

回答1:

For optimization purposes, a good rule of thumb is to join less, not more. In fact, you should try to join as few rows as you can with as few rows as you can. With any additional join, you will multiply cost instead of adding cost. Because mysql will basically just generate a big multiplied matrix. A lot of that gets optimized away by indexes and other stuff though.

But to answer your question: it is actually possible to count with only one big join, assuming the tables have unique keys and idalb is a unique key for album. Then, and only then, you can do it similar to your code:

select alb.titreAlb as "Titre",
       count(distinct payalb.idAlb, payalb.PrimaryKeyFields) "Pays",
       count(distinct peralb.idAlb, peralb.PrimaryKeyFields) "Personnages",
       count(distinct juralb.idAlb, juralb.PrimaryKeyFields) "Jurons"
from album alb
left join pays_album payalb using ( idAlb )
left join pers_album peralb using ( idAlb )
left join juron_album juralb using ( idAlb )
where alb.titreAlb = "LES CIGARES DU PHARAON"
group by alb.titreAlb

where PrimaryKeyFields stands for the primary key fields of the joined tables (you have to look them up).

Distinct will remove the effect the other joins have on the count. But unfortunately, in general, distinct will not remove the effect the joins have on the cost.

Although, if you have indexes that cover all (idAlb + PrimaryKeyFields)-fields of your tables, that might be even as fast as the original solution (because it can optimize the distinct to not do a sorting) and will come close to what you were thinking of (just walking through every table/index once). But in a normal or worst case szenario, it should perform worse than a reasonable solution (like SlimGhost's one) - because it is doubtful it will find the optimal strategy. But play around with it and check the explains (and post the findings), maybe mysql will do something crazy.



回答2:

As far as "least work for the db", I think the following would be both correct and less logical I/O for your schema. Understand though, that you CANNOT know for sure unless you look at the explain plans (expected and actual).

Still, I recommend trying this out - it accesses the "alb" table only once whereas your original query would need to access it four times (once to fetch the "base" album record, and then three more for the three subqueries).

select alb.titreAlb as "Titre",
     (select count(*) from pays_album t2 where t2.idAlb = alb.idAlb) "Pays",
     (select count(*) from pers_album t2 where t2.idAlb = alb.idAlb) "Personnages",
     (select count(*) from juron_album t2 where t2.idAlb = alb.idAlb) "Jurons"
from album alb
where alb.titreAlb = "LES CIGARES DU PHARAON"