MySQL query optimisation for browse tracker

Posted 2019-07-31 13:27

Question:

I have been reading lots of great answers to different problems on this site over time, but this is the first time I am posting. So thanks in advance for your help.

Here is my question:

I have a MySQL table that tracks visits to different websites we have. This is the table structure:

    create table navigation_base (
          uid int(11) NOT NULL,
          date datetime not null,
          dia date not null,
          ip int(4) unsigned not null default 0,
          session_id int unsigned not null,
          cliente smallint unsigned not null default 0,
          campaign mediumint unsigned not null default 0,
          trackcookie int unsigned not null,
          adgroup int unsigned not null default 0,
          PRIMARY KEY (uid)
     ) ENGINE=MyISAM;

This table has approx. 70 million rows (an average of 110,000 per day).

On that table we have created indexes with following commands:

    alter table navigation_base add index dia_cliente_campaign_ip (dia,cliente,campaign,ip);
    alter table navigation_base add index dia_cliente_campaign_ip_session (dia,cliente,campaign,ip,session_id);
    alter table navigation_base add index dia_cliente_campaign_ip_session_trackcookie (dia,cliente,campaign,ip,session_id,trackcookie);

We then use this table to get visitor statistics grouped by clients, days and campaigns with the following query:

    select
      dia,
      navigation_base.campaign,
      navigation_base.cliente,
      count(distinct ip) as visitas,
      count(ip) as paginas_vistas,
      count(distinct session_id) as sesiones,
      count(distinct trackcookie) as cookies
    from navigation_base
    where (dia between '2017-01-01' and '2017-01-31')
    group by dia,cliente,campaign
    order by NULL;

Even with those indexes in place, response times for one-month periods are relatively slow: about 3 seconds on our server.

Are there some ways of speeding up these queries?

Thanks in advance.

Answer 1:

With this much data, indexing alone may not be all that helpful, since there is a lot of similarity in the data. Besides, you have GROUP BY and sorting along with aggregation. All these things combined make optimization very hard. Partitioning is the way forward, because:

Some queries can be greatly optimized in virtue of the fact that data satisfying a given WHERE clause can be stored only on one or more partitions, which automatically excludes any remaining partitions from the search. Because partitions can be altered after a partitioned table has been created, you can reorganize your data to enhance frequent queries that may not have been often used when the partitioning scheme was first set up.

And if this doesn't work for you, it's still possible to select partitions explicitly:

In addition, MySQL 5.7 supports explicit partition selection for queries. For example, SELECT * FROM t PARTITION (p0,p1) WHERE c < 5 selects only those rows in partitions p0 and p1 that match the WHERE condition.

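    -- MySQL requires the partitioning column to be part of every unique key,
    -- so the primary key has to include dia first, e.g. PRIMARY KEY (uid, dia).
    -- RANGE bounds must be listed in strictly increasing order.
    ALTER TABLE navigation_base
            PARTITION BY RANGE( TO_DAYS(dia)) (
            PARTITION p0 VALUES LESS THAN (TO_DAYS('2015-12-31')),
            PARTITION p1 VALUES LESS THAN (TO_DAYS('2016-12-31')),
            PARTITION p2 VALUES LESS THAN (TO_DAYS('2017-12-31')),
            PARTITION p3 VALUES LESS THAN (TO_DAYS('2018-12-31')),
            ..
            PARTITION p10 VALUES LESS THAN MAXVALUE);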

Use bigger or smaller partitions as you see fit.
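
To check whether the partitioning actually helps, you can look at which partitions the optimizer plans to read. This is only a rough sketch, assuming MySQL 5.7 (where plain EXPLAIN already includes a `partitions` column; on 5.6 you would write EXPLAIN PARTITIONS instead) and that navigation_base has been partitioned on dia as above:

```sql
-- List the partitions that exist and roughly how many rows each holds.
SELECT PARTITION_NAME, PARTITION_DESCRIPTION, TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'navigation_base'
ORDER BY PARTITION_ORDINAL_POSITION;

-- The "partitions" column of the plan should name only the partition(s)
-- covering January 2017, not every partition in the table.
EXPLAIN
SELECT dia, cliente, campaign, COUNT(DISTINCT ip) AS visitas
FROM navigation_base
WHERE dia BETWEEN '2017-01-01' AND '2017-01-31'
GROUP BY dia, cliente, campaign;
```

If the plan still lists every partition, pruning is not happening and the partitioning scheme will probably not speed up this query.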

The most important factor to keep in mind is that MySQL will generally use only one index per table in a query. So choose your index wisely.



Answer 2:

If you only do COUNT(DISTINCT ...) at the granularity of a day, then build and incrementally maintain a summary table. It would be augmented each night by a query nearly identical to your SELECT, but only fetching yesterday's data.

Then use this Summary Table for the monthly "report".
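
Something along these lines might work as a starting point. The table name navigation_daily_summary and its column types are my own assumptions, not something from your schema, and the nightly INSERT assumes it runs once per day after midnight (from cron or a MySQL event, for example):

```sql
-- Hypothetical summary table: one row per (day, client, campaign).
CREATE TABLE navigation_daily_summary (
  dia            date               NOT NULL,
  cliente        smallint unsigned  NOT NULL,
  campaign       mediumint unsigned NOT NULL,
  visitas        int unsigned       NOT NULL,
  paginas_vistas int unsigned       NOT NULL,
  sesiones       int unsigned       NOT NULL,
  cookies        int unsigned       NOT NULL,
  PRIMARY KEY (dia, cliente, campaign)
);

-- Nightly job: same aggregation as the report, but only for yesterday.
INSERT INTO navigation_daily_summary
  (dia, cliente, campaign, visitas, paginas_vistas, sesiones, cookies)
SELECT
  dia,
  cliente,
  campaign,
  COUNT(DISTINCT ip),
  COUNT(ip),
  COUNT(DISTINCT session_id),
  COUNT(DISTINCT trackcookie)
FROM navigation_base
WHERE dia = CURDATE() - INTERVAL 1 DAY
GROUP BY dia, cliente, campaign;

-- Monthly report: reads a few thousand precomputed rows
-- instead of scanning millions of raw ones.
SELECT dia, campaign, cliente, visitas, paginas_vistas, sesiones, cookies
FROM navigation_daily_summary
WHERE dia BETWEEN '2017-01-01' AND '2017-01-31';
```

Note that this only works because the report is grouped by day: the distinct counts of several days cannot simply be added up to get a distinct count for the whole month.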

More on Summary Tables