ClickHouse Kafka Performance

Posted 2019-05-10 16:09

Question:

Following the example from the documentation: https://clickhouse.yandex/docs/en/table_engines/kafka/

I created a table with the Kafka engine and a materialized view that pushes data to a MergeTree table.

Here is the structure of my tables:

CREATE TABLE games (
    UserId UInt32,
    ActivityType UInt8,
    Amount Float32,
    CurrencyId UInt8,
    Date String
  ) ENGINE = Kafka('XXXX.eu-west-1.compute.amazonaws.com:9092,XXXX.eu-west-1.compute.amazonaws.com:9092,XXXX.eu-west-1.compute.amazonaws.com:9092', 'games', 'click-1', 'JSONEachRow', '3');


CREATE TABLE tests.games_transactions (
    day Date,
    UserId UInt32,
    Amount Float32,
    CurrencyId UInt8,
    timevalue DateTime,
    ActivityType UInt8
 ) ENGINE = MergeTree(day, (day, UserId), 8192);


  CREATE MATERIALIZED VIEW tests.games_consumer TO tests.games_transactions
    AS SELECT toDate(replaceRegexpOne(Date,'\\..*','')) as day, UserId, Amount, CurrencyId, toDateTime(replaceRegexpOne(Date,'\\..*','')) as timevalue, ActivityType
    FROM default.games;

In the Kafka topic I am getting around 150 messages per second.

Everything works, apart from the fact that the data shows up in the table with a long delay, definitely not in real time.

It seems that the data is sent from Kafka to the table only once 65536 new messages are ready to consume in Kafka.

Should I set some particular configuration?

I tried changing the settings from the CLI:

SET max_insert_block_size=1048
SET max_block_size=655
SET stream_flush_interval_ms=750

But there was no improvement.

Is there another setting I should change, or should I have changed the settings above before creating the tables?

Answer 1:

There is an issue for this on the ClickHouse GitHub: https://github.com/yandex/ClickHouse/issues/2169.

Basically, you need to set max_block_size (http://clickhouse-docs.readthedocs.io/en/latest/settings/settings.html#max-block-size) before the table is created; otherwise it will not take effect.
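To see why the block size matters at the rate mentioned in the question, here is a rough estimate. It assumes, as the question observes, that rows are flushed only once a full block of max_block_size messages has accumulated, and that the topic delivers a steady 150 messages per second. The column aliases are made up for illustration.

```sql
-- Rough estimate only: seconds needed to fill one block at ~150 messages/s.
-- Assumes a flush happens only when a full block has accumulated.
SELECT
    65536 / 150 AS seconds_per_flush_with_65536,  -- block size observed in the question
    100 / 150   AS seconds_per_flush_with_100;    -- block size used in the override
```

Under these assumptions the larger block takes roughly 437 seconds (a little over 7 minutes) to fill, while a block of 100 fills in under a second.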

I used the solution of overriding the setting in users.xml:

<yandex>
    <profiles>
        <default>
           <max_block_size>100</max_block_size>
        </default>
    </profiles>
</yandex>

I deleted my table and database and recreated them. This worked for me: my tables now get updated every 100 records.
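One way to check that the server picked up the override before recreating the tables is to query system.settings. This is only a sketch: it shows the value for the profile your client session uses, which I assume here is the same default profile edited in users.xml.

```sql
-- Show the effective max_block_size for the current session.
-- changed = 1 means the value differs from the built-in default.
SELECT name, value, changed
FROM system.settings
WHERE name = 'max_block_size';
```

If the value is still the default, the users.xml change has probably not been loaded, and recreating the tables would not help yet.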