I just wonder whether ClickHouse can be used for storing time-series data in a case like this: a schema with the columns "some_entity_id", "timestamp", "metric1", "metric2", "metric3", ..., "metricN", where each new column containing a metric name can be added to the table dynamically, while adding an entry with that metric name.
I have not found any information about dynamically extending tables in the official documentation.
So, can this case be implemented in ClickHouse?
UPD:
After some benchmarks we found that ClickHouse writes new data faster than our current time-series storage, but reads data much more slowly.
It would probably be better to modify your schema to have 4 columns:
"some_entity_id", "timestamp", "metric_name", "metric_value"
You can include "metric_name" in the MergeTree index to improve performance when searching for a specific metric of an entity. Test with and without it to see whether it helps the kinds of queries you make.
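A minimal sketch of that narrow schema, assuming a modern MergeTree table; the table name and partitioning choice are illustrative, not part of the original answer:

```sql
-- Hypothetical narrow ("one row per metric") schema.
CREATE TABLE metrics
(
    some_entity_id String,
    timestamp      DateTime,
    metric_name    String,
    metric_value   Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (some_entity_id, metric_name, timestamp);
```

With this layout, new metrics require no schema change at all; they are simply new values of "metric_name".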
There is more than one way to use ClickHouse as a time-series database.
My personal preference is to use one string array for metric names and one Float64 array for metric values.
This is a sample time series table:
CREATE TABLE ts1(
entity String,
ts UInt64, -- timestamp, milliseconds from January 1 1970
m Array(String), -- names of the metrics
v Array(Float64), -- values of the metrics
d Date MATERIALIZED toDate(round(ts/1000)), -- auto generate date from ts column
dt DateTime MATERIALIZED toDateTime(round(ts/1000)) -- auto generate date time from ts column
) ENGINE = MergeTree(d, entity, 8192)
Here we are loading two metrics (load, temperature) for an entity (cpu):
INSERT INTO ts1(entity, ts, m, v)
VALUES ('cpu', 1509232010254, ['load','temp'], [0.85, 68])
And querying cpu load:
SELECT
entity,
dt,
ts,
v[indexOf(m, 'load')] AS load
FROM ts1
WHERE entity = 'cpu'
┌─entity─┬──────────────────dt─┬────────────ts─┬─load─┐
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ 0.85 │
└────────┴─────────────────────┴───────────────┴──────┘
Get data as array of tuples:
SELECT
entity,
dt,
ts,
arrayMap((mm, vv) -> (mm, vv), m, v) AS metrics
FROM ts1
┌─entity─┬──────────────────dt─┬────────────ts─┬─metrics─────────────────────┐
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ [('load',0.85),('temp',68)] │
└────────┴─────────────────────┴───────────────┴─────────────────────────────┘
Get data as rows of tuples:
SELECT
entity,
dt,
ts,
arrayJoin(arrayMap((mm, vv) -> (mm, vv), m, v)) AS metric
FROM ts1
┌─entity─┬──────────────────dt─┬────────────ts─┬─metric────────┐
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ ('load',0.85) │
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ ('temp',68)   │
└────────┴─────────────────────┴───────────────┴───────────────┘
Get rows with the metric you want:
SELECT
entity,
dt,
ts,
arrayJoin(arrayMap((mm, vv) -> (mm, vv), m, v)) AS metrics
FROM ts1
WHERE metrics.1 = 'load'
┌─entity─┬──────────────────dt─┬────────────ts─┬─metrics───────┐
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ ('load',0.85) │
└────────┴─────────────────────┴───────────────┴───────────────┘
Get metric names and values as columns:
SELECT
entity,
dt,
ts,
arrayJoin(arrayMap((mm, vv) -> (mm, vv), m, v)) AS metric,
metric.1 AS metric_name,
metric.2 AS metric_value
FROM ts1
┌─entity─┬──────────────────dt─┬────────────ts─┬─metric────────┬─metric_name─┬─metric_value─┐
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ ('load',0.85) │ load        │         0.85 │
│ cpu    │ 2017-10-28 23:06:50 │ 1509232010254 │ ('temp',68)   │ temp        │           68 │
└────────┴─────────────────────┴───────────────┴───────────────┴─────────────┴──────────────┘
Since CH has lots of useful date and time functions, along with higher order functions and tuples, I think it's almost a natural time-series database.
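As one sketch of combining those pieces (assuming the ts1 table defined above), here is a hypothetical query that averages the 'load' metric per minute, skipping rows that don't carry it:

```sql
-- Hypothetical aggregation over the ts1 table above:
-- average 'load' per minute for one entity.
SELECT
    entity,
    toStartOfMinute(dt) AS minute,
    avg(v[indexOf(m, 'load')]) AS avg_load
FROM ts1
WHERE entity = 'cpu' AND has(m, 'load')   -- skip rows without the metric
GROUP BY entity, minute
ORDER BY minute
```

The has() guard matters because indexOf returns 0 for a missing metric, and v[0] would yield the default value (0) rather than excluding the row.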
Edit:
WARNING
After using this method myself with several tables, I have observed that querying columns with the Array(Tuple(String, String, String)) definition seems to crash the database on large tables (1+ billion rows), so take this with a grain of salt. What I have described here may well be undefined behavior, but I've gotten no official word on it from the devs yet.
Original answer:
You can ALTER tables, but not dynamically.
Also, once a column is added, you always need to insert new values into it, though you can always have a "default" value.
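For reference, adding a column with a default looks like this (the column name metric_n and its default are illustrative):

```sql
-- Hypothetical: add a new metric column with a default value,
-- so existing rows and inserts that omit it read as 0.
ALTER TABLE ts1 ADD COLUMN metric_n Float64 DEFAULT 0;
```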
That being said... I found myself needing to insert values dynamically, and there is a "hack" for doing so, namely using this column type:
Array(Tuple(String, String))
This basically means you can have an array of as many values as you want, and insert into it tuples of ("description", "value").
So for one row your array could be:
[('metric_1','val1'), ('metric_2','val2')]
for the other:
[('metric_1','val3'), ('metric_3','val4'), ('metric_4','val5')]
The idea here is that you can convert the values from string to any other type, so in essence you could store any type you want in there.
What if you need to know the type for every operation, and the type can differ? Well:
Array(Tuple(String, String, String))
and in the tuples store ("name", "type", "value").
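A sketch of reading typed values back out of such a column; the table t and the column attrs are hypothetical names for illustration:

```sql
-- Hypothetical: unroll (name, type, value) tuples and cast
-- the string value back to a number where the type says so.
SELECT
    arrayJoin(attrs) AS a,       -- attrs is Array(Tuple(String, String, String))
    a.1 AS name,
    a.2 AS type,
    toFloat64OrZero(a.3) AS numeric_value  -- meaningful only when type is numeric
FROM t
```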
This is the closest thing I could think of that might fit what you want. Of course, you should take a look at the array manipulation functions to see whether they offer what you need (they are quite versatile; you can more or less do with an array everything you can do with a row of the table itself).
What's the downside?
Well, speed.
This will make querying very slow. Depending on what you want to do, this may or may not be a problem for you. If you filter your data well enough and hardly ever need to query more than a few dozen, or at most a few hundred, million rows (and have good enough machines to handle the queries), then these dynamic array extensions could work.
Did you see https://clickhouse.yandex/reference_en.html#ALTER ?
It works only with the *MergeTree ClickHouse table engines.