How to ingest real-time data into Druid?

Published 2019-04-13 17:02

Question:

I have an analytics server (for example, a click counter). I want to send data to Druid using some API. How should I do that? Can I use it as a replacement for Google Analytics?

Answer 1:

As se7entyse7en said:

You can ingest your data into Kafka and then use Druid's Kafka firehose to load it into Druid through real-time ingestion. After that, you can query Druid interactively using its API.

Note that firehoses can be set up only on Druid realtime nodes.

Here is a tutorial on how to set up the Kafka firehose: Loading Streaming Data. Besides the Kafka firehose, you can set up the other provided firehoses (Amazon S3 firehose, RabbitMQ firehose, etc.) by including them, and you can even write your own firehose as an extension; an example is here. Here are all Druid extensions.

It must be said that Druid is shifting real-time ingestion from realtime nodes to the Indexing service, as explained here.



Answer 2:

You can ingest your data into Kafka and then use Druid's Kafka firehose to load it into Druid through real-time ingestion. After that, you can query Druid interactively using its API.
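To illustrate the querying step, here is a minimal sketch of a native timeseries query sent to a Druid broker over HTTP. The broker address, the dataSource name "clicks" and the metric name "count" are assumptions made for the example, not values from this thread; adjust them to your own schema.

```python
import json
import urllib.request

# Assumed broker address; change host and port to match your cluster.
BROKER_URL = "http://localhost:8082/druid/v2/"


def hourly_clicks_query(data_source, interval):
    """Build a timeseries query that sums a 'count' metric per hour.

    'count' is a hypothetical metric name; use whatever your ingestion
    spec actually defines.
    """
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": "hour",
        "intervals": [interval],  # ISO 8601 interval, e.g. "start/end"
        "aggregations": [
            {"type": "longSum", "name": "clicks", "fieldName": "count"}
        ],
    }


def run_query(query, broker_url=BROKER_URL):
    """POST the query as JSON to the broker and return the parsed result."""
    request = urllib.request.Request(
        broker_url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (needs a running broker):
# rows = run_query(hourly_clicks_query("clicks", "2019-04-13/2019-04-14"))
```

Building the query in a separate function from the HTTP call makes it easy to inspect or log the JSON before sending it.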



Answer 3:

Right now the best practice is to run a Realtime Index Task on the Indexing Service and then use Druid's API to send data to this task. You can use the API directly, but it's far easier to use Tranquility, a library that automatically creates a new Realtime Index Task for new segments and routes your messages to the right task. You can also set the replication and sharding level, etc. Just run the indexing service, use Tranquility, and you can start sending your messages to Druid.
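If you are not on the JVM, one option is to run Tranquility Server and push events to it over HTTP. The sketch below assumes a Tranquility Server listening on localhost:8200 with a dataSource named "clicks" already configured; the event fields (page, user, count) are hypothetical, and the assumption that a JSON array is accepted as the request body should be checked against your Tranquility version.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumed values; adjust to your own Tranquility Server setup.
TRANQUILITY_URL = "http://localhost:8200/v1/post"
DATA_SOURCE = "clicks"  # hypothetical dataSource name


def make_click_event(page, user_id, now=None):
    """Build one click event with an ISO 8601 timestamp in UTC."""
    now = now or datetime.now(timezone.utc)
    return {
        "timestamp": now.isoformat(),
        "page": page,
        "user": user_id,
        "count": 1,
    }


def build_request(events, base_url=TRANQUILITY_URL, data_source=DATA_SOURCE):
    """Prepare (but do not send) a POST carrying a JSON array of events."""
    body = json.dumps(events).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{data_source}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_events(events):
    """Send the events and return the server's decoded JSON response."""
    with urllib.request.urlopen(build_request(events), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (needs a running Tranquility Server):
# send_events([make_click_event("/home", "user-42")])
```

Real-time tasks only accept events whose timestamps are close to the current time, so stamp events when they occur rather than replaying old data through this path.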



Answer 4:

Assuming your Druid is a 0.9.x version, the best way is to use Tranquility. The REST API is pretty solid and allows you to control your data schema. See the druid.io quickstart page and go to the "Load streaming data" section.

I am loading clickstream data for our website in real time and it's been working very well. So yes, you can replace Google Analytics with Druid (assuming you have the required infrastructure).



Tags: druid