-->

adding Apache flume elastic search sink

2019-08-02 16:38发布

问题:

I want to use Apache flume Elastic search native sink in cygnus. How can I configure cygnus to use this native sink?

Do I put the native jar file in Apache flume lib and configure the agent as the folowing:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

and then I start two cygnus instance one with for cygnus-ngsi agent and the other for elastic search native sink? Can't find how to run a cygnus instance (I installed cygnus from sources so I can't use linux services to start it)

And is it possible to use the same source for cygnus-ngsi and this new sink? cause I need the same source for both.

Thanks and best regards.

回答1:

Cygnus is based on Apache Flume, and it installs all Flume libraries. This means you could even use Cygnus as a pure Flume agent if you want. In other words, Cygnus is an extension of Apache Flume.

Thus, using a native sink in Cygnus is just a matter of configuration. In the particular case of ElasticSearchSink, simply follow the documentation:

cygnusngsi.sinks = elastic-sink <other_flume_sinks> <other_cygnus_sinks>
cygnusngsi.channels = elastic-channel <other_channels>
...
cygnusngsi.sinks.elastic-sink.type = elasticsearch
cygnusngsi.sinks.elastic-sink.hostNames = 127.0.0.1:9200,127.0.0.2:9300
cygnusngsi.sinks.elastic-sink.indexName = foo_index
cygnusngsi.sinks.elastic-sink.indexType = bar_type
cygnusngsi.sinks.elastic-sink.clusterName = foobar_cluster
cygnusngsi.sinks.elastic-sink.batchSize = 500
cygnusngsi.sinks.elastic-sink.ttl = 5d
cygnusngsi.sinks.elastic-sink.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer
cygnusngsi.sinks.elastic-sink.channel = elastic-channel

Regarding the sources, you can configure more than one in a single agent configuration, and then connect each source to a different sink through a dedicated channel. Schematically:

source1 --- channel1 --- sink1
source2 --- channel2 --- sink2

Or you could use the same source for both sinks; in this case, you have to connect the two sinks to the single source through two dedicated channels as well, and the replicating channel selector (no need to configure it, this is the default behaviour) will create a copy of the incoming NGSI notification for each channel. Schematically:

          ___ channel1 --- sink1
source__ /
         \___ channel2 --- sink2