How to continuously feed sniffed packets to kafka?

Posted 2019-02-25 09:10

Question:

Currently I am sniffing packets from my local wlan interface like this:

sudo tshark > sampleData.pcap

However, I need to feed this data to kafka.

Currently, I have a kafka producer script producer.sh:

../bin/kafka-console-producer.sh --broker-list localhost:9092 --topic 'spark-kafka'

and feed data to kafka like this:

producer.sh < sampleData.pcap

where in sampleData.pcap I have pre-captured IP packet information.

However, I want to automate the process, so that it works something like this:

sudo tshark > http://localhost:9091
producer.sh < http://localhost:9091

This is obviously just pseudocode. What I want to do is send the sniffed data to a port and have Kafka read it continuously. I don't want Kafka to keep reading from a file, because that would mean a tremendous number of read/write operations on a single file, which is inefficient.

I searched the internet and came across Kafka Connect, but I can't find any useful documentation for implementing something like this.

What's the best way to implement something like this?

Thanks!

Answer 1:

With netcat

No need to write a server; you can use netcat and have your producer script read from standard input:

shell1> nc -l 8888 | ./producer.sh
shell2> sudo tshark -l | nc 127.0.0.1 8888

The -l flag of tshark keeps it from buffering the output too much (it flushes after each packet).
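Note that the listen syntax varies between netcat implementations; traditional netcat wants nc -l -p 8888 instead. The same trick also lets you capture on one machine and produce on another. A sketch, assuming a hypothetical host name kafka-host for the machine running the producer:

shell1> nc -l 8888 | ./producer.sh
shell2> sudo tshark -l | nc kafka-host 8888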


With a named pipe

You could also use a named pipe to transmit tshark output to your second process:

shell1> mkfifo /tmp/tsharkpipe
shell1> tail -f -c +0 /tmp/tsharkpipe | ./producer.sh
shell2> sudo tshark -l > /tmp/tsharkpipe
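The tail -f -c +0 reader forwards everything from the first byte (instead of only the last ten lines, as a bare tail would) and keeps reading after end-of-file, so the pipeline survives a tshark restart; a plain cat /tmp/tsharkpipe would exit as soon as tshark closes its end of the pipe.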


Answer 2:

I think you can either

  • create a tiny server that connects to Kafka and listens on a port, or
  • use the Kafka file connector and append all your data to that file: http://kafka.apache.org/documentation.html#quickstart_kafkaconnect (a configuration sketch follows below)
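For the second option, a minimal standalone file-source configuration could look like the sketch below; the properties file name and the /tmp/tshark.out path are assumptions, while the property keys follow the Kafka Connect quickstart:

connect-tshark-source.properties:

name=tshark-file-source
connector.class=FileStreamSource
tasks.max=1
file=/tmp/tshark.out
topic=spark-kafka

You would then start it with the standalone worker shipped with Kafka, e.g. bin/connect-standalone.sh config/connect-standalone.properties connect-tshark-source.properties, and redirect tshark's output to /tmp/tshark.out.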


Answer 3:

If you use Node, you can use child_process and kafka-node to do it. Something like this:

var kafka = require('kafka-node');
var spawn = require('child_process').spawn;

var client = new kafka.Client('localhost:2181');
var producer = new kafka.Producer(client);

// Wait for the producer to connect before spawning tshark
producer.on('ready', () => {
  // -l reduces tshark's output buffering, as in the netcat answer
  var tshark = spawn('sudo', ['/usr/sbin/tshark', '-l']);

  tshark.stdout.on('data', (data) => {
    producer.send([
      // data is a Buffer: turn it into a string and send one message per line
      {topic: 'spark-kafka', messages: data.toString().split('\n')}
    ], (err, result) => {
      if (err) console.error(err);
      else console.log('sent to kafka');
    });
  });
});
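To try this sketch you would first npm install kafka-node; note that the (older) kafka.Client constructor used here takes a ZooKeeper connection string, so it expects ZooKeeper to be reachable at localhost:2181.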


Answer 4:

Another option would be to use Apache NiFi. With NiFi you can execute commands and pass the output to other blocks for further processing. Here you could have NiFi execute a tshark command on the local host and then pass the output to Kafka.
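In NiFi terms this could be, for example, an ExecuteProcess processor running the tshark command, wired to a PublishKafka processor (the exact processor names depend on your NiFi and Kafka versions).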

There is an example here which should demonstrate this type of approach in slightly more detail.