How to continuously feed sniffed packets to kafka?

Posted 2019-02-25 09:10


Currently I am sniffing packets on my local WLAN interface like this:

sudo tshark > sampleData.pcap

However, I need to feed this data to kafka.

Currently, I have a kafka producer script

../bin/ --broker-list localhost:9092 --topic 'spark-kafka'

and feed data to kafka like this: < sampleData.pcap

where sampleData.pcap contains pre-captured IP packet information.

However, I want to automate the process so that it looks something like this:

sudo tshark > http://localhost:9091 < http://localhost:9091

This is obviously just pseudocode. What I want to do is send the sniffed data to a port and have Kafka continuously read from it. I don't want Kafka to continuously read from a file, because that would mean a tremendous number of read/write operations on a single file, which is inefficient.

I searched the internet and came across kafka-connect but I can't find any useful documentation for implementing something like this.

What's the best way to implement something like this?



With netcat

No need to write a server; you can use netcat (and have your script read from standard input):

shell1> nc -l 8888 | ./
shell2> sudo tshark -l | nc 127.1 8888

The -l option of tshark keeps it from buffering too much output (it flushes after each packet).

With a named pipe

You could also use a named pipe to transmit tshark output to your second process:

shell1> mkfifo /tmp/tsharkpipe
shell1> tail -f -c +0 /tmp/tsharkpipe | ./
shell2> sudo tshark -l > /tmp/tsharkpipe


I think you can either

  • create a tiny server that connects to Kafka and listens on a port
  • use the kafka-file connector and append all your data to that file.
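
The first option can be sketched roughly as follows. This is only a minimal illustration, assuming Node and plain-text, line-oriented input such as tshark's default output; createLineServer and the port 8888 are hypothetical names chosen for the example, and the callback is where a Kafka producer call would go.

```javascript
var net = require('net');
var readline = require('readline');

// Hypothetical helper: accepts TCP connections and calls onLine for every
// complete line of text that a client sends.
function createLineServer(onLine) {
    return net.createServer(function (socket) {
        // readline splits the incoming byte stream into lines for us
        var rl = readline.createInterface({ input: socket });
        rl.on('line', onLine);
        // Ignore connection resets from clients so the server keeps running
        socket.on('error', function () {});
    });
}

// Usage sketch (left commented out so that loading this file has no side effects):
// var server = createLineServer(function (line) {
//     console.log(line);  // replace with a call to your Kafka producer
// });
// server.listen(8888, '127.0.0.1');
```

With the server listening, something like sudo tshark -l | nc 127.0.0.1 8888 would feed it one packet summary per line.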


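If you use Node, you can use child_process and kafka-node to do it. Something like this:

var kafka = require('kafka-node');
var spawn = require('child_process').spawn;

// kafka.Client takes the ZooKeeper address (API of older kafka-node releases)
var client = new kafka.Client('localhost:2181');
var producer = new kafka.Producer(client);

// Start sniffing only once the producer is ready to send
producer.on('ready', () => {
    // -l makes tshark flush its output after each packet
    var tshark = spawn('sudo', ['/usr/sbin/tshark', '-l']);
    tshark.stdout.on('data', (data) => {
        // data is a Buffer: convert it to text, one message per non-empty line
        var lines = data.toString().split("\n").filter((line) => line.length > 0);
        if (lines.length === 0) return;
        producer.send([
            {topic: 'spark-kafka', messages: lines}
        ], (err, result) => { console.log(err || "sent to kafka"); });
    });
});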
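
One caveat with the 'data' handler: Node delivers stdout in arbitrary chunks, so a chunk can end in the middle of a line and a packet summary may get cut in two. A small buffering helper avoids that. The sketch below is illustrative only; makeLineSplitter is a hypothetical name, and it assumes the output is plain ASCII text with one packet per line.

```javascript
// Hypothetical helper: turns arbitrary stdout chunks into complete lines.
function makeLineSplitter(onLine) {
    var pending = '';  // text received so far that is not yet terminated by "\n"
    return function (chunk) {
        pending += chunk.toString();
        var parts = pending.split('\n');
        // The last element is an unfinished line (or '' if the chunk ended on "\n")
        pending = parts.pop();
        parts.forEach(function (line) {
            if (line.length > 0) onLine(line);
        });
    };
}

// Usage sketch: collect the lines instead of sending them to Kafka.
var seen = [];
var feed = makeLineSplitter(function (line) { seen.push(line); });
feed(Buffer.from('packet 1\npack'));  // second line is incomplete here
feed(Buffer.from('et 2\n'));          // ...and completed by the next chunk
console.log(seen);  // [ 'packet 1', 'packet 2' ]
```

In the snippet above it would be wired in as tshark.stdout.on('data', makeLineSplitter(sendLine)), where sendLine is whatever function passes a single line to the producer.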


Another option would be to use Apache NiFi. With NiFi you can execute commands and pass the output to other blocks for further processing. Here you could have NiFi execute a tshark command on the local host and then pass the output to Kafka.

There is an example here which should demonstrate this type of approach in slightly more detail.