Spark Streaming: Reading data from Kafka that has multiple schemas

Posted 2020-06-23 09:36

I am struggling with the implementation in Spark Streaming.

The messages from Kafka look like this, but with more fields:

{"event":"sensordata", "source":"sensors", "payload": {"actual data as a json}}
{"event":"databasedata", "mysql":"sensors", "payload": {"actual data as a json}}
{"event":"eventApi", "source":"event1", "payload": {"actual data as a json}}
{"event":"eventapi", "source":"event2", "payload": {"actual data as a json}}

I am trying to read messages from a Kafka topic that carries multiple schemas. I need to read each message, look at its event and source fields, and decide where to store it as a Dataset. The actual data is in the payload field as JSON, and each message holds only a single record.
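Roughly what I am attempting with Structured Streaming (a sketch only; the broker address, topic name, and output paths below are placeholders, not my real setup):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object MultiSchemaConsumer {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("multi-schema-kafka").getOrCreate()
    import spark.implicits._

    // Read the raw Kafka records and keep the value as a JSON string
    // (requires the spark-sql-kafka connector on the classpath).
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "mixed-events")
      .load()
      .selectExpr("CAST(value AS STRING) AS json")

    // Extract the routing fields; payload stays a raw JSON string for now.
    val envelope = raw.select(
      get_json_object($"json", "$.event").as("event"),
      get_json_object($"json", "$.source").as("source"),
      get_json_object($"json", "$.payload").as("payload"))

    // Route each event type to its own sink.
    envelope.filter($"event" === "sensordata")
      .writeStream.format("parquet")
      .option("path", "/data/sensors")
      .option("checkpointLocation", "/chk/sensors")
      .start()

    envelope.filter(lower($"event") === "eventapi")
      .writeStream.format("parquet")
      .option("path", "/data/events")
      .option("checkpointLocation", "/chk/events")
      .start()

    spark.streams.awaitAnyTermination()
  }
}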

Can someone help me implement this, or suggest an alternative?

Is it good practice to send messages with multiple schemas to the same topic and consume them this way?

Thanks in advance,

2 Answers
唯我独甜 · 2020-06-23 10:03

You can create a DataFrame from the incoming JSON objects (see the sketch below):

Collect the JSON strings into a Seq[String].

Convert the Seq to a Dataset[String] and read it with val df = spark.read.json(jsonSeq.toDS) (json takes a Dataset[String] or a path; the type-parameter form spark.read.json[Seq[String]] does not compile).

Perform the operations of your choice on the DataFrame df.
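A minimal sketch of these steps (the sample strings reuse the question's event values; everything else is illustrative):

import org.apache.spark.sql.SparkSession

object SeqJsonExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("seq-json").master("local[*]").getOrCreate()
    import spark.implicits._

    // JSON strings as they would arrive from Kafka, collected into a Seq.
    val jsonSeq = Seq(
      """{"event":"sensordata","source":"sensors","payload":{"value":42}}""",
      """{"event":"eventApi","source":"event1","payload":{"value":7}}""")

    // spark.read.json infers a schema across all records in the Dataset.
    val df = spark.read.json(jsonSeq.toDS)

    // Now route on the top-level fields as usual.
    df.filter($"event" === "sensordata").show()
  }
}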

放荡不羁爱自由 · 2020-06-23 10:11

Convert the JSON string to a JavaBean if you only care about some of the columns.
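A sketch of the same idea in Scala, declaring only the fields you need and parsing the payload with from_json (a Java version would instead deserialize into the bean with Encoders.bean); the field names deviceId and temperature are made-up placeholders:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

object PartialSchemaExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("partial-schema").master("local[*]").getOrCreate()
    import spark.implicits._

    // A payload string with more fields than we want to keep.
    val payloads = Seq("""{"deviceId":"s1","temperature":21.5,"ignored":"x"}""").toDS()

    // Declare only the columns you care about; the rest are dropped.
    val schema = new StructType()
      .add("deviceId", StringType)
      .add("temperature", DoubleType)

    payloads
      .select(from_json($"value", schema).as("data"))
      .select("data.*")
      .show()
  }
}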
