I am struggling with the implementation in spark streaming.
The messages from the kafka looks like this but with with more fields
{"event":"sensordata", "source":"sensors", "payload": {"actual data as a json}}
{"event":"databasedata", "mysql":"sensors", "payload": {"actual data as a json}}
{"event":"eventApi", "source":"event1", "payload": {"actual data as a json}}
{"event":"eventapi", "source":"event2", "payload": {"actual data as a json}}
I am trying to read the messages from a Kafka topic (which has multiple schemas). I need to read each message and look for an event and source field and decide where to store as a Dataset. The actual data is in the field payload as a JSON which is only a single record.
Can someone help me to implement this or any other alternatives?
Is it a good way to send the messages with multiple schemas in the same topic and consume it?
Thanks in advance,
You can create a
Dataframe
from the incoming JSON object.Create
Seq[Sring]
of JSON object.Use val
df=spark.read.json[Seq[String]]
.Perform the operations on the
dataframe df
of your choice.Converting JsonString to
JavaBean
if you only care about some columns