I am trying to read from a Kafka source and want to extract the timestamp from each received message in order to use Spark Structured Streaming. Versions: Kafka 0.10.0.0, Spark Streaming 2.0.1.
The field "timestamp" is what you are looking for. Its type is java.sql.Timestamp. Make sure that you are connecting to a Kafka 0.10 server; earlier versions have no message timestamp. The full list of fields is described here: http://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#creating-a-kafka-source-for-batch-queries
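A minimal sketch of reading that field with the Structured Streaming Kafka source might look like the following. The broker address localhost:9092, the topic name my-topic and the object name are placeholders, not values from the question. It also assumes the spark-sql-kafka-0-10 artifact is on the classpath; as far as I know that source first shipped in Spark 2.0.2, so 2.0.1 may need an upgrade.

```scala
import org.apache.spark.sql.SparkSession

object KafkaTimestampExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("KafkaTimestampExample")
      .getOrCreate()

    // Each row from the Kafka source carries key, value, topic, partition,
    // offset, timestamp and timestampType columns.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker address
      .option("subscribe", "my-topic")                     // assumed topic name
      .load()

    // Keep the message timestamp next to the decoded key and value.
    val withTs = df.selectExpr(
      "CAST(key AS STRING) AS key",
      "CAST(value AS STRING) AS value",
      "timestamp")

    // Print rows to the console, purely for illustration.
    val query = withTs.writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```

The timestamp column can then be used like any other column, for example as the event time in a window aggregation.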
I'd suggest a couple of things:
Suppose you create a stream via the latest Kafka streaming API (Kafka 0.10).
E.g. you use this dependency:
"org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.0.1"
Then you create a stream, according to the docs above:
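The original snippet is missing here, so this is only a sketch of what creating the direct stream with spark-streaming-kafka-0-10 might look like. The broker address, group id, topic name and the KafkaStreamFactory object are assumptions made for illustration.

```scala
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.{ByteArrayDeserializer, StringDeserializer}
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaStreamFactory {
  // Builds a direct stream with String keys and raw byte-array values.
  def createStream(ssc: StreamingContext): InputDStream[ConsumerRecord[String, Array[Byte]]] = {
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",            // assumed broker address
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[ByteArrayDeserializer],
      "group.id" -> "example-group",                      // assumed consumer group
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )
    val topics = Array("my-topic")                        // assumed topic name

    KafkaUtils.createDirectStream[String, Array[Byte]](
      ssc,
      PreferConsistent,
      Subscribe[String, Array[Byte]](topics, kafkaParams)
    )
  }
}
```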
Your stream will be a DStream of ConsumerRecord[String, Array[Byte]], and you can get the timestamp and the key-value pair as simply as:
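The original snippet is missing here as well. One way to do it, sketched with a hypothetical helper named withTimestamps, is to map each record to a tuple of plain values:

```scala
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.streaming.dstream.DStream

object RecordTimestamps {
  // Maps each Kafka record to (timestamp in epoch millis, key, value).
  // Extracting plain fields early avoids passing ConsumerRecord itself
  // through later stages, since it is not serializable.
  def withTimestamps(
      stream: DStream[ConsumerRecord[String, Array[Byte]]]
  ): DStream[(Long, String, Array[Byte])] =
    stream.map(record => (record.timestamp, record.key, record.value))
}
```

For a quick check, RecordTimestamps.withTimestamps(stream).print() shows the first few tuples of every batch once the StreamingContext has been started.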
Hope that helps.