Apache Beam stream processing of json data

2019-07-31 16:59发布

问题:

I am analyzing Apache Beam stream processing of data. I have worked on Apache Kafka stream processing (Producer, Consumer etc). I want to compare it with Beam now.

I want to to stream simple json data using Apache Beam programmatically (Java).

{"UserID":"1","Address":"XXX","ClassNo":"989","UserName":"Stella","ClassType":"YYY"}

Can someone please guide me or direct me with an example link?

回答1:

There are multiple aspects of this:

  • first you need to establish where the data is coming from:
    • you need to use some kind of IO in Beam pipeline, see here;
    • there are a bunch of built in IOs, see the list here;
    • by using an IO from the above link you will likely get a stream of strings containing those JSON objects;
    • some IOs can natively parse Avro and other formats (PubsubIO), this depends on specific IO implementation;
  • then you may need to transform the data:

    • you will need to create your own PTransform which handles the conversion from a JSON string to your Java class:
      • see the section about PTransforms here;
    • you can see an example of such transform here:
      • this JsonToRow PTransform accepts a string with JSON object and converts it to a Beam Row using Jackson ObjectMapper;
      • you can either try using the Row object yourself, or you can implement a similar transform to convert JSON strings to your custom Java type instead of Row;
  • you may also take a look at examples folder in Beam source;