How can I code nullable objects in Google Cloud Da

Posted 2019-04-10 17:39

Question:

This post is intended to answer questions like the following:

  • Which built-in Coders support nullable values?
  • How can I encode nullable objects?
  • What about classes with nullable fields?
  • What about collections with null entries?

Answer 1:

You can inspect the built-in Coders in the DataflowJavaSDK source.

Some of the default Coders do not support null values, often for efficiency. For example, DoubleCoder always encodes a double using 8 bytes; adding a bit to reflect whether the double is null would add a (padded) 9th byte to all non-null values.
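The size trade-off can be sketched in plain Java. This is only an illustration built on `java.io`, not the SDK's DoubleCoder; the class `NullMarkerSketch` and its methods are hypothetical names, and the marker values (0 for null, 1 for present) are an assumption made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration class; not part of the Dataflow SDK.
class NullMarkerSketch {
    // Fixed-width encoding with no room for null: always 8 bytes.
    static byte[] encodePlain(double value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeDouble(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Nullable encoding: one marker byte (0 = null, 1 = present), then the value.
    static byte[] encodeNullable(Double value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (value == null) {
                out.writeByte(0);
            } else {
                out.writeByte(1);
                out.writeDouble(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encodePlain(3.14).length);     // prints 8
        System.out.println(encodeNullable(3.14).length);  // prints 9
        System.out.println(encodeNullable(null).length);  // prints 1
    }
}
```

Every non-null value pays for the extra marker byte, which is the cost a coder without null support avoids.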

It is possible to encode nullable values using the techniques outlined below.

  1. We generally recommend using AvroCoder to encode classes. AvroCoder has support for nullable fields annotated with the org.apache.avro.reflect.Nullable annotation:

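    import com.google.cloud.dataflow.sdk.coders.AvroCoder;
    import com.google.cloud.dataflow.sdk.coders.DefaultCoder;
    import org.apache.avro.reflect.Nullable;

    @DefaultCoder(AvroCoder.class)
    class MyClass {
      @Nullable String nullableField;
    }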
    

    See the TrafficMaxLaneFlow example for a more complete code sample.

    AvroCoder also supports fields that include Null in a Union.

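  2. We recommend using NullableCoder to encode nullable objects themselves. NullableCoder wraps another Coder and writes a leading marker byte that records whether the value is null, which is the extra-byte strategy described above for DoubleCoder.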

    For example, consider the following working code:

    PCollection<String> output =
        p.apply(Create.of(null, "test1", null, "test2", null)
            .withCoder(NullableCoder.of(StringUtf8Coder.of())));
    
  3. Nested null fields/objects are supported by many coders, as long as the nested coder supports null fields/objects.

    For example, for a List<MyClass> the SDK should be able to infer a working coder using the default CoderRegistry: it should automatically use a ListCoder with a nested AvroCoder.

    Similarly, a List<String> with possibly-null entries can be encoded with the Coder:

    Coder<List<String>> coder = ListCoder.of(NullableCoder.of(StringUtf8Coder.of()));
    

Finally, in some cases Coders must be deterministic, e.g., the key used for GroupByKey. In AvroCoder, the @Nullable fields are coded deterministically as long as the Coder for the base type is itself deterministic. Similarly, using NullableCoder should not affect whether an object can be encoded deterministically.
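To make the nesting and determinism points concrete, here is a minimal plain-Java sketch of how a list coder can delegate to a null-aware wrapper around a base coder. `SketchCoder`, `nullable`, `listOf`, and `STRINGS` are hypothetical stand-ins written for this example, not the SDK's Coder, NullableCoder, ListCoder, or StringUtf8Coder, and the byte layout is an assumption rather than the SDK's wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not the SDK's Coder classes.
class NestedCoderSketch {
    interface SketchCoder<T> {
        void encode(T value, DataOutputStream out) throws IOException;
        T decode(DataInputStream in) throws IOException;
    }

    // Base coder with no null support: writeUTF throws on a null String.
    static final SketchCoder<String> STRINGS = new SketchCoder<String>() {
        @Override public void encode(String value, DataOutputStream out) throws IOException {
            out.writeUTF(value);
        }
        @Override public String decode(DataInputStream in) throws IOException {
            return in.readUTF();
        }
    };

    // Adds null support to any coder via a marker byte: 0 = null, 1 = value follows.
    static <T> SketchCoder<T> nullable(SketchCoder<T> inner) {
        return new SketchCoder<T>() {
            @Override public void encode(T value, DataOutputStream out) throws IOException {
                out.writeByte(value == null ? 0 : 1);
                if (value != null) {
                    inner.encode(value, out);
                }
            }
            @Override public T decode(DataInputStream in) throws IOException {
                return in.readByte() == 0 ? null : inner.decode(in);
            }
        };
    }

    // Writes the element count, then delegates each element to the nested coder.
    static <T> SketchCoder<List<T>> listOf(SketchCoder<T> element) {
        return new SketchCoder<List<T>>() {
            @Override public void encode(List<T> values, DataOutputStream out) throws IOException {
                out.writeInt(values.size());
                for (T value : values) {
                    element.encode(value, out);
                }
            }
            @Override public List<T> decode(DataInputStream in) throws IOException {
                int size = in.readInt();
                List<T> values = new ArrayList<>(size);
                for (int i = 0; i < size; i++) {
                    values.add(element.decode(in));
                }
                return values;
            }
        };
    }

    static <T> byte[] toBytes(SketchCoder<T> coder, T value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            coder.encode(value, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static <T> T fromBytes(SketchCoder<T> coder, byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return coder.decode(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SketchCoder<List<String>> coder = listOf(nullable(STRINGS));
        List<String> input = Arrays.asList(null, "test1", null, "test2");
        byte[] first = toBytes(coder, input);
        byte[] second = toBytes(coder, input);
        System.out.println(fromBytes(coder, first).equals(input)); // prints true
        System.out.println(Arrays.equals(first, second));          // prints true
    }
}
```

In this sketch the wrapper only prepends a fixed marker byte, so equal inputs still produce equal bytes whenever the wrapped coder does. That mirrors the claim that adding null support should not change whether encoding is deterministic.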