This post is intended to answer questions like the following:
- Which built-in Coders support nullable values?
- How can I encode nullable objects?
- What about classes with nullable fields?
- What about collections with null entries?
To see which built-in Coders accept null values, you can inspect their implementations in the DataflowJavaSDK source.
Some of the default Coders do not support null values, often for efficiency reasons. For example, DoubleCoder always encodes a double using exactly 8 bytes; adding even a single bit to record whether the value is null would, once padded to a whole byte, add a 9th byte to every non-null value.
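The size cost described above can be checked with a small plain-Java sketch that uses no Dataflow classes. FixedWidthDoubles is a hypothetical class: encode mimics a fixed-width 8-byte double encoding in the spirit of DoubleCoder (it is not DoubleCoder's source), and encodeNullable is an assumed variant that prepends a one-byte null marker.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not part of the Dataflow SDK.
class FixedWidthDoubles {

    // Fixed-width encoding: every double takes exactly 8 bytes.
    static byte[] encode(double value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeDouble(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Assumed null-aware variant: a 1-byte marker (0 = null, 1 = present),
    // followed by the 8-byte value only when the value is present.
    static byte[] encodeNullable(Double value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (value == null) {
                out.writeByte(0);
            } else {
                out.writeByte(1);
                out.writeDouble(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

With this layout a non-null value occupies 9 bytes instead of 8, while a null takes a single byte. That extra byte on every non-null value is exactly the overhead a fixed-width Coder avoids by not supporting null.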
It is possible to encode nullable values using the techniques outlined below.
We generally recommend using AvroCoder to encode classes. AvroCoder supports nullable fields annotated with the org.apache.avro.reflect.Nullable annotation:
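```java
import com.google.cloud.dataflow.sdk.coders.AvroCoder;
import com.google.cloud.dataflow.sdk.coders.DefaultCoder;
import org.apache.avro.reflect.Nullable;

@DefaultCoder(AvroCoder.class)
class MyClass {
  @Nullable String nullableField;
}
```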
For a more complete code example, see the TrafficMaxLaneFlow example pipeline.
AvroCoder also supports fields whose Avro type is a union that includes null.
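As far as we understand Avro's reflect API, a @Nullable field is given a schema that is a union of null and the field's type. The Avro binary format writes a union value as the zero-based branch index (a zigzag-encoded varint) followed by the value of that branch. The hand-rolled sketch below mimics that wire format for a nullable String field, assuming the union is ordered as ["null", "string"]. AvroNullableStringSketch is a hypothetical helper for illustration; it does not call Avro or AvroCoder, and real pipelines should rely on AvroCoder instead.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper mimicking Avro's binary encoding of a ["null", "string"] union.
// Illustration only; assumes null is branch 0 and string is branch 1.
class AvroNullableStringSketch {

    // Avro writes int/long values as zigzag-encoded variable-length integers.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63); // zigzag: maps small magnitudes to small unsigned values
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits with the continuation bit set
            z >>>= 7;
        }
        out.write((int) z); // final byte, continuation bit clear
    }

    static byte[] encode(String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value == null) {
            writeLong(out, 0); // branch 0 = "null"; no payload follows
        } else {
            writeLong(out, 1); // branch 1 = "string"
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
            writeLong(out, utf8.length); // Avro strings are length-prefixed UTF-8
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }
}
```

Under these assumptions a null field costs one byte (the branch index), and "abc" encodes as the branch index, the length, and the three UTF-8 bytes.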
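To encode nullable objects themselves, we recommend NullableCoder. It wraps another Coder and implements the strategy described above for DoubleCoder: each value is prefixed with a marker recording whether it is null, and non-null values are then encoded with the wrapped Coder.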
For example, consider the following working code:
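```java
// Assumes p is an existing Pipeline. NullableCoder.of takes the Coder used
// for non-null values (here StringUtf8Coder), not a Class.
PCollection<String> output =
    p.apply(Create.of(null, "test1", null, "test2", null)
        .withCoder(NullableCoder.of(StringUtf8Coder.of())));
```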
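The wrapping strategy that NullableCoder follows can be sketched without the SDK: take any encoder for non-null values and prefix each value with one byte saying whether it is null. ElementCodec and NullableWrapper below are hypothetical stand-ins for Coder and NullableCoder, written only to show the idea; they are not the SDK's API, and the exact bytes NullableCoder writes may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Coder: encodes and decodes one non-null value.
interface ElementCodec<T> {
    void encode(T value, DataOutputStream out) throws IOException;
    T decode(DataInputStream in) throws IOException;
}

// Hypothetical stand-in for NullableCoder: marker byte 0 = null, 1 = value follows.
class NullableWrapper<T> implements ElementCodec<T> {
    private final ElementCodec<T> inner;

    NullableWrapper(ElementCodec<T> inner) {
        this.inner = inner;
    }

    @Override
    public void encode(T value, DataOutputStream out) throws IOException {
        if (value == null) {
            out.writeByte(0);
        } else {
            out.writeByte(1);
            inner.encode(value, out); // the wrapped codec never sees null
        }
    }

    @Override
    public T decode(DataInputStream in) throws IOException {
        return in.readByte() == 0 ? null : inner.decode(in);
    }
}

class NullableDemo {
    // Non-null String codec; writeUTF keeps the sketch short (limit: 65,535 encoded bytes).
    static final ElementCodec<String> STRINGS = new ElementCodec<String>() {
        @Override
        public void encode(String value, DataOutputStream out) throws IOException {
            out.writeUTF(value);
        }

        @Override
        public String decode(DataInputStream in) throws IOException {
            return in.readUTF();
        }
    };

    // Encodes each element with the nullable wrapper, then decodes them all back.
    static List<String> roundTrip(List<String> values) {
        ElementCodec<String> codec = new NullableWrapper<>(STRINGS);
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (String v : values) {
                codec.encode(v, out);
            }
            out.flush();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            List<String> decoded = new ArrayList<>();
            for (int i = 0; i < values.size(); i++) {
                decoded.add(codec.decode(in));
            }
            return decoded;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Round-tripping the same five elements used in the Create.of call returns them unchanged, nulls included.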
Many Coders support nested null fields or objects, as long as the nested Coder itself supports null values.
For example, the SDK should be able to infer a working Coder for a List<MyClass> using the default CoderRegistry: it should automatically use a ListCoder with a nested AvroCoder.
Similarly, a List<String> with possibly-null entries can be encoded with the following Coder:
```java
Coder<List<String>> coder = ListCoder.of(NullableCoder.of(StringUtf8Coder.of()));
```
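The nesting works because a list coder only needs to write the list size and then delegate each element to its element coder, so null handling lives entirely in the element coder. The sketch below models that composition in plain Java. MiniCoder, MiniListCoder and NestedCoderDemo are hypothetical names, and the size-prefix layout is an assumption for illustration, not ListCoder's documented format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal codec interface standing in for Coder<T>.
interface MiniCoder<T> {
    void encode(T value, DataOutputStream out) throws IOException;
    T decode(DataInputStream in) throws IOException;
}

// Hypothetical analogue of ListCoder: a size prefix, then each element via the nested coder.
class MiniListCoder<T> implements MiniCoder<List<T>> {
    private final MiniCoder<T> elementCoder;

    MiniListCoder(MiniCoder<T> elementCoder) {
        this.elementCoder = elementCoder;
    }

    @Override
    public void encode(List<T> list, DataOutputStream out) throws IOException {
        out.writeInt(list.size());
        for (T element : list) {
            elementCoder.encode(element, out); // nulls are the element coder's concern
        }
    }

    @Override
    public List<T> decode(DataInputStream in) throws IOException {
        int size = in.readInt();
        List<T> result = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            result.add(elementCoder.decode(in));
        }
        return result;
    }
}

class NestedCoderDemo {
    // Nullable String element coder: a presence flag, then the string if present.
    static final MiniCoder<String> NULLABLE_STRING = new MiniCoder<String>() {
        @Override
        public void encode(String s, DataOutputStream out) throws IOException {
            out.writeBoolean(s != null);
            if (s != null) {
                out.writeUTF(s);
            }
        }

        @Override
        public String decode(DataInputStream in) throws IOException {
            return in.readBoolean() ? in.readUTF() : null;
        }
    };

    // Plays the role of ListCoder.of(NullableCoder.of(StringUtf8Coder.of())).
    static final MiniCoder<List<String>> LIST_OF_NULLABLE_STRINGS =
        new MiniListCoder<>(NULLABLE_STRING);

    static List<String> roundTrip(List<String> list) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            LIST_OF_NULLABLE_STRINGS.encode(list, out);
            out.flush();
            return LIST_OF_NULLABLE_STRINGS.decode(
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Swapping NULLABLE_STRING for a coder that rejects null would make the list coder fail on null entries, which matches the rule that nesting supports null only if the nested coder does.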
Finally, in some cases Coders must be deterministic, for example the Coder for the keys of a GroupByKey. In AvroCoder, @Nullable fields are encoded deterministically as long as the underlying field type is itself encoded deterministically. Similarly, wrapping a Coder in NullableCoder should not affect whether an object can be encoded deterministically.
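One way to see why key Coders must be deterministic: grouping is based on the encoded bytes of each key, so equal keys must always produce identical bytes. The toy model below groups values by the encoded form of a nullable String key, using a marker byte followed by the UTF-8 bytes, in the spirit of the NullableCoder strategy. EncodedKeyGrouping and its methods are hypothetical, and this is not how the Dataflow service implements GroupByKey.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of grouping by encoded key; hypothetical, not the service's implementation.
class EncodedKeyGrouping {

    // Deterministic nullable-key encoding: marker byte, then UTF-8 bytes if present.
    // The marker depends only on nullness and UTF-8 encoding is deterministic,
    // so equal keys (including null) always yield identical bytes.
    static byte[] encodeKey(String key) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (key == null) {
            out.write(0);
        } else {
            out.write(1);
            byte[] utf8 = key.getBytes(StandardCharsets.UTF_8);
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }

    // Groups values whose keys have byte-identical encodings. ByteBuffer's
    // equals/hashCode compare the remaining contents, so it works as a map key.
    static Map<ByteBuffer, List<Integer>> groupByEncodedKey(List<String> keys, List<Integer> values) {
        Map<ByteBuffer, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            ByteBuffer encoded = ByteBuffer.wrap(encodeKey(keys.get(i)));
            groups.computeIfAbsent(encoded, k -> new ArrayList<>()).add(values.get(i));
        }
        return groups;
    }
}
```

If the key encoding depended on something non-deterministic, such as HashMap iteration order, two equal keys could produce different bytes and land in different groups. That is the failure mode deterministic Coders rule out.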