A DoFn in our Dataflow pipeline contains a type with a Random field pointing to a SecureRandom instance, and that field fails to deserialize when running in the Dataflow service using DataflowPipelineRunner (stack trace below).
We create the SecureRandom using its default constructor, which happens to hand back an instance that uses sun.security.provider.Sun as its java.security.Provider (see SecureRandom#getProvider). SecureRandom extends Random, which is serializable.
The Dataflow service chokes when trying to deserialize this class because it can't create sun.security.provider.Sun.
Looking closer at the stack trace, I see that deserialization happens through com.google.apphosting.runtime.security.UserClassLoader, and now my theory is that this classloader doesn't allow loading of sun.* classes, or at least this particular sun.* class.
java.lang.IllegalArgumentException: unable to deserialize com.example.Example@13e88d
at com.google.cloud.dataflow.sdk.util.SerializableUtils.deserializeFromByteArray(SerializableUtils.java:73)
at com.google.cloud.dataflow.sdk.util.SerializableUtils.clone(SerializableUtils.java:88)
at com.google.cloud.dataflow.sdk.transforms.ParDo$Bound.<init>(ParDo.java:683)
[...]
Caused by: java.lang.ClassNotFoundException: sun.security.provider.Sun
at com.google.apphosting.runtime.security.UserClassLoader.loadClass(UserClassLoader.java:442)
at java.lang.ClassLoader.loadClass(ClassLoader.java:375)
at java.lang.Class.forName0(Native Method)
[...]
The problem is that sun.security.provider.Sun doesn't appear on the App Engine JRE whitelist, so the classloader can't load or instantiate it: https://cloud.google.com/appengine/docs/java/jrewhitelist
But luckily you can still say new SecureRandom() in the same environment.
To work around the problem, we added a custom de/serialization hook to the class with the Random field. Simplified example:
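As a rough sketch of what such a hook can look like (an illustration under assumptions, not necessarily the exact code used here): the class name RandomHolder and the toBytes/fromBytes helpers are hypothetical, and marking the field transient is an assumption about how the provider is kept out of the serialized stream. The idea is to skip the Random field on write and rebuild it with new SecureRandom() on read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.security.SecureRandom;
import java.util.Random;

// Hypothetical stand-in for the type referenced by the DoFn; the name is illustrative.
class RandomHolder implements Serializable {
    private static final long serialVersionUID = 1L;

    // Assumption: the field is transient, so default serialization skips it and
    // neither the SecureRandom nor its java.security.Provider is written out.
    private transient Random random = new SecureRandom();

    Random getRandom() {
        return random;
    }

    // Serialization hook: write only the non-transient state (none in this sketch).
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
    }

    // Deserialization hook: restore the non-transient state, then create a fresh
    // SecureRandom locally instead of reading one from the stream.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        random = new SecureRandom();
    }

    // Hypothetical helper: serialize a value to a byte array.
    static byte[] toBytes(Serializable value) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: deserialize a value from a byte array.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this shape, a round trip such as RandomHolder.fromBytes(RandomHolder.toBytes(new RandomHolder())) yields a copy whose generator was constructed on the deserializing side, so the stream never names the provider class.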