Is it safe to deserialize untrusted data, provided my code makes no assumptions about the state or class of the deserialized object, or can the mere act of deserializing cause undesired operation?
(Threat model: The attacker may freely modify the serialized data, but that's all he can do)
Deserialization itself can already be unsafe. A serializable class may define a readObject method (see also the specification), which is called when an object of this class is deserialized from the stream. The attacker cannot provide this code, but using a crafted input she can invoke any such readObject method that is on your classpath, with any input.
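As a minimal sketch of that point, the following shows a readObject method running as a side effect of deserialization, even though the caller throws the resulting object away. The names ReadObjectDemo, Gadget and roundTrip are hypothetical and chosen only for illustration; Gadget stands in for any serializable class that happens to be on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ReadObjectDemo {
    // Hypothetical class standing in for any Serializable class on the classpath.
    static class Gadget implements Serializable {
        private static final long serialVersionUID = 1L;
        static int readObjectCalls = 0;

        // Invoked reflectively by ObjectInputStream while the object is being read.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            readObjectCalls++; // any side effect placed here runs during deserialization
        }
    }

    // Serializes a value and immediately deserializes it; the result is discarded on purpose.
    static void roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject(); // the caller never touches the returned object
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        roundTrip(new Gadget());
        System.out.println("readObject calls: " + Gadget.readObjectCalls);
    }
}
```

The calling code makes no assumption about the class or state of the object, yet the counter still changes. In a real attack the stream names the class, so the attacker picks which readObject runs.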
Code injection
It is possible to write a readObject implementation that opens the door to arbitrary bytecode injection. Simply read a byte array from the stream and pass it to ClassLoader.defineClass and ClassLoader.resolveClass() (see the javadoc for the former and the latter). I don't know what the use of such an implementation would be, but it is possible.
Memory exhaustion
Writing secure readObject methods is hard. Up until somewhat recently the readObject method of HashMap contained the following lines.
int numBuckets = s.readInt();
table = new Entry[numBuckets];
This makes it very easy for an attacker to allocate several gigabytes of memory with just a few dozen bytes of serialized data, which will have your system down with an OutOfMemoryError in no time.
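For contrast, here is a hedged sketch of a readObject that validates a size read from the stream before allocating anything. This is not the fix that went into HashMap. BoundedTable, MAX_CAPACITY and isRejected are hypothetical names, and the limit is an arbitrary assumption that a real class would have to derive from its own requirements. The constructor deliberately skips validation so that the demo can produce a hostile stream without hand-editing bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class BoundedTable implements Serializable {
    private static final long serialVersionUID = 1L;
    static final int MAX_CAPACITY = 1 << 16; // assumed application-specific limit

    private final int capacity;       // travels in the stream, so the attacker controls it
    private transient Object[] table; // rebuilt during deserialization

    // No validation and no allocation here, so the demo can simulate a tampered stream.
    BoundedTable(int capacity) {
        this.capacity = capacity;
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // Check the untrusted value before it is used as an array size.
        if (capacity < 0 || capacity > MAX_CAPACITY) {
            throw new InvalidObjectException("unreasonable capacity: " + capacity);
        }
        table = new Object[capacity];
    }

    // Serializes a table with the given capacity and reports whether reading it back is refused.
    static boolean isRejected(int capacity) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new BoundedTable(capacity));
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
                return false;
            }
        } catch (InvalidObjectException e) {
            return true;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("huge capacity rejected: " + isRejected(Integer.MAX_VALUE));
        System.out.println("small capacity rejected: " + isRejected(16));
    }
}
```

A check like this only protects the one class that contains it, which is why every readObject on the classpath matters.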
The current implementation of Hashtable seems to still be vulnerable to a similar attack; it computes the size of the allocated array based on the number of elements and the load factor, but there is no guard in place against unreasonable values in loadFactor, so we can easily request a billion slots be allocated for each element in the table.
Excessive CPU load
Fixing the vulnerability in HashMap was done as part of changes to address another security issue related to hash-based maps. CVE-2012-2739 describes a denial-of-service attack based on CPU consumption by creating a HashMap with very many colliding keys (i.e. distinct keys with the same hash value). The documented attacks are based on query parameters in URLs or keys in HTTP POST data, but deserialization of a HashMap is also vulnerable to this attack.
The safeguards that were put into HashMap to prevent this type of attack are focussed on maps with String keys. This is adequate to prevent the HTTP-based attacks, but is easily circumvented with deserialization, e.g. by wrapping each String with an ArrayList (whose hashCode is also predictable). Java 8 includes a proposal (JEP-180) to further improve the behaviour of HashMap in the face of many collisions, which extends the protection to all key types that implement Comparable, but that still allows an attack based on ArrayList keys.
The upshot of this is that it is possible for the attacker to engineer a byte stream such that the CPU effort it takes to deserialize an object from this stream grows quadratically with the size of the stream.
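To illustrate how such colliding keys can be produced, here is a small sketch. It relies on the fact that the strings "Aa" and "BB" have the same hashCode, so any two strings built from the same number of these blocks collide as well, and wrapping each string in a one-element ArrayList preserves the collision. The class and method names (CollidingKeys, collidingStrings, collidingListKeys, distinctHashCodes) are hypothetical, and the sketch only builds the keys; it does not measure the resulting slowdown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CollidingKeys {
    // Returns 2^blocks distinct strings, each a sequence of "Aa" and "BB" blocks.
    // "Aa" and "BB" share a hash code, so all returned strings share one too.
    static List<String> collidingStrings(int blocks) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>(result.size() * 2);
            for (String prefix : result) {
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            result = next;
        }
        return result;
    }

    // Wraps each string in a one-element ArrayList. ArrayList does not implement
    // Comparable, and its hashCode is derived from the element's hashCode.
    static List<ArrayList<String>> collidingListKeys(int blocks) {
        List<ArrayList<String>> keys = new ArrayList<>();
        for (String s : collidingStrings(blocks)) {
            keys.add(new ArrayList<>(Collections.singletonList(s)));
        }
        return keys;
    }

    // Counts how many different hash codes occur among the given keys.
    static int distinctHashCodes(List<?> keys) {
        Set<Integer> hashes = new HashSet<>();
        for (Object key : keys) {
            hashes.add(key.hashCode());
        }
        return hashes.size();
    }

    public static void main(String[] args) {
        List<ArrayList<String>> keys = collidingListKeys(10);
        System.out.println(keys.size() + " keys, "
                + distinctHashCodes(keys) + " distinct hash code(s)");
    }
}
```

A serialized HashMap filled with keys like these forces every insertion during deserialization into the same bucket.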
Summary
By controlling the input to the deserialization process an attacker can trigger the invocation of any readObject deserialization method. It is theoretically possible for such a method to allow bytecode injection. In practice it is certainly possible to easily exhaust memory or CPU resources this way, resulting in denial-of-service attacks. Auditing your system against such vulnerabilities is very difficult: you have to check every implementation of readObject, including those in third-party libraries and the runtime library.