I have written code to serialize objects to JSON and BSON. According to my output, the BSON produced is greater in size than the JSON. Is this expected?
From my code for Bson.class (using Jackson and bson4jackson):
private ByteArrayOutputStream baos = new ByteArrayOutputStream();
private BsonFactory fac = new BsonFactory();
private ObjectMapper mapper = new ObjectMapper(fac);

public Bson(Object obj) throws JsonGenerationException,
        JsonMappingException, IOException {
    mapper.writeValue(baos, obj);
}

public int size() {
    return baos.size();
}

public String toString() {
    byte[] bytes = baos.toByteArray();
    return new String(bytes);
}
From my Json.class
private ByteArrayOutputStream baos = new ByteArrayOutputStream();
private ObjectMapper mapper = new ObjectMapper();

public Json(Object obj) throws JsonGenerationException,
        JsonMappingException, IOException {
    mapper.writeValue(baos, obj);
}
(size() and toString() as above)
My POJOs are Person.class and Address.class.
In my main class:
Address a = new Address("Jln Koli", "90121", "Vila", "Belgium");
Person p = new Person("Ali Bin Baba", new Date(), 90.0, 12, a);
List<Person> persons = new LinkedList<>();
persons.add(p);
persons.add(p);
Bson bson = new Bson(persons);
Json json = new Json(persons);
System.out.println("Bson : " + bson.size() + ", data : " + bson.toString());
System.out.println("Json : " + json.size() + ", data : " + json.toString());
The output:
Bson : 301, data : -
Json : 285, data : [{"name":"Ali Bin Baba","birthd...
My questions:
- Is this output correct, or is my code wrong?
- Any suggestions for how to check or test this, so I can compare the sizes of BSON and JSON?
From the BSON FAQ:
BSON is designed to be efficient in space, but in many cases is not
much more efficient than JSON. In some cases BSON uses even more space
than JSON. The reason for this is another of the BSON design goals:
traversability. BSON adds some "extra" information to documents, like
length prefixes, that make it easy and fast to traverse.
BSON is also designed to be fast to encode and decode. For example,
integers are stored as 32 (or 64) bit integers, so they don't need to
be parsed to and from text. This uses more space than JSON for small
integers, but is much faster to parse.
For a string field, the overhead in JSON is 6 bytes -- 4 quotes, a colon and a comma. In BSON it's 7 -- entry type byte, null terminator to field name, 4 byte string length, null terminator to value.
For an integer field, the JSON length depends on the size of the number. "1" is just one byte. "1000000" is 7 bytes. In BSON both of these would be a 4 byte 32 bit integer. The situation with floating point numbers is similar.
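The integer case can be checked with a little arithmetic. The following is only a sketch: the class and method names are made up for illustration, it assumes compact JSON with no whitespace, a field name that needs no escaping, and a trailing comma counted as in the string example above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Jackson or bson4jackson.
class FieldSizes {
    // Bytes for a compact JSON member such as "n":1000000 plus a trailing comma.
    static int jsonIntFieldSize(String name, int value) {
        int nameBytes = name.getBytes(StandardCharsets.UTF_8).length;
        int valueBytes = Integer.toString(value).length();
        // two quotes + name + colon + digits + comma
        return 2 + nameBytes + 1 + valueBytes + 1;
    }

    // Bytes for a BSON int32 element: type byte, name, NUL terminator, 4-byte value.
    static int bsonInt32FieldSize(String name) {
        int nameBytes = name.getBytes(StandardCharsets.UTF_8).length;
        return 1 + nameBytes + 1 + 4;
    }

    public static void main(String[] args) {
        for (int value : new int[] {1, 1000000}) {
            System.out.println(value + ": JSON " + jsonIntFieldSize("n", value)
                    + " bytes, BSON " + bsonInt32FieldSize("n") + " bytes");
        }
    }
}
```

For a field named "n" this prints 6 bytes of JSON against 7 bytes of BSON for the value 1, and 12 bytes of JSON against 7 bytes of BSON for 1000000, so BSON only wins once the number has enough digits.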
BSON is not intended to be smaller. It is intended to be closer to the structures that computers work with natively, so that it can be worked with more efficiently -- that is one meaning of "light".
If you're not chasing extreme levels of performance (as the MongoDB developers who designed BSON are), then I would advise using JSON -- the human-readability is a great benefit to the developer. As long as you use a library like Jackson, migrating to BSON later should not be hard -- as you can see from how nearly identical your own BSON and JSON classes are.
Bear in mind that if size is an issue, both JSON and BSON should compress well.
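To see how much compression helps, you could gzip the serialized bytes and compare sizes. This is a rough sketch using only the JDK; the class name, the helper names and the sample record (including its field names) are invented for illustration, and in your own code you would pass in baos.toByteArray() from your Json or Bson class instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for comparing raw and gzip-compressed sizes.
class CompressedSize {
    // Returns the number of bytes the data occupies after gzip compression.
    static int gzipSize(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(data);
        } catch (IOException e) {
            // Not expected when writing to memory; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    // Builds a JSON array holding the same made-up record several times.
    static byte[] sampleJson(int copies) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < copies; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"name\":\"Ali Bin Baba\",\"weight\":90.0,\"age\":12}");
        }
        return sb.append(']').toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] json = sampleJson(50);
        System.out.println("raw: " + json.length + " bytes, gzip: " + gzipSize(json) + " bytes");
    }
}
```

Repeated field names and values are exactly what gzip removes, so the gap between the two formats tends to shrink after compression. Measure with your real data before deciding.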
The property "foo":"bar" consumes 11 bytes in UTF-8 encoded JSON. In BSON it consumes 13:
bytes  description
============================================
1      entry type value \x02
3      "foo"
1      NUL \x00
4      int32 string length (4 -- includes the NUL)
3      "bar"
1      NUL \x00
There are many cases in which JSON will be more compact.
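The byte count for "foo":"bar" can be reproduced by writing the element out by hand. This is a sketch, not a real BSON encoder: the class and method names are hypothetical, and it encodes a single string element only. A complete BSON document also wraps its elements in a 4-byte length prefix and a trailing NUL, which adds further overhead.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical hand-rolled encoder for one BSON string element.
class BsonStringElement {
    static byte[] encode(String name, String value) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        byte[] valueBytes = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer
                .allocate(1 + nameBytes.length + 1 + 4 + valueBytes.length + 1)
                .order(ByteOrder.LITTLE_ENDIAN); // BSON integers are little-endian
        buf.put((byte) 0x02);              // entry type: UTF-8 string
        buf.put(nameBytes);                // field name
        buf.put((byte) 0x00);              // NUL terminating the name
        buf.putInt(valueBytes.length + 1); // string length, including the NUL
        buf.put(valueBytes);               // string value
        buf.put((byte) 0x00);              // NUL terminating the value
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] bson = encode("foo", "bar");
        byte[] json = "\"foo\":\"bar\"".getBytes(StandardCharsets.UTF_8);
        System.out.println("BSON: " + bson.length + " bytes, JSON: " + json.length + " bytes");
    }
}
```

Running main prints "BSON: 13 bytes, JSON: 11 bytes", matching the table above.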