Comparing Java Default Serialization with Apache Avro serialization

I compared Java's default serialization with Apache Avro serialization of the same data, and the results were striking.

You can read my older posts on the Java serialization process and on Apache Avro serialization.

Apache Avro needed 15-20 times fewer bytes to store the serialized data. I created a class with three fields (two Strings and one enum) and serialized an instance of it with both Avro and Java.

Avro used 14 bytes and Java used 231 bytes (the length of the resulting byte[]), which is about 16.5 times more.
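To make the Java side of the measurement concrete, here is a minimal sketch of how the serialized size can be obtained. The `Person` class, the `Color` enum and the field names are hypothetical stand-ins for the three-field class, not the code from the original test, so the exact byte count will differ from the 231 bytes quoted above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializedSizeDemo {

    // Hypothetical enum, standing in for the enum field of the test class.
    enum Color { RED, GREEN, BLUE }

    // Hypothetical three-field class: two Strings and one enum.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String firstName;
        final String lastName;
        final Color favouriteColor;

        Person(String firstName, String lastName, Color favouriteColor) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.favouriteColor = favouriteColor;
        }
    }

    // Serializes the object with Java's default mechanism and returns the raw bytes.
    static byte[] javaSerialize(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        Person p = new Person("John", "Doe", Color.RED);
        byte[] bytes = javaSerialize(p);
        int payloadChars = p.firstName.length() + p.lastName.length();
        System.out.println("Characters of actual String data: " + payloadChars);
        System.out.println("Java serialized size in bytes:    " + bytes.length);
    }
}
```

The serialized array is much longer than the handful of characters held in the fields, because the stream also carries a header and the class descriptions.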

Why Avro generates fewer bytes

Java Serialization

The default serialization mechanism for an object writes the class of the object, the class signature, and the values of all non-transient and non-static fields. References to other objects (except in transient or static fields) cause those objects to be written also. Multiple references to a single object are encoded using a reference sharing mechanism so that graphs of objects can be restored to the same shape as when the original was written.
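The reference sharing described above can be observed with a small experiment. This is only a sketch: the `Address` and `Household` classes are made up for illustration. The same `Address` instance is referenced from two fields, and after a round trip the two fields of the copy still point to one shared object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SharedReferenceDemo {

    // Hypothetical classes used only for this illustration.
    static class Address implements Serializable {
        private static final long serialVersionUID = 1L;
        final String city;
        Address(String city) { this.city = city; }
    }

    static class Household implements Serializable {
        private static final long serialVersionUID = 1L;
        final Address first;
        final Address second;
        Household(Address first, Address second) {
            this.first = first;
            this.second = second;
        }
    }

    // Writes the object with default serialization and reads it back as a new graph.
    static Object roundTrip(Serializable obj) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Address shared = new Address("Pune");
        Household copy = (Household) roundTrip(new Household(shared, shared));
        // The second occurrence was written as a back-reference, so identity is preserved.
        System.out.println("Same instance after round trip: " + (copy.first == copy.second));
    }
}
```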

Apache Avro

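Avro writes only the field values of the object being serialized. The schema is kept as a JSON string and stored at most once alongside the data, not repeated for every object. There is no per-field overhead of writing the class of the object or the class signature as in Java. Also, the fields are serialized in the predetermined order given by the schema, so no field names need to be written.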
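To show why the output can be so small, here is a hand-rolled sketch of Avro-style binary encoding for a record with two strings and one enum. It does not use the Avro library. It only follows the rules from the Avro specification that a string is written as a zigzag varint length followed by UTF-8 bytes, and an enum as the zigzag varint of its position. The class, enum and method names are my own assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class AvroStyleEncodingSketch {

    // Hypothetical enum; only its position in the symbol list gets encoded.
    enum Color { RED, GREEN, BLUE }

    // Writes a long as a zigzag-encoded variable-length integer.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long zigzag = (value << 1) ^ (value >> 63);
        while ((zigzag & ~0x7FL) != 0) {
            out.write((int) ((zigzag & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            zigzag >>>= 7;
        }
        out.write((int) zigzag);
    }

    // Writes a string as its UTF-8 byte length followed by the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Encodes the three fields back to back, in a fixed order, with no names or type tags.
    static byte[] encode(String firstName, String lastName, Color color) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, firstName);
        writeString(out, lastName);
        writeLong(out, color.ordinal());
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encode("John", "Doe", Color.RED);
        System.out.println("Avro-style encoded size in bytes: " + bytes.length);
    }
}
```

For "John", "Doe" and the first enum symbol this gives 10 bytes: two length bytes, seven characters and one byte for the enum position. A reader can only decode them with the schema at hand, which is the trade-off for dropping the per-object metadata.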

You can find the full Java example on GitHub.

Avro can't handle circular references and throws java.lang.StackOverflowError, whereas Java's default serialization handles them. (example code for Avro and example code for Java serialization) Another observation is that Avro has no direct way of defining inheritance in the schema (classes), but Java's default serialization supports inheritance with its own constraints: a superclass must either implement the Serializable interface or, for the nearest non-serializable superclass, have an accessible no-arg constructor (otherwise deserialization throws java.io.InvalidClassException).
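The Java half of the circular reference claim is easy to check. Below is a minimal sketch with a hypothetical `Node` class (not the linked example code): two nodes point at each other, and the cycle survives a round trip through default serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class CircularReferenceDemo {

    // Hypothetical class whose instances can point back at each other.
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Node next;
        Node(String name) { this.name = name; }
    }

    // Serializes and deserializes a node together with everything reachable from it.
    static Node roundTrip(Node node) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(node);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Node) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a; // closes the cycle: a -> b -> a

        Node copy = roundTrip(a);
        // Following the cycle in the copy leads back to the copy itself.
        System.out.println("Cycle preserved: " + (copy.next.next == copy));
    }
}
```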

You can also view my other posts on Avro.
