Apache Avro - Serialization and Deserialization

This post is a continuation of my earlier posts, Apache Avro - Introduction and Apache Avro - Generating classes from Schema.

In this post, we will discuss reading (deserialization) and writing (serialization) of Avro-generated classes.

"Apache Avro™ is a data serialization system." We use DatumReader<T> and DatumWriter<T> for deserialization and serialization of data, respectively.
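Because both interfaces are generic, the write and read steps can be factored into two small helpers that accept any DatumWriter<T> or DatumReader<T> implementation. The sketch below assumes the Avro Java library (org.apache.avro) is on the classpath; the class name AvroBytes and its methods toBytes/fromBytes are hypothetical names chosen for illustration. The demo uses a plain string schema with the generic API so it runs without any generated classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

// Hypothetical helper class: serialize/deserialize any datum with Avro's binary encoding.
public class AvroBytes {

    // Writes one datum with the given DatumWriter and returns the raw bytes.
    public static <T> byte[] toBytes(T datum, DatumWriter<T> writer) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            writer.write(datum, encoder);
            encoder.flush(); // the binary encoder buffers, so flush before reading the stream
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads one datum back; passing null means no existing object is reused.
    public static <T> T fromBytes(byte[] bytes, DatumReader<T> reader) {
        try {
            Decoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
            return reader.read(null, decoder);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema stringSchema = Schema.create(Schema.Type.STRING);
        byte[] bytes = toBytes("Gaurav", new GenericDatumWriter<Object>(stringSchema));
        Object back = fromBytes(bytes, new GenericDatumReader<Object>(stringSchema));
        // A string is written as its length (zig-zag varint) followed by its UTF-8 bytes.
        System.out.println(bytes.length + " bytes, read back: " + back); // prints "7 bytes, read back: Gaurav"
    }
}
```

The same helpers should work with the SpecificDatumWriter/SpecificDatumReader pair used below, since they only depend on the DatumWriter<T> and DatumReader<T> interfaces.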

Apache Avro formats

Apache Avro supports two formats (encodings): JSON and binary.

Let's start with an example using the JSON format.

Employee employee = Employee.newBuilder().setFirstName("Gaurav").setLastName("Mazra").setSex(SEX.MALE).build();

DatumWriter<Employee> employeeWriter = new SpecificDatumWriter<>(Employee.class);
byte[] data;
try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
  Encoder jsonEncoder = EncoderFactory.get().jsonEncoder(Employee.getClassSchema(), baos);
  employeeWriter.write(employee, jsonEncoder);
  jsonEncoder.flush();
  data = baos.toByteArray();
}
  
// serialized data
System.out.println(new String(data, StandardCharsets.UTF_8));

DatumReader<Employee> employeeReader = new SpecificDatumReader<>(Employee.class);
Decoder decoder = DecoderFactory.get().jsonDecoder(Employee.getClassSchema(), new String(data, StandardCharsets.UTF_8));
employee = employeeReader.read(null, decoder);
//data after deserialization
System.out.println(employee);

Here's a line-by-line explanation :)

Line 1: We create an object of the Avro-generated Employee class.

Line 3: We create a SpecificDatumWriter<T>, which implements DatumWriter<T>. DatumWriter has other implementations as well, viz. GenericDatumWriter and ReflectDatumWriter.

Line 6: We create a JsonEncoder by passing the Schema and the OutputStream to which the serialized data should be written; in our case, that is an in-memory ByteArrayOutputStream.

Line 7: We call the #write method on the DatumWriter, passing the Employee object and the Encoder.

Line 8: We flush the JsonEncoder, which internally flushes the OutputStream passed to it. Without this step, the ByteArrayOutputStream may not yet contain all of the serialized data.

Line 15: We create a SpecificDatumReader<T>, which implements DatumReader<T>. DatumReader likewise has other implementations, viz. GenericDatumReader and ReflectDatumReader.

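Line 16: We create a JsonDecoder by passing the Schema and the input String to be deserialized.

Line 17: We call #read on the DatumReader with the Decoder. Passing null as the first argument means no existing Employee instance is reused, so a new one is created.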
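Lines 3 and 15 mention GenericDatumWriter and GenericDatumReader. As a sketch of the same JSON round trip without generated classes, the code below uses a hypothetical inline schema that only approximates the Employee schema from the earlier post (plain string fields plus a SEX enum); the real schema may differ, for example by using unions with null, which the JSON encoding represents differently. It assumes the Avro Java library is on the classpath, and the class name GenericJsonExample is made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

public class GenericJsonExample {

    // Hypothetical schema, loosely modelled on the Employee class used in this post.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Employee\",\"fields\":["
        + "{\"name\":\"firstName\",\"type\":\"string\"},"
        + "{\"name\":\"lastName\",\"type\":\"string\"},"
        + "{\"name\":\"sex\",\"type\":{\"type\":\"enum\",\"name\":\"SEX\",\"symbols\":[\"MALE\",\"FEMALE\"]}}]}");

    // Builds a GenericRecord instead of using a generated builder.
    static GenericRecord employee(String first, String last, String sex) {
        GenericRecord r = new GenericData.Record(SCHEMA);
        r.put("firstName", first);
        r.put("lastName", last);
        r.put("sex", new GenericData.EnumSymbol(SCHEMA.getField("sex").schema(), sex));
        return r;
    }

    // Same steps as the JSON example: DatumWriter + JsonEncoder + flush.
    static String toJson(GenericRecord record) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Encoder encoder = EncoderFactory.get().jsonEncoder(SCHEMA, out);
            new GenericDatumWriter<GenericRecord>(SCHEMA).write(record, encoder);
            encoder.flush();
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same steps as the JSON example: DatumReader + JsonDecoder, no object reuse.
    static GenericRecord fromJson(String json) {
        try {
            Decoder decoder = DecoderFactory.get().jsonDecoder(SCHEMA, json);
            return new GenericDatumReader<GenericRecord>(SCHEMA).read(null, decoder);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String json = toJson(employee("Gaurav", "Mazra", "MALE"));
        System.out.println(json);
        GenericRecord back = fromJson(json);
        // Strings come back as org.apache.avro.util.Utf8, so compare them via toString().
        System.out.println(back.get("firstName") + " " + back.get("sex"));
    }
}
```

The trade-off is the usual one: the generic API needs no code generation but gives up compile-time field names and types, which the generated Employee class provides.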

Now let's look at serialization and deserialization with the binary format.

Employee employee = Employee.newBuilder().setFirstName("Gaurav").setLastName("Mazra").setSex(SEX.MALE).build();

DatumWriter<Employee> employeeWriter = new SpecificDatumWriter<>(Employee.class);
byte[] data;
try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
  Encoder binaryEncoder = EncoderFactory.get().binaryEncoder(baos, null);
  employeeWriter.write(employee, binaryEncoder);
  binaryEncoder.flush();
  data = baos.toByteArray();
}
  
// serialized data
System.out.println(Arrays.toString(data)); // printing the array itself would only show its reference
  
DatumReader<Employee> employeeReader = new SpecificDatumReader<>(Employee.class);
Decoder binaryDecoder = DecoderFactory.get().binaryDecoder(data, null);
employee = employeeReader.read(null, binaryDecoder);
//data after deserialization
System.out.println(employee);

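Apart from how the raw bytes are printed (Line 13), the example is the same as the JSON one except for Line 6 and Line 16, where we create a BinaryEncoder and a BinaryDecoder. The null passed as the second argument on those lines means there is no existing encoder or decoder to reuse, and because BinaryEncoder buffers its output, the flush on Line 8 is needed here too.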
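Unlike the JSON output, the binary output contains no field names. Per the Avro specification, a record is encoded as its fields concatenated in schema order, a string as a zig-zag varint length followed by its UTF-8 bytes, and an enum as the zig-zag varint of its symbol's index. The stdlib-only sketch below reproduces that layout by hand for a hypothetical Employee schema (firstName: string, lastName: string, sex: enum SEX with MALE at index 0, in that order, no unions). The actual generated class may use a different schema, so its bytes could differ; the class name EmployeeBytesSketch is made up, and this is an illustration, not Avro's own encoder.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hand-rolled illustration of Avro's binary layout; not a replacement for BinaryEncoder.
public class EmployeeBytesSketch {

    // Avro ints/longs: zig-zag encode, then write 7 bits per byte, low bits first.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63); // zig-zag: 0->0, -1->1, 1->2, -2->3, ...
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // set the continuation bit
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro strings: byte length as a long, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Record = fields in schema order with no names; enum = symbol index as an int.
    static byte[] encodeEmployee(String firstName, String lastName, int sexIndex) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, firstName);
        writeString(out, lastName);
        writeLong(out, sexIndex);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeEmployee("Gaurav", "Mazra", 0); // MALE assumed at index 0
        // prints "14 bytes: [12, 71, 97, 117, 114, 97, 118, 10, 77, 97, 122, 114, 97, 0]"
        System.out.println(bytes.length + " bytes: " + Arrays.toString(bytes));
    }
}
```

This is also why the reader must be given the schema: without it, nothing in those 14 bytes says where one field ends or what it is called.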

This is how we can serialize and deserialize data with Apache Avro. I hope you found this article informative and useful. You can find the full example on GitHub.
