Posts

Showing posts with the label avro data serialization

Apache Avro - Serialization and Deserialization

This post continues my earlier posts, Apache Avro - Introduction and Apache Avro - Generating classes from Schema. In this post, we will discuss reading (deserialization) and writing (serialization) of Avro-generated classes.

"Apache Avro™ is a data serialization system."

We use DatumReader<T> and DatumWriter<T> for deserialization and serialization of data, respectively.

Apache Avro formats

Apache Avro supports two formats, JSON and Binary. Let's move to an example using the JSON format.

    Employee employee = Employee.newBuilder()
            .setFirstName("Gaurav")
            .setLastName("Mazra")
            .setSex(SEX.MALE)
            .build();

    DatumWriter<Employee> employeeWriter = new SpecificDatumWriter<>(Employee.class);
    byte[] data;
    try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
        Encoder jsonEncoder = EncoderFactory.get().jsonEncoder(Employee.getClassSchema(), baos
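As a rough, self-contained sketch of the JSON round trip described above: the generated Employee class and its schema are not shown in this excerpt, so the example below uses Avro's generic API (GenericRecord, GenericDatumWriter, GenericDatumReader) with a hypothetical schema whose field names are only inferred from the builder calls. The class name EmployeeJsonRoundTrip and the namespace example.avro are assumptions for illustration, and the code needs the Apache Avro library on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

public class EmployeeJsonRoundTrip {

    // Hypothetical schema: the post's real Employee schema is not shown here.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Employee\",\"namespace\":\"example.avro\","
      + "\"fields\":["
      + "{\"name\":\"firstName\",\"type\":\"string\"},"
      + "{\"name\":\"lastName\",\"type\":\"string\"},"
      + "{\"name\":\"sex\",\"type\":{\"type\":\"enum\",\"name\":\"SEX\","
      + "\"symbols\":[\"MALE\",\"FEMALE\"]}}"
      + "]}");

    // Builds a record with the same values as the post's builder example.
    static GenericRecord sampleEmployee() {
        GenericRecord employee = new GenericData.Record(SCHEMA);
        employee.put("firstName", "Gaurav");
        employee.put("lastName", "Mazra");
        employee.put("sex",
            new GenericData.EnumSymbol(SCHEMA.getField("sex").schema(), "MALE"));
        return employee;
    }

    // Serializes a record to Avro's JSON encoding.
    static byte[] toJson(GenericRecord record) {
        DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(SCHEMA);
        try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
            Encoder encoder = EncoderFactory.get().jsonEncoder(SCHEMA, baos);
            writer.write(record, encoder);
            encoder.flush(); // the encoder buffers, so flush before reading the bytes
            return baos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserializes Avro JSON bytes back into a record using the same schema.
    static GenericRecord fromJson(byte[] json) {
        DatumReader<GenericRecord> reader = new GenericDatumReader<>(SCHEMA);
        try {
            Decoder decoder =
                DecoderFactory.get().jsonDecoder(SCHEMA, new ByteArrayInputStream(json));
            return reader.read(null, decoder);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = toJson(sampleEmployee());
        System.out.println(new String(data, StandardCharsets.UTF_8));
        System.out.println(fromJson(data));
    }
}
```

With generated classes, the same flow would presumably use SpecificDatumWriter and SpecificDatumReader in place of the generic variants, as the snippet in the post starts to do. Note that string fields read back through the generic API come out as Avro Utf8 objects, so call toString() before comparing them with Java strings.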

Apache Avro - Introduction

In this post, we will discuss the following items:

- What is Apache Avro?
- What is an Avro schema and how to define it?
- Serialization in Apache Avro.

What is Apache Avro?

"Apache Avro is data serialization library"

That's it, huh. This is what you will see when you open their official page. Apache Avro is:

- A schema-based data serialization library.
- An RPC framework (support).
- Rich data structures (primitive types include null, string, number, boolean; complex types include Record, Array, Map, etc.).
- A compact, fast, binary data format.

What is an Avro schema and how to define it?

Apache Avro's serialization concept is based on the schema. When you write data, the schema is written along with it. When you read data, the schema will always be present. The schema together with the data makes it fully self-describing.

A schema is the representation of an Avro datum (record). It is of two types: primitive and complex.

Primitive types

These are the basic types supported by Avro. It includes null, int, long, byte
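To make the schema idea a little more concrete, here is a minimal sketch that defines a record schema as JSON and parses it with Avro's Schema.Parser. The record name, namespace, and fields (firstName, age, skills) are hypothetical and chosen only to show two primitive types next to one complex type. The class name EmployeeSchemaSketch is likewise an assumption, and the Apache Avro library must be on the classpath.

```java
import org.apache.avro.Schema;

public class EmployeeSchemaSketch {

    // Hypothetical schema: a record (complex) holding two primitive
    // fields and one array field (complex).
    static final String EMPLOYEE_SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"Employee\",\"namespace\":\"example.avro\","
      + "\"fields\":["
      + "{\"name\":\"firstName\",\"type\":\"string\"},"
      + "{\"name\":\"age\",\"type\":\"int\"},"
      + "{\"name\":\"skills\",\"type\":{\"type\":\"array\",\"items\":\"string\"}}"
      + "]}";

    // Parses the JSON definition into an Avro Schema object.
    static Schema parse() {
        return new Schema.Parser().parse(EMPLOYEE_SCHEMA_JSON);
    }

    public static void main(String[] args) {
        Schema schema = parse();
        System.out.println("Record: " + schema.getFullName());
        // Print each field with the Avro type of its schema.
        for (Schema.Field field : schema.getFields()) {
            System.out.println(field.name() + " : " + field.schema().getType());
        }
    }
}
```

In practice the JSON definition usually lives in its own .avsc file rather than in a string constant; it is inlined here only to keep the sketch self-contained.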