
Posts

Showing posts with the label Apache AVRO

Apache Avro - RPC framework

This post continues my previous posts on Apache Avro - Introduction, Apache Avro - Generating classes from Schema and Apache Avro - Serialization. In this post, we will share insights on using Apache Avro as an RPC framework. To use Apache Avro as an RPC framework, we first need to define a protocol. Before going deeper into this topic, let's discuss what a protocol is.
Avro protocols describe RPC interfaces. They are defined in JSON, similar to a Schema. A protocol has the following attributes:
- protocol: a string, defining the name of the protocol.
- namespace: an optional string that qualifies the name.
- types: an optional list of definitions of named types (records, enums, fixed types and errors).
- messages: an optional JSON object whose keys are the method names of the protocol and whose values are objects whose attributes are described below. No two messages may have the same name.
Further, a message has the following attributes:
- request: a list of named, typed parameter schemas.
- respon

Comparing Java Default Serialization with Apache Avro serialization

I compared Java default serialization with Apache Avro serialization of the same data, and the results were astonishing. You can read my older posts on the Java serialization process and Apache Avro Serialization. Apache Avro needed 15-20 times fewer bytes to store the serialized data. I created a class with three fields (two Strings and one enum) and serialized an instance with both Avro and Java. Avro produced 14 bytes, whereas Java produced 231 bytes (the length of the resulting byte[]).
Reason for generating fewer bytes with Avro
Java Serialization
The default serialization mechanism for an object writes the class of the object, the class signature, and the values of all non-transient and non-static fields. References to other objects (except in transient or static fields) cause those objects to be written also. Multiple references to a single object are encoded using a reference sharing mechanism so that graphs of objects can be restored to the same shape as when the original was written.
Apac
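The size gap described in this excerpt can be reproduced roughly with the JDK alone. The sketch below is not the code from the original post and does not use the Avro library: the class `SerializationSizeSketch`, its nested `Employee` and `Sex` types, and the sample values are hypothetical stand-ins, and `avroStyleEncode` is a hand-written encoder that follows the Avro binary encoding rules for a record of two strings and an enum (zigzag varint length plus UTF-8 bytes for each string, zigzag varint symbol index for the enum).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class SerializationSizeSketch {

    enum Sex { MALE, FEMALE }

    // Hypothetical stand-in for the class in the post: two Strings and one enum.
    static class Employee implements Serializable {
        private static final long serialVersionUID = 1L;
        final String firstName;
        final String lastName;
        final Sex sex;

        Employee(String firstName, String lastName, Sex sex) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.sex = sex;
        }
    }

    // Java default serialization: writes class metadata as well as field values.
    static byte[] javaSerialize(Employee e) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(e);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return baos.toByteArray();
    }

    // Avro-style int/long: zigzag encode, then emit 7 bits per byte (variable length).
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro-style string: byte length as a long, followed by the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Avro-style record: field values in schema order, with no names or type tags.
    static byte[] avroStyleEncode(Employee e) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, e.firstName);
        writeString(out, e.lastName);
        writeLong(out, e.sex.ordinal()); // enum is written as its symbol index
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Employee e = new Employee("Gaurav", "Mazra", Sex.MALE);
        System.out.println("Java default serialization: " + javaSerialize(e).length + " bytes");
        System.out.println("Avro-style binary encoding: " + avroStyleEncode(e).length + " bytes");
    }
}
```

With these sample values the hand-rolled encoding comes to 14 bytes (1 + 6 for the first name, 1 + 5 for the last name, 1 for the enum index), which matches the Avro figure quoted above. The Java figure depends on the class and package names written into the stream, so it will not necessarily be 231 for this stand-in class.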

Apache Avro - Serialization and Deserialization

This post continues my earlier posts on Apache Avro - Introduction and Apache Avro - Generating classes from Schema. In this post, we will discuss reading (deserialization) and writing (serialization) of Avro generated classes.
"Apache Avro™ is a data serialization system."
We use DatumReader<T> and DatumWriter<T> for deserialization and serialization of data, respectively.
Apache Avro formats
Apache Avro supports two formats, JSON and Binary. Let's move to an example using the JSON format.
Employee employee = Employee.newBuilder().setFirstName("Gaurav").setLastName("Mazra").setSex(SEX.MALE).build();
DatumWriter<Employee> employeeWriter = new SpecificDatumWriter<>(Employee.class);
byte[] data;
try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
    Encoder jsonEncoder = EncoderFactory.get().jsonEncoder(Employee.getClassSchema(), baos

Apache Avro - Generating classes from Schema

This post continues my previous post on Apache Avro - Introduction. In this post, we will discuss generating classes from a Schema.
How to create Apache Avro schema?
There are two ways to generate Avro classes from a Schema:
- Programmatically generating schema
- Using maven Avro plugin
Consider we have the following schema in "src/main/avro":
{
  "type": "record",
  "name": "Employee",
  "namespace": "com.gauravbytes.avro",
  "doc": "Schema to hold employee object",
  "fields": [
    { "name": "firstName", "type": "string" },
    { "name": "lastName", "type": "string" },
    { "name": "sex", "type": { "name": "SEX", "type": "enum", "symbols"

Apache Avro - Introduction

In this post, we will discuss the following items:
- What is Apache Avro?
- What is Avro schema and how to define it?
- Serialization in Apache Avro.
What is Apache Avro?
"Apache Avro is data serialization library"
That's it, huh. This is what you will see when you open their official page. Apache Avro is:
- Schema based data serialization library.
- RPC framework (support).
- Rich data structures (Primitive includes null, string, number, boolean and Complex includes Record, Array, Map etc.).
- A compact, fast and binary data format.
What is Avro schema and how to define it?
The Apache Avro serialization concept is based on Schema. When you write data, the schema is written along with it. When you read data, the schema will always be present. The schema along with the data makes it fully self-describing. A Schema is the representation of an Avro datum (Record). It is of two types: Primitive and Complex.
Primitive types
These are the basic types supported by Avro. They include null, int, long, byte