
This post is in continuation of my earlier posts on Apache Avro - Introduction and Apache Avro - Generating classes from Schema.

In this post, we will discuss reading (deserialization) and writing (serialization) of Avro-generated classes.

"Apache Avro™ is a data serialization system." We use DatumReader<T> and DatumWriter<T> for de-serialization and serialization of data, respectively.

Apache Avro formats

Apache Avro supports two formats, JSON and Binary.

Let's move to an example using JSON format.

Employee employee = Employee.newBuilder().setFirstName("Gaurav").setLastName("Mazra").setSex(SEX.MALE).build();

DatumWriter<Employee> employeeWriter = new SpecificDatumWriter<>(Employee.class);
byte[] data;
try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
  Encoder jsonEncoder = EncoderFactory.get().jsonEncoder(Employee.getClassSchema(), baos);
  employeeWriter.write(employee, jsonEncoder);
  jsonEncoder.flush();
  data = baos.toByteArray();
}
  
// serialized data
System.out.println(new String(data));
  
DatumReader<Employee> employeeReader = new SpecificDatumReader<>(Employee.class);
Decoder decoder = DecoderFactory.get().jsonDecoder(Employee.getClassSchema(), new String(data));
employee = employeeReader.read(null, decoder);
//data after deserialization
System.out.println(employee);

Explanation on the way :)

First, we create an object of the AVRO-generated class Employee using its builder.

Next, we create a SpecificDatumWriter<T>, which implements DatumWriter<T>. Other implementations of DatumWriter also exist, namely GenericDatumWriter and ReflectDatumWriter.

We then create a JsonEncoder by passing the schema and the OutputStream that should receive the serialized data; in our case, an in-memory ByteArrayOutputStream.

We call the write method on the DatumWriter with the object and the Encoder.

We flush the JsonEncoder. Internally, this also flushes the OutputStream passed to the JsonEncoder.

For deserialization, we create a SpecificDatumReader<T>, which implements DatumReader<T>. Other implementations of DatumReader also exist, namely GenericDatumReader and ReflectDatumReader.

Finally, we create a JsonDecoder, passing the schema and the input String to be deserialized.
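Since GenericDatumWriter and GenericDatumReader are mentioned above, here is a minimal sketch of the same JSON round trip using the generic API instead of the generated class. It reuses the schema exposed by the generated class via Employee.getClassSchema(); a schema parsed directly from the .avsc file would work the same way.

// Sketch only: JSON round trip with GenericRecord (no generated class required)
Schema schema = Employee.getClassSchema();

GenericRecord record = new GenericData.Record(schema);
record.put("firstName", "Gaurav");
record.put("lastName", "Mazra");
record.put("sex", new GenericData.EnumSymbol(schema.getField("sex").schema(), "MALE"));

DatumWriter<GenericRecord> genericWriter = new GenericDatumWriter<>(schema);
byte[] genericData;
try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
  Encoder jsonEncoder = EncoderFactory.get().jsonEncoder(schema, baos);
  genericWriter.write(record, jsonEncoder);
  jsonEncoder.flush();
  genericData = baos.toByteArray();
}

DatumReader<GenericRecord> genericReader = new GenericDatumReader<>(schema);
Decoder genericDecoder = DecoderFactory.get().jsonDecoder(schema, new String(genericData));
GenericRecord result = genericReader.read(null, genericDecoder);
System.out.println(result);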

Let's move to a serialization and deserialization example with the binary format.

Employee employee = Employee.newBuilder().setFirstName("Gaurav").setLastName("Mazra").setSex(SEX.MALE).build();

DatumWriter<Employee> employeeWriter = new SpecificDatumWriter<>(Employee.class);
byte[] data;
try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
  Encoder binaryEncoder = EncoderFactory.get().binaryEncoder(baos, null);
  employeeWriter.write(employee, binaryEncoder);
  binaryEncoder.flush();
  data = baos.toByteArray();
}
  
// serialized data (binary, shown as byte values)
System.out.println(java.util.Arrays.toString(data));
  
DatumReader<Employee> employeeReader = new SpecificDatumReader<>(Employee.class);
Decoder binaryDecoder = DecoderFactory.get().binaryDecoder(data, null);
employee = employeeReader.read(null, binaryDecoder);
//data after deserialization
System.out.println(employee);

The example is the same as the JSON one, except that we create a BinaryEncoder and a BinaryDecoder instead of the JSON encoder and decoder.

This is how we can serialize and deserialize data with Apache Avro. I hope you found this article informative and useful. You can find the full example on GitHub.

This post is in continuation of my previous post on Apache Avro - Introduction. In this post, we will discuss generating classes from a schema.

How to generate Apache Avro classes from a schema?

There are two ways to generate Avro classes from a schema:

  • Programmatically generating the classes
  • Using the Avro Maven plugin

Consider that we have the following schema in "src/main/avro":

{
  "type" : "record",
  "name" : "Employee",
  "namespace" : "com.gauravbytes.avro",
  "doc" : "Schema to hold employee object",
  "fields" : [{
    "name" : "firstName",
    "type" : "string"
  },
  {
    "name" : "lastName",
    "type" : "string"
  }, 
  {
    "name" : "sex", 
    "type" : {
      "name" : "SEX",
      "type" : "enum",
      "symbols" : ["MALE", "FEMALE"]
    }
  }]
}

Programmatically generating classes

Classes can be generated from a schema using SpecificCompiler.

import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.compiler.specific.SpecificCompiler;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PragmaticSchemaGeneration {
  private static final Logger LOGGER = LoggerFactory.getLogger(PragmaticSchemaGeneration.class);

  public static void main(String[] args) {
    try {
      // parse the schema file and compile it to Java sources under src/main/java
      SpecificCompiler compiler = new SpecificCompiler(new Schema.Parser().parse(new File("src/main/avro/employee.avsc")));
      compiler.compileToDestination(new File("src/main/avro"), new File("src/main/java"));
    } catch (IOException e) {
      LOGGER.error("Exception occurred parsing schema: ", e);
    }
  }
}

In the main method, we create an object of SpecificCompiler. It has two constructors: one takes a Protocol as an argument and the other takes a Schema.
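For reference, here is a minimal sketch of the Protocol-based constructor. The protocol file name employee.avpr is an assumption for illustration; the Schema-based variant is what the example above uses.

// Sketch only: compiling classes from a Protocol (.avpr) instead of a Schema
// (the file path below is a hypothetical example)
Protocol protocol = Protocol.parse(new File("src/main/avro/employee.avpr"));
SpecificCompiler protocolCompiler = new SpecificCompiler(protocol);
protocolCompiler.compileToDestination(new File("src/main/avro"), new File("src/main/java"));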

Using the Avro Maven plugin to generate classes

There is a Maven plugin that can generate the classes for you. You need to add the following configuration to your pom.xml.

<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>${avro.version}</version>
  <executions>
    <execution>
      <id>schemas</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
        <goal>protocol</goal>
        <goal>idl-protocol</goal>
      </goals>
      <configuration>
        <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
        <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
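The plugin runs in the generate-sources phase, so the classes are produced as part of a normal build. Once generated (by either approach), they can be used like any other Java class, for example via the generated builder shown in the serialization post:

// Using the generated Employee class and its builder
Employee employee = Employee.newBuilder()
    .setFirstName("Gaurav")
    .setLastName("Mazra")
    .setSex(SEX.MALE)
    .build();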

This is how we can generate classes from an Avro schema. I hope you find this post informative and helpful. You can find the full project on GitHub.