Java 8 introduced default and static methods in interfaces. These features allow us to add new functionality to interfaces without breaking the existing contract for implementing classes.

How do we define default and static methods?

A default method has the default keyword and a static method has the static keyword in its method signature.

public interface InterfaceA {
  double someMethodA();

  default double someDefaultMethodB() {
    // some default implementation
    return 0;
  }

  static void someStaticMethodC() {
    // helper method implementation
  }
}

A few important points about default methods (illustrated in the sketch after this list):

  • You can inherit the default method.
  • You can redeclare the default method, essentially making it abstract.
  • You can redefine the default method (equivalent to overriding).
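
A minimal sketch of the three options, assuming the InterfaceA defined above:

// Option 1: inherit the default method as-is.
class InheritsDefault implements InterfaceA {
  public double someMethodA() { return 0; }
}

// Option 2: redeclare the default method, making it abstract again.
interface RedeclaresDefault extends InterfaceA {
  double someDefaultMethodB();
}

// Option 3: redefine (override) the default method.
class OverridesDefault implements InterfaceA {
  public double someMethodA() { return 0; }

  @Override
  public double someDefaultMethodB() { return 42; }
}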

Why do we need default and static methods?

Consider an existing Expression interface with existing implementations like ConstantExpression, BinaryExpression, DivisionExpression and so on. Now, you want to add new functionality that returns the signum of the evaluated result. This can be done with default and static methods without breaking any existing functionality, as follows.

public interface Expression {
  double evaluate();

  default double signum() {
    return signum(evaluate());
  }

  static double signum(double value) {
    return Math.signum(value);
  }
}
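
For illustration, a hypothetical ConstantExpression (an assumption, not the original implementation) could then use the new methods without any change of its own:

// Only evaluate() needs to be provided; signum() is inherited as a default method.
class ConstantExpression implements Expression {
  private final double value;

  ConstantExpression(double value) {
    this.value = value;
  }

  @Override
  public double evaluate() {
    return value;
  }
}

// usage
Expression expression = new ConstantExpression(-7.5);
double viaDefault = expression.signum();    // -1.0, via the default method
double viaStatic = Expression.signum(3.2);  //  1.0, via the static method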

You can find the full code on Github.

Default methods and multiple inheritance ambiguity problem

Java supports multiple inheritance of interfaces. Consider you have two interfaces, InterfaceA and InterfaceB, with the same default method, and your class implements both interfaces.

interface InterfaceA {
  void performA();
  default void doSomeWork() {
  
  }
}

interface InterfaceB {
  void performB();

  default void doSomeWork() {
 
  }
}

class ConcreteC implements InterfaceA, InterfaceB {

}

The above code will fail to compile with an error like: class ConcreteC inherits unrelated defaults for doSomeWork() from types InterfaceA and InterfaceB.

To overcome this problem, you need to override the default method.

class ConcreteC implements InterfaceA, InterfaceB {
  @Override
  public void doSomeWork() {

  }
}

If you don't want to provide your own implementation of the overridden default method but want to reuse one of the inherited ones, that is also possible with the following syntax.

class ConcreteC implements InterfaceA, InterfaceB {
  @Override
  public void doSomeWork() {
    InterfaceB.super.doSomeWork();
  }
}

I hope you find this post informative and useful. Comments are welcome!

This post is in continuation of my previous posts on Apache Avro - Introduction, Apache Avro - Generating classes from Schema and Apache Avro - Serialization.

In this post, we will share insights on using Apache Avro as an RPC framework.

To use Apache Avro as an RPC framework, we first need to define a protocol. Before going into the depth of this topic, let's discuss what a protocol is.

Avro protocols describe RPC interfaces. They are defined as JSON, similar to schemas.

A protocol has the following attributes:

  • protocol: a string, defining the name of the protocol.
  • namespace: an optional string that qualifies the name.
  • types: an optional list of definitions of named types (like record, enum, fixed and errors).
  • messages: an optional JSON object whose keys are the method names of the protocol and whose values are objects whose attributes are described below. No two messages may have the same name.

Further, a message has the following attributes:

  • request: a list of named, typed parameter schemas.
  • response: a response schema.
  • errors: an optional union of declared error schemas.

Let's define a simple protocol to exchange email message between client and server.

{
  "namespace": "com.gauravbytes.avro",
  "protocol": "EmailSender",
   "types": [{
     "name": "EmailMessage", "type": "record",
     "fields": [{
       "name": "to",
       "type": "string"
     },
     {
       "name": "from",
       "type": "string"
     },
     {
       "name": "body",
       "type": "string"
     }]
   }],
   "messages": {
     "send": {
       "request": [{"name": "email", "type": "EmailMessage"}],
       "response": "string"
     }
   }
}

Here, the protocol defines an interface EmailSender which takes an EmailMessage as request and returns a string response.

We have created a mock implementation of EmailSender:

public class EmailSenderImpl implements EmailSender {
  @Override
  public CharSequence send(EmailMessage email) throws AvroRemoteException {
    return email.toString();
  }
}

Now, we create a server; Apache Avro uses Netty for this purpose.

Server server = new NettyServer(new SpecificResponder(EmailSender.class, new EmailSenderImpl()),
    new InetSocketAddress(65333));

Now, we create a client which sends a request to the server.

NettyTransceiver client = new NettyTransceiver(new InetSocketAddress(65333));
// client code - attach to the server and send a message
EmailSender proxy = SpecificRequestor.getClient(EmailSender.class, client);
logger.info("Client built, got proxy");

// fill in the Message record and send it
EmailMessage message = new EmailMessage();
message.setTo(new Utf8(args[0]));
message.setFrom(new Utf8(args[1]));
message.setBody(new Utf8(args[2]));
logger.info("Calling proxy.send with message: {} ", message.toString());
logger.info("Result: {}", proxy.send(message));
// cleanup
client.close();

This is how we can use Apache Avro as an RPC framework. I hope you found this article useful. You can download the full example code from Github.

In this post, we will cover the following topics.

  • What are Lambda expressions?
  • Syntax for Lambda expression.
  • How to define no parameter Lambda expression?
  • How to define single/multi-parameter Lambda expressions?
  • How to return value from Lambda expression?
  • Accessing local variables in Lambda expression.
  • Target typing in Lambda expression.

What are Lambda expressions?

Lambda expressions are the first step of Java towards functional programming. Lambda expressions enable us to treat functionality as a method argument and to express instances of single-method classes more compactly.

Syntax for Lambda expression

A lambda has three parts:

  • a comma-separated list of formal parameters enclosed in parentheses.
  • the arrow token ->.
  • and the body, which may be an expression or a block (and may or may not return a value).

(param) -> { System.out.println(param); }
Lambda expressions can only be used where the target type is a functional interface.
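
For example, Runnable is a functional interface, so a lambda can be used wherever a Runnable is expected:

// The lambda is matched against Runnable's single abstract method run().
Runnable task = () -> System.out.println("Running from a lambda");
new Thread(task).start();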

How to define no parameter Lambda expression?

If the lambda expression is matched against a method with no parameters, it can be written as:

() -> System.out.println("No parameter expression");

How to define single/multi-parameter Lambda expressions?

If the lambda expression is matched against a method which takes one or more parameters, it can be written as:

(param) -> System.out.println("Single param expression: " + param);

(paramX, paramY) -> System.out.println("Two param expression: " + paramX + ", " + paramY);

You can also define the type of parameter in Lambda expression.

(Employee e) -> System.out.println(e);

How to return value from Lambda expression?

You can return a value from a lambda just like a method does.

(param) -> {
  // perform some steps
  return "some value";
};

If the lambda performs a single step and returns a value, you can write it as:

// either, with an explicit return inside a block body
(int a, int b) -> { return Integer.compare(a, b); };

// or simply, the lambda automatically returns the value of the expression
(int a, int b) -> Integer.compare(a, b);

Accessing local variables in Lambda expression

Lambdas can access final or effectively final local variables of the method in which they are defined. They can also access the instance variables of the enclosing class.
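
A small sketch of this rule, using java.util.function.Predicate (the names below are illustrative only):

int threshold = 10; // effectively final: assigned once, never modified
Predicate<Integer> aboveThreshold = n -> n > threshold; // the lambda captures 'threshold'
// threshold = 20; // uncommenting this reassignment would turn the capture above into a compile error
System.out.println(aboveThreshold.test(15)); // true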

Target typing in Lambda expression

You might have seen in earlier code snippets that we have omitted the type of the parameters, the return value and the type of the lambda itself. The Java compiler determines the target type from the context in which the lambda is defined.

Compiler checks three things:

  • Is the target type a functional interface?
  • Do the parameter list and parameter types match the single abstract method?
  • Does the return type match the single abstract method's return type?

Now, let's jump to an example to verify it.

@FunctionalInterface
interface InterfaceA {
  void doWork();
}

@FunctionalInterface
interface InterfaceB<T> {
  T doWork();
}

class LambdaTypeCheck {
  
  public static void main (String[] args) {
    LambdaTypeCheck typeCheck = new LambdaTypeCheck();
    typeCheck.invoke(() -> "I am done with you");
  }
  
  public <T> T invoke (InterfaceB<T> task) {
    return task.doWork();
  }

  public void invoke (InterfaceA task) {
    task.doWork();
  }
}

When you call typeCheck.invoke(() -> "I am done with you"), then invoke(InterfaceB<T> task) is called, because the lambda returns a value, which matches InterfaceB<T>.

Java 8 reincarnated SAM (single abstract method) interfaces and termed them functional interfaces. Functional interfaces have a single abstract method and are eligible to be represented with a lambda expression. The @FunctionalInterface annotation was introduced in Java 8 to mark an interface as functional. It ensures at compile time that the interface has only a single abstract method; otherwise a compilation error is raised.

Let's define a functional interface.

@FunctionalInterface
public interface Spec<T> {
  boolean isSatisfiedBy(T t);
}

Functional interfaces can have default and static methods and still remain functional interfaces.

@FunctionalInterface
public interface Spec<T> {
  boolean isSatisfiedBy(T t);
 
  default Spec<T> not() {
    return (t) -> !isSatisfiedBy(t);
  }
 
  default Spec<T> and(Spec<T> other) {
    return (t) -> isSatisfiedBy(t) && other.isSatisfiedBy(t);
  }
 
  default Spec<T> or(Spec<T> other) {
    return (t) -> isSatisfiedBy(t) || other.isSatisfiedBy(t);
  }
}
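
For illustration, a quick hypothetical usage of these default methods for chaining:

// Specs on plain integers, combined via the default methods of Spec.
Spec<Integer> positive = n -> n > 0;
Spec<Integer> even = n -> n % 2 == 0;

Spec<Integer> positiveAndEven = positive.and(even);
Spec<Integer> notPositive = positive.not();

System.out.println(positiveAndEven.isSatisfiedBy(4));  // true
System.out.println(notPositive.isSatisfiedBy(-3));     // true
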
If an interface declares an abstract method overriding one of the public methods of java.lang.Object, that also does not count toward the interface's abstract method count since any implementation of the interface will have an implementation from java.lang.Object or elsewhere.
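
java.util.Comparator is a well-known example: it declares equals(Object) yet remains a functional interface. A minimal sketch of the same idea:

@FunctionalInterface
interface Checker<T> {
  boolean check(T t); // the single abstract method

  // Declares a public method of java.lang.Object; it does NOT count
  // toward the abstract method count, so the interface stays functional.
  boolean equals(Object obj);
}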

I did a comparison of Java default serialization and Apache Avro serialization of data, and the results were quite astonishing.

You can read my older posts for Java serialization process and Apache Avro Serialization.

Apache Avro consumed 15-20 times less memory to store the serialized data. I created a class with three fields (two String and one enum) and serialized it with Avro and Java.

The memory used by Avro was 14 bytes, whereas Java used 231 bytes (the length of the resulting byte[]).

Why Avro generates fewer bytes

Java Serialization

The default serialization mechanism for an object writes the class of the object, the class signature, and the values of all non-transient and non-static fields. References to other objects (except in transient or static fields) cause those objects to be written also. Multiple references to a single object are encoded using a reference sharing mechanism so that graphs of objects can be restored to the same shape as when the original was written.
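
As a rough sketch, the Java-side measurement could be reproduced like this (the class and field names below are assumptions, not the original code):

import java.io.*;

public class JavaSerializationSize {
  enum Sex { MALE, FEMALE }

  static class Employee implements Serializable {
    private static final long serialVersionUID = 1L;
    String firstName;
    String lastName;
    Sex sex;

    Employee(String firstName, String lastName, Sex sex) {
      this.firstName = firstName;
      this.lastName = lastName;
      this.sex = sex;
    }
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
      oos.writeObject(new Employee("Gaurav", "Mazra", Sex.MALE));
    }
    // the length of this byte[] is what the comparison above refers to
    System.out.println("Serialized size: " + baos.toByteArray().length + " bytes");
  }
}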

Apache Avro

Avro writes only the schema as a String and the data of the class being serialized. There is no overhead of writing the class of the object and the class signature for each field, as in Java. Also, the fields are serialized in a pre-determined order.

You can find the full Java example on github.

Avro can't handle circular references and throws java.lang.StackOverflowError, whereas Java's default serialization can handle them (example code for Avro and example code for Java serialization). Another observation is that Avro has no direct way of defining inheritance in the schema (classes), whereas Java's default serialization supports inheritance with its own constraints: the super class either needs to implement the Serializable interface or have a default no-args constructor accessible up the hierarchy (otherwise java.io.NotSerializableException is thrown).

You can also view my other posts on Avro.

This post is in continuation of my earlier posts on Apache Avro - Introduction and Apache Avro - Generating classes from Schema.

In this post, we will discuss reading (deserialization) and writing (serialization) of Avro generated classes.

"Apache Avro™ is a data serialization system." We use DatumReader<T> and DatumWriter<T> for de-serialization and serialization of data, respectively.

Apache Avro formats

Apache Avro supports two formats, JSON and Binary.

Let's move to an example using JSON format.

Employee employee = Employee.newBuilder().setFirstName("Gaurav").setLastName("Mazra").setSex(SEX.MALE).build();

DatumWriter<Employee> employeeWriter = new SpecificDatumWriter<>(Employee.class);
byte[] data;
try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
  Encoder jsonEncoder = EncoderFactory.get().jsonEncoder(Employee.getClassSchema(), baos);
  employeeWriter.write(employee, jsonEncoder);
  jsonEncoder.flush();
  data = baos.toByteArray();
}
  
// serialized data
System.out.println(new String(data));
  
DatumReader<Employee> employeeReader = new SpecificDatumReader<>(Employee.class);
Decoder decoder = DecoderFactory.get().jsonDecoder(Employee.getClassSchema(), new String(data));
employee = employeeReader.read(null, decoder);
//data after deserialization
System.out.println(employee);

Explanation on the way :)

Line 1: We create an object of the Avro-generated class Employee.

Line 3: We create an object of SpecificDatumWriter<T>, which implements DatumWriter<T>. There also exist other implementations of DatumWriter, viz. GenericDatumWriter and ReflectDatumWriter.

Line 6: We create a JsonEncoder by passing the Schema and the OutputStream where we want the serialized data; in our case, an in-memory ByteArrayOutputStream.

Line 7: We call the #write method on the DatumWriter with the object and the Encoder.

Line 8: We flush the JsonEncoder. Internally, it flushes the OutputStream passed to the JsonEncoder.

Line 15: We create an object of SpecificDatumReader<T>, which implements DatumReader<T>. There also exist other implementations of DatumReader, viz. GenericDatumReader and ReflectDatumReader.

Line 16: We create a JsonDecoder, passing the Schema and the input String which will be deserialized.

Let's move to serialization and de-serialization example with Binary format.

Employee employee = Employee.newBuilder().setFirstName("Gaurav").setLastName("Mazra").setSex(SEX.MALE).build();

DatumWriter<Employee> employeeWriter = new SpecificDatumWriter<>(Employee.class);
byte[] data;
try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
  Encoder binaryEncoder = EncoderFactory.get().binaryEncoder(baos, null);
  employeeWriter.write(employee, binaryEncoder);
  binaryEncoder.flush();
  data = baos.toByteArray();
}
  
// serialized data
System.out.println(data);
  
DatumReader<Employee> employeeReader = new SpecificDatumReader<>(Employee.class);
Decoder binaryDecoder = DecoderFactory.get().binaryDecoder(data, null);
employee = employeeReader.read(null, binaryDecoder);
//data after deserialization
System.out.println(employee);

The example is the same except for lines 6 and 16, where we create a BinaryEncoder and a BinaryDecoder instead.

This is how we can serialize and deserialize data with Apache Avro. I hope you found this article informative and useful. You can find the full example on Github.

Iterating Collections API

Java 8 introduced a new way of iterating over the Collections API. It is retrofitted to support the #forEach method, which accepts a Consumer in the case of Collection and a BiConsumer in the case of Map.

Consumer

Java 8 introduced the new package java.util.function, which includes the Consumer interface. It represents an operation which accepts one argument and returns no result.

Before Java 8, you would have used a for loop, an enhanced for loop and/or an Iterator to iterate over collections.

List<Employee> employees = EmployeeStub.getEmployees();
Iterator<Employee> employeeItr = employees.iterator();
Employee employee;
while (employeeItr.hasNext()) {
  employee = employeeItr.next();
  System.out.println(employee);
}

In Java 8, you can write a Consumer and pass it to the #forEach method to perform an operation on every item of the Collection.

// fetch employees from Stub
List<Employee> employees = EmployeeStub.getEmployees();
// create a consumer on employee
Consumer<Employee> consolePrinter = System.out::println;
// use List's retrofitted method for iteration on employees and consume it
employees.forEach(consolePrinter);

Or just as a one-liner:

employees.forEach(System.out::println);

Before Java 8, you would have iterated over a Map as:

Map<Long, Employee> idToEmployeeMap = EmployeeStub.getEmployeeAsMap();
for (Map.Entry<Long, Employee> entry : idToEmployeeMap.entrySet()) {
  System.out.println(entry.getKey() + " : " + entry.getValue());
}

In Java 8, you can write a BiConsumer and pass it to the #forEach method to perform an operation on every entry of the Map.

BiConsumer<Long, Employee> employeeBiConsumer = (id, employee) -> System.out.println(id + " : " + employee);
Map<Long, Employee> idToEmployeeMap = EmployeeStub.getEmployeeAsMap();
idToEmployeeMap.forEach(employeeBiConsumer);

Or just as a one-liner:

idToEmployeeMap.forEach((id, employee) -> System.out.println(id + " : " + employee));

This is how we can benefit from the newly introduced methods for iteration. I hope you found this post informative. You can get the full example on Github.

In this post, we will cover the following items.

  • What is java.util.function.Predicate?
  • How to filter data with Predicates?
  • Predicate chaining.

Java 8 introduced many new features like Streaming API, Lambdas, Functional interfaces, default methods in interfaces and many more.

Today, we will discuss the Predicate interface added in the java.util.function package and its usage for filtering in-memory data.

What is java.util.function.Predicate?

Predicate is like a condition checker, which accepts one argument of type T and returns a boolean value.

It's a functional interface whose functional method is test(T), where T is the type parameter.

@FunctionalInterface
interface Predicate<T> {
  public boolean test(T t);
}

How to filter data with Predicates?

Consider we have a collection of employees and we want to filter them based on age, sex, salary and/or any combination of these. We can do that with Predicates.

Let's understand this with one short example.

class Employee {
  private long id;
  private String firstName;
  private String lastName;
  private int age;
  private Sex sex;
  private int salary;

  // getters, constructor, hashCode, equals, to String
}

Defining predicates for filtering

Predicate<Employee> male = e -> e.getSex() == Sex.MALE;
Predicate<Employee> female = e -> e.getSex() == Sex.FEMALE;
Predicate<Employee> ageLessThan30 = e -> e.getAge() < 30;
Predicate<Employee> salaryLessThan20 = e -> e.getSalary() < 20000;
Predicate<Employee> salaryGreaterThan25 = e -> e.getSalary() > 25000;

Filtering employees with Predicates

employees.stream().filter(male).collect(Collectors.toList());
employees.stream().filter(female).collect(Collectors.toList());
employees.stream().filter(ageLessThan30).collect(Collectors.toList());
employees.stream().filter(salaryLessThan20).collect(Collectors.toList());

Here, the employees reference is of type java.util.List.

The Collections framework is retrofitted for the Streams API and has stream() and parallelStream() methods along with a few other additions. The filter() method is defined on Stream. We are streaming the employees collection, filtering it based on the Predicate and then collecting the result as a java.util.List.

Predicate chaining

java.util.function.Predicate has three default methods. Two of them, and(Predicate<T> other) and or(Predicate<T> other), are used for predicate chaining.

Filtering employees with multiple predicates

Let's say we want to filter the collection of employees with multiple conditions, like:

  • all males with a salary less than 20k.
  • all females with a salary greater than 25k.
  • all males with a salary either less than 20k or greater than 25k.

Let's understand this with quick example.

Defining predicates

Predicate<Employee> male = e -> e.getSex() == Sex.MALE;
Predicate<Employee> female = e -> e.getSex() == Sex.FEMALE;
Predicate<Employee> ageLessThan30 = e -> e.getAge() < 30;
Predicate<Employee> salaryLessThan20 = e -> e.getSalary() < 20000;
Predicate<Employee> salaryGreaterThan25 = e -> e.getSalary() > 25000;
Predicate<Employee> salaryLessThan20OrGreateThan25 = salaryLessThan20.or(salaryGreaterThan25);

Predicate<Employee> allMaleSalaryLessThan20 = male.and(salaryLessThan20);
Predicate<Employee> allMaleAgeLessThan30 = male.and(ageLessThan30);
Predicate<Employee> allFemaleSalaryGreaterThan25 = female.and(salaryGreaterThan25);

Predicate<Employee> allMaleSalaryLessThan20OrGreateThan25 = male.and(salaryLessThan20OrGreateThan25);

Line 1 => Predicate test for employee male

Line 2 => Predicate test for employee female

Line 3 => Predicate test for employee age less than 30

Line 4 => Predicate test for employee salary less than 20000

Line 8 => Predicate test for employee male and salary less than 20000

Line 10 => Predicate test for employee female and salary greater than 25000

Line 12 => Predicate test for employee male and salary either less than 20000 or greater than 25000

Filtering employees with predicate chaining

employees.stream().filter(allMaleSalaryLessThan20).collect(Collectors.toList());
employees.stream().filter(allMaleAgeLessThan30).collect(Collectors.toList());
employees.stream().filter(allFemaleSalaryGreaterThan25).collect(Collectors.toList());
employees.stream().filter(allMaleSalaryLessThan20OrGreateThan25).collect(Collectors.toList());

This is how we can use Predicate to filter in-memory data. I hope you find this post informative and helpful. You can get the full example code on Github.

java.util.function package

Java 8 introduced a new package, java.util.function, with many functional interfaces. They can be divided into four categories.

  • Predicate
  • Consumer
  • Function
  • Supplier

Predicate

It represents a boolean-valued function of one argument. It is a functional interface with the method test(T), where T is the type parameter.

You can see the usage here.

Consumer

It represents an operation which accepts argument(s), returns no result, and operates through side effects. Java 8 introduced many variants of Consumer.

You can see the usage of Consumer here.
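
A quick sketch showing one interface from each of the four categories, all from java.util.function:

Predicate<String> nonEmpty = s -> !s.isEmpty();         // takes T, returns boolean
Consumer<String> printer = s -> System.out.println(s);  // takes T, returns nothing
Function<String, Integer> length = s -> s.length();     // takes T, returns R
Supplier<String> greeting = () -> "Hello";              // takes nothing, returns T

System.out.println(nonEmpty.test("java"));   // true
printer.accept(greeting.get());              // Hello
System.out.println(length.apply("lambda"));  // 6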

Spring 4.3 - @GetMapping, @PostMapping, @PutMapping and @DeleteMapping

There are some new improvements in Spring Boot 1.4 and Spring 4.3 which lead to better readability and better use of annotations, particularly for HTTP request methods.

We usually map the GET, PUT, POST and DELETE HTTP methods in a REST controller in the following way.

@RestController
@RequestMapping("/api/employees")
public class EmployeeController {

  @RequestMapping
  public ResponseEntity<List<Employee>> getAll() {
    return ResponseEntity.ok(Collections.emptyList());
  }

  @RequestMapping("/{employeeId}")
  public ResponseEntity<Employee> findById(@PathVariable Long employeeId) {
    return ResponseEntity.ok(EmployeeStub.findById(employeeId));
  }

  @RequestMapping(method = RequestMethod.POST)
  public ResponseEntity<Employee> addEmployee(@RequestBody Employee employee) {
    return ResponseEntity.ok(EmployeeStub.addEmployee(employee));
  }

  @RequestMapping(method = RequestMethod.PUT)
  public ResponseEntity<Employee> updateEmployee(@RequestBody Employee employee) {
    return ResponseEntity.ok(EmployeeStub.updateEmployee(employee));
  }

  @RequestMapping(path = "/{employeeId}", method = RequestMethod.DELETE)
  public ResponseEntity<Employee> deleteEmployee(@PathVariable Long employeeId) {
    return ResponseEntity.ok(EmployeeStub.deleteEmployee(employeeId));
  }
}

But with Spring Framework 4.3 and Spring Boot 1.4, we have new annotations to map the HTTP methods.

  • GET -> @GetMapping
  • PUT -> @PutMapping
  • POST -> @PostMapping
  • DELETE -> @DeleteMapping
  • PATCH -> @PatchMapping

/**
 * 
 * @author Gaurav Rai Mazra
 *
 */
@RestController
@RequestMapping("/api/employees")
public class EmployeeController {

  @GetMapping
  public ResponseEntity<List<Employee>> getAll() {
    return ResponseEntity.ok(Collections.emptyList());
  }

  @GetMapping("/{employeeId}")
  public ResponseEntity<Employee> findById(@PathVariable Long employeeId) {
    return ResponseEntity.ok(EmployeeStub.findById(employeeId));
  }

  @PostMapping
  public ResponseEntity<Employee> addEmployee(@RequestBody Employee employee) {
    return ResponseEntity.ok(EmployeeStub.addEmployee(employee));
  }

  @PutMapping
  public ResponseEntity<Employee> updateEmployee(@RequestBody Employee employee) {
    return ResponseEntity.ok(EmployeeStub.updateEmployee(employee));
  }

  @DeleteMapping(path = "/{employeeId}")
  public ResponseEntity<Employee> deleteEmployee(@PathVariable Long employeeId) {
    return ResponseEntity.ok(EmployeeStub.deleteEmployee(employeeId));
  }
}

These annotations have improved the readability of the code. I hope you find this post helpful. You can get the full example code on Github.

This post is in continuation of my older post on the Single Responsibility Principle. At that time, I provided a solution where we refactored FileParser, moved the validation logic to FileValidationUtils, and composed the Parser interface with various implementations, viz. CSVFileParser, XMLFileParser and JsonFileParser (a sort of Strategy design pattern). You can get hold of the old code on Github.

This was roughly 2 years ago :).

I thought of improving this code further. We can completely remove FileValidationUtils by making the following change in the Parser interface.

public interface Parser {
  public void parse(File file);

  public FileType getFileType();

  public default boolean canParse(File file) {
    return Objects.nonNull(file) && file.getName().endsWith(getFileType().getExtension());
  }
}

public class FileParser {
  private Parser parser;

  public FileParser(Parser parser) {
    this.parser = parser;
  }

  public void setParser(Parser parser) {
    this.parser = parser;
  }

  public void parseFile(File file) {
    if (parser.canParse(file)) {
      parser.parse(file);
    }
  }
}
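
For illustration, a hypothetical caller could wire these together as follows (CSVFileParser and JsonFileParser stand for the existing Parser implementations from the earlier post; their no-args constructors are assumed):

// The concrete Parser decides via canParse(file) whether it handles the file.
FileParser fileParser = new FileParser(new CSVFileParser());
fileParser.parseFile(new File("employees.csv"));

// Switching the parsing strategy at runtime.
fileParser.setParser(new JsonFileParser());
fileParser.parseFile(new File("employees.json"));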

We introduce a default method in the Parser interface (a Java 8 feature) which checks whether the file can be parsed, and which can be overridden by concrete implementations. You can check the full example code on Github.

This post is in continuation of my previous post on Apache Avro - Introduction. In this post, we will discuss generating classes from an Apache Avro schema.

How to generate classes from Apache Avro schema?

There are two ways to generate classes from a schema.

  • Using the Avro Maven plugin
  • Programmatically generating classes

Consider we have the following schema in "src/main/avro":

{
  "type" : "record",
  "name" : "Employee",
  "namespace" : "com.gauravbytes.avro",
  "doc" : "Schema to hold employee object",
  "fields" : [{
    "name" : "firstName",
    "type" : "string"
  },
  {
    "name" : "lastName",
    "type" : "string"
  }, 
  {
    "name" : "sex", 
    "type" : {
      "name" : "SEX",
      "type" : "enum",
      "symbols" : ["MALE", "FEMALE"]
    }
  }]
}

Programmatically generating classes

Classes can be generated for a schema using SpecificCompiler.

public class PragmaticSchemaGeneration {
  private static final Logger LOGGER = LoggerFactory.getLogger(PragmaticSchemaGeneration.class);

  public static void main(String[] args) {
    try {
      SpecificCompiler compiler = new SpecificCompiler(new Schema.Parser().parse(new File("src/main/avro/employee.avsc")));
      compiler.compileToDestination(new File("src/main/avro"), new File("src/main/java"));
    } catch (IOException e) {
      LOGGER.error("Exception occurred parsing schema: ", e);
    }
  }
}

At line number 6, we create the object of SpecificCompiler. It has two constructors: one takes a Protocol as an argument and the other takes a Schema as an argument.
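
For instance, a sketch of the Protocol-based variant (the protocol file name below is an assumption):

Protocol protocol = Protocol.parse(new File("src/main/avro/emailSender.avpr"));
SpecificCompiler protocolCompiler = new SpecificCompiler(protocol);
protocolCompiler.compileToDestination(new File("src/main/avro"), new File("src/main/java"));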

Using the Maven plugin to generate classes

There is a Maven plugin which can generate the classes for you. You need to add the following configuration to your pom.xml.

<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>${avro.version}</version>
  <executions>
    <execution>
      <id>schemas</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
        <goal>protocol</goal>
        <goal>idl-protocol</goal>
      </goals>
      <configuration>
        <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
        <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>

This is how we can generate classes from an Avro schema. I hope you find this post informative and helpful. You can find the full project on Github.

In this post, we will discuss the following items.

  • What is Apache Avro?
  • What is Avro schema and how to define it?
  • Serialization in Apache Avro.

What is Apache Avro?

"Apache Avro is data serialization library" That's it, huh. This is what you will see when you open their official page.Apache Avro is:

  • A schema-based data serialization library.
  • An RPC framework (support).
  • Rich data structures (primitive types include null, string, number and boolean; complex types include record, array, map, etc.).
  • A compact, fast, binary data format.

What is Avro schema and how to define it?

Apache Avro's serialization concept is based on a schema. When you write data, the schema is written along with it. When you read data, the schema will always be present. The schema along with the data makes it fully self-describing.

A schema is a representation of an Avro datum (record). Types are of two kinds: primitive and complex.

Primitive types

These are the basic types supported by Avro. They include null, boolean, int, long, float, double, bytes and string. One quick example:

{"type": "string"}

Complex types

Apache Avro supports six complex types, i.e. record, enum, array, map, union and fixed.

RECORD

Record uses the type name "record" and has the following attributes.

  • name: a JSON string, providing the name of the record (required).
  • namespace: a JSON string that qualifies the name.
  • doc: a JSON string providing documentation for the record.
  • aliases: a JSON array, providing alternate names for the record.
  • fields: a JSON array, listing fields (required). Each field has its own attributes:
    • name: a JSON string, providing the name of the field (required).
    • type: a JSON object, defining a schema or record definition (required).
    • doc: a JSON string, providing documentation for the field.
    • default: a default value for the field, used when an instance lacks a value for this field.

{
  "type": "record",
  "name": "Node",
  "aliases": ["SinglyLinkedNodes"],
  "fields" : [
    {"name": "value", "type": "string"},
    {"name": "next", "type": ["null", "Node"]}
  ]
}

ENUM

Enum uses the type "enum" and supports the attributes name, namespace, aliases, doc and symbols (a JSON array).

{ 
  "type": "enum",
  "name": "Move",
  "symbols" : ["LEFT", "RIGHT", "UP", "DOWN"]
}

ARRAYS

Array uses the type "array" and supports a single attribute, items.

{"type": "array", "items": "string"}

MAPS

Map uses the type "map" and supports one attribute, values. Its keys are assumed to be of type string.

{"type": "map", "values": "long"}

UNIONS

Unions are represented by a JSON array, e.g. ["null", "string"], which means the value type could be either null or string.

FIXED

Fixed uses the type "fixed" and supports two attributes, i.e. name and size.

{"type": "fixed", "size": 16, "name": "md5"}

Serialization in Apache Avro

Apache Avro data is always serialized with its schema. It supports two types of encoding, i.e. binary and JSON. You can read more on serialization in their official specification and/or can see the example usage here.