
This post is a continuation of my previous posts on Apache Avro - Introduction, Apache Avro - Generating classes from Schema and Apache Avro - Serialization.

In this post, we will share insights on using Apache Avro as an RPC framework.

To use Apache Avro as an RPC framework, we first need to define a protocol. Before going deeper into this topic, let's discuss what a protocol is.

Avro protocols describe RPC interfaces. Like schemas, they are defined in JSON.

A protocol has the following attributes:

  • protocol: a string, the name of the protocol.
  • namespace: an optional string that qualifies the name.
  • types: an optional list of definitions of named types (records, enums, fixed types and errors).
  • messages: an optional JSON object whose keys are the method names of the protocol and whose values are objects whose attributes are described below. No two messages may have the same name.

Further, a message has the following attributes:

  • request: a list of named, typed parameter schemas.
  • response: a response schema.
  • errors: an optional union of declared error schemas.

Let's define a simple protocol to exchange email messages between a client and a server.

{
  "namespace": "com.gauravbytes.avro",
  "protocol": "EmailSender",
  "types": [{
    "name": "EmailMessage", "type": "record",
    "fields": [
      {"name": "to",   "type": "string"},
      {"name": "from", "type": "string"},
      {"name": "body", "type": "string"}
    ]
  }],
  "messages": {
    "send": {
      "request": [{"name": "email", "type": "EmailMessage"}],
      "response": "string"
    }
  }
}

Here, the protocol defines an interface EmailSender whose send message takes an EmailMessage as request and returns a string response.

The EmailSender interface and the EmailMessage class used below are generated from this protocol definition. We have created a mock implementation of EmailSender:

public class EmailSenderImpl implements EmailSender {
  @Override
  public CharSequence send(EmailMessage email) throws AvroRemoteException {
    return email.toString();
  }
}

Now, we create a server; Apache Avro ships with a Netty-based server for this purpose.

server = new NettyServer(new SpecificResponder(EmailSender.class, new EmailSenderImpl()),
    new InetSocketAddress(65333));
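
For reference, a self-contained version of the server bootstrap could look like the sketch below. The class name EmailSenderServer and the keep-alive logic are mine, and the imports assume Avro 1.8.x, where NettyServer lives in org.apache.avro.ipc (later releases moved it to a separate Netty module).

import java.io.IOException;
import java.net.InetSocketAddress;

import org.apache.avro.ipc.NettyServer;
import org.apache.avro.ipc.Server;
import org.apache.avro.ipc.specific.SpecificResponder;

import com.gauravbytes.avro.EmailSender;

public class EmailSenderServer {
  public static void main(String[] args) throws IOException {
    // NettyServer begins listening as soon as it is constructed
    Server server = new NettyServer(
        new SpecificResponder(EmailSender.class, new EmailSenderImpl()),
        new InetSocketAddress(65333));
    System.out.println("Server started on port " + server.getPort());

    System.in.read(); // keep the server running until Enter is pressed
    server.close();   // release the port and worker threads
  }
}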

Now, we create a client which sends requests to the server.

NettyTransceiver client = new NettyTransceiver(new InetSocketAddress(65333));
// client code - attach to the server and send a message
EmailSender proxy = SpecificRequestor.getClient(EmailSender.class, client);
logger.info("Client built, got proxy");

// fill in the Message record and send it
EmailMessage message = new EmailMessage();
message.setTo(new Utf8(args[0]));
message.setFrom(new Utf8(args[1]));
message.setBody(new Utf8(args[2]));
logger.info("Calling proxy.send with message: {} ", message.toString());
logger.info("Result: {}", proxy.send(message));
// cleanup
client.close();

This is how we can use Apache Avro as an RPC framework. I hope you found this article useful. You can download the full example code from GitHub.

In this post, we will discuss the following items:

  • What is Apache Avro?
  • What is Avro schema and how to define it?
  • Serialization in Apache Avro.

What is Apache Avro?

"Apache Avro is data serialization library" That's it, huh. This is what you will see when you open their official page.Apache Avro is:

  • A schema-based data serialization library.
  • An RPC framework.
  • A format with rich data structures (primitive types such as null, string, number and boolean, and complex types such as record, array and map).
  • A compact, fast, binary data format.

What is Avro schema and how to define it?

Apache Avro's serialization is based on a schema. When you write data, the schema is written along with it; when you read data, the schema is always present. Storing the schema with the data makes it fully self-describing.

A schema is the representation of an Avro datum (record). Schemas fall into two categories: primitive and complex.

Primitive types

These are the basic types supported by Avro: null, boolean, int, long, float, double, bytes and string. One quick example:

{"type": "string"}

Complex types

Apache Avro supports six complex types: record, enum, array, map, union and fixed.

RECORD

A record uses the type name "record" and has the following attributes:

  • name: A JSON string, providing the name of the record (required).
  • namespace: A JSON string that qualifies the name.
  • doc: A JSON string, providing documentation for the record.
  • aliases: A JSON array, providing alternate names for the record.
  • fields: A JSON array, listing fields (required). Each field has its own attributes:
    • name: A JSON string, providing the name of the field (required).
    • type: A JSON object, defining a schema or record definition (required).
    • doc: A JSON string, providing documentation for the field.
    • default: A default value for the field, used when reading instances that lack this field.
{
  "type": "record",
  "name": "Node",
  "aliases": ["SinglyLinkedNodes"],
  "fields" : [
    {"name": "value", "type": "string"},
    {"name": "next", "type": ["null", "Node"]}
  ]
}
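As an illustration (this sketch is mine; the file name node.avsc is assumed to contain the schema above), a record conforming to this schema can be built with Avro's generic API without generating any classes:

import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;

public class NodeRecordExample {
  public static void main(String[] args) throws Exception {
    // node.avsc is assumed to contain the Node schema shown above
    Schema schema = new Schema.Parser().parse(new File("node.avsc"));

    GenericRecord node = new GenericRecordBuilder(schema)
        .set("value", "head")
        .set("next", null) // the union ["null", "Node"] permits null here
        .build();
    System.out.println(node);
  }
}

GenericRecordBuilder fills fields by name and falls back to the schema defaults for any field that is not set explicitly.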
ENUM

An enum uses the type "enum" and supports the attributes name, namespace, aliases, doc and symbols (a JSON array listing the symbols).

{ 
  "type": "enum",
  "name": "Move",
  "symbols" : ["LEFT", "RIGHT", "UP", "DOWN"]
}

ARRAYS

An array uses the type "array" and supports a single attribute, items.

{"type": "array", "items": "string"}
MAPS

A map uses the type "map" and supports one attribute, values. Map keys are assumed to be of type string.

{"type": "map", "values": "long"}
UNIONS

Unions are represented as a JSON array, for example ["null", "string"], which means the value can be either null or a string.

FIXED

Fixed uses the type "fixed" and supports two attributes, name and size.

{"type": "fixed", "size": 16, "name": "md5"}

Serialization in Apache Avro

Apache Avro data is always serialized with its schema. It supports two types of encoding: binary and JSON. You can read more about serialization in the official specification and/or see the example usage here.
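
To make the binary encoding concrete, here is a minimal, self-contained sketch (the class and file names are mine, and it reuses the Node record schema from the record section) that writes a record to an Avro data file, where the schema is stored alongside the binary-encoded data, and reads it back:

import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroFileExample {
  private static final String SCHEMA_JSON =
      "{\"type\": \"record\", \"name\": \"Node\", \"fields\": ["
      + "{\"name\": \"value\", \"type\": \"string\"},"
      + "{\"name\": \"next\", \"type\": [\"null\", \"Node\"]}]}";

  public static void main(String[] args) throws IOException {
    Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

    GenericRecord record = new GenericData.Record(schema);
    record.put("value", "head");
    record.put("next", null);

    File file = new File("nodes.avro");

    // write the record with the binary encoding; the schema is stored in the file header
    try (DataFileWriter<GenericRecord> writer =
             new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
      writer.create(schema, file);
      writer.append(record);
    }

    // read it back; the schema travels with the data, so only a reader is needed
    try (DataFileReader<GenericRecord> reader =
             new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
      for (GenericRecord decoded : reader) {
        System.out.println(decoded);
      }
    }
  }
}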