In the previous post, we set up the ELK stack and ran data analytics on application events and logs. In this post, we will discuss how you can watch real-time application events being persisted in an Elasticsearch index and raise alerts if a watcher condition is breached, using SentiNL (a Kibana plugin).

A few examples of alerting on application events (see previous posts) are:

  • The same user logged in from different IP addresses.
  • Different users logged in from the same IP address.
  • Permission failures in the last 15 minutes.
  • A particular kind of exception in the last 15 minutes/hour/day.

Watching and alerting on Elasticsearch index in Kibana

There are many plugins available for watching and alerting on an Elasticsearch index in Kibana, e.g. X-Pack and SentiNL.

X-Pack is a paid extension provided by elastic.co which provides security, alerting, monitoring, reporting and graph capabilities.

SentiNL is a free extension provided by siren.io which provides alerting and reporting functionality to monitor, notify and report on changes in an Elasticsearch index using standard queries, programmable validators and configurable actions.

We will be using SentiNL for watching and alerting on the Elasticsearch index.

Installing SentiNL

Prerequisite

On Debian, we need the libfontconfig and libfreetype6 libraries, if not already installed.

sudo apt-get install libfontconfig libfreetype6

On CentOS, we need the fontconfig and freetype libraries, if not already installed.

sudo yum install fontconfig freetype

// Installing SentiNL plugin
/opt/kibana/bin/kibana-plugin --install sentinl -u https://github.com/sirensolutions/sentinl/releases/download/tag-4.6.4-4/sentinl.zip

Configuring SentiNL

SentiNL has a wide range of actions that you can configure for watchers. You can send an email, integrate with a Slack channel or PushApps, or send a payload to a custom webhook. Open the kibana.yml file and add the below properties for SentiNL. For our example, we will only enable notification through email.

sentinl:
  es:
    host: 'localhost'
    port: 9200
  settings:
    email:
      active: true
      host: "smtp.gmail.com"
      user: "[EMAIL_ID]"
      password: "[PASSWORD]"
      port: 465
      domain: "gmail.com"
      ssl: true
      tls: false
      authentication: ['PLAIN', 'LOGIN', 'CRAM-MD5', 'XOAUTH2']
      timeout: 20000  # mail server connection timeout
      # cert:
      #   key: '/full/sys/path/to/key/file'
      #   cert: '/full/sys/path/to/cert/file'
      #   ca: '/full/sys/path/to/ca/file'
    slack:
      active: false
      username: 'username'
      hook: 'https://hooks.slack.com/services/'
      channel: '#channel'
    webhook:
      active: false
      host: 'localhost'
      port: 9200
      # use_https: false
      # path: ':/{{payload.watcher_id}}'
      # body: '{{payload.watcher_id}}{payload.hits.total}}'
      # method: POST
    report:
      active: false
      executable_path: '/usr/bin/chromium' # path to Chrome v59+ or Chromium v59+
      timeout: 5000
      # authentication:
      #   enabled: true
      #   mode:
      #     searchguard: false
      #     xpack: false
      #     basic: false
      #     custom: true
      #   custom:
      #     username_input_selector: '#username'
      #     password_input_selector: '#password'
      #     login_btn_selector: '#login-btn'
      # file:
      #   pdf:
      #     format: 'A4'
      #     landscape: true
      #   screenshot:
      #     width: 1280
      #     height: 900
    pushapps:
      active: false
      api_key: ''

That's it! Let's start Kibana to configure watchers and alerting in SentiNL.

Creating Watchers and alerting in Kibana

We will configure a watcher for different users logged in from the same IP address and send email alerts.

  • Open the Kibana dashboard on your local machine (the URL for Kibana on my local machine is http://localhost:5601).
  • Click on the SentiNL option in the left nav-pane. You will see a dashboard as below. Click on the New option to create a new watcher.
  • Click on the Watcher link highlighted as below.
  • Enter the watcher name and schedule in the General tab.
  • Click on the Input tab and enter the below query JSON in the body. You can also give a name to the query and save it.
    {
      "search": {
        "request": {
          "index": [
            "app-events*"
          ],
          "body": {
            "query": {
              "bool": {
                "filter": [
                  {
                    "range": {
                      "@timestamp": {
                        "gte": "now-30m"
                      }
                    }
                  },
                  {
                    "query_string": {
                      "default_field": "appEvent.eventType",
                      "query": "LOGIN_SUCCESS OR LOGIN_FAILURE"
                    }
                  }
                ]
              }
            },
            "aggs": {
              "group_by_requestIP": {
                "terms": {
                  "field": "appEvent.requestIP.keyword",
                  "size": 5
                },
                "aggs": {
                  "group_by_identifier": {
                    "terms": {
                      "field": "appEvent.identifier.keyword",
                      "size": 5
                    },
                    "aggs": {
                      "get_latest": {
                        "terms": {
                          "field": "@timestamp",
                          "size": 1,
                          "order": {
                            "_key": "desc"
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
    
  • Click on the Condition tab and enter the below condition JSON in the body. You can also give a name to this condition and save it.
    {
      "script": {
        "script": "var requestIPbuckets = payload.aggregations.group_by_requestIP.buckets; payload.collector = []; requestIPbuckets.filter(function(requestIP) { return requestIP.key; }).forEach(function(requestIP) { var requestIPKey = requestIP.key; var users = requestIP.group_by_identifier.buckets; if (users.length > 1) { users.filter(function(user) { return user.key; }).forEach(function(user) { payload.collector.push({ 'ip': requestIPKey, 'identifier': user.key, 'count': user.doc_count  }); }); }}); payload.collector.length > 0;"
      }
    }
    
  • Click on the Action tab and select email as the action for alerting. Give the title, to, from and subject, and add the below content in the body of the email.
    Found {{payload.collector.length}} Events
    {{#payload.collector}}
    {{#.}}
    ip : {{ip}}, identifier: {{identifier}}, count: {{count}}
    {{/.}}
    {{/payload.collector}}
    
  • Save the watcher.

This watcher will run periodically based on the schedule that you have set, and if the breach condition is met, it will send an email alert. The configured email looks like below.

This is how you can watch real-time changing data in an Elasticsearch index and raise alerts based on the configured conditions.

In this post, we will learn how to use Elasticsearch, Logstash and Kibana for running analytics on application events and logs. Firstly, I will install all these applications on my local machine.

Installations

You can read my previous posts on how to install Elasticsearch, Logstash, Kibana and Filebeat on your local machine.

Basic configuration

I hope by now you have installed Elasticsearch, Logstash, Kibana and Filebeat on your system. Now, let's do a few basic configurations required to run analytics on application events and logs.

Elasticsearch

Open the elasticsearch.yml file in the [ELASTICSEARCH_INSTALLATION_DIR]/config folder and add the below properties to it.

cluster.name: gauravbytes-event-analyzer
node.name: node-1

The cluster name is used by Elasticsearch nodes to form a cluster. Node names within a cluster need to be unique. We are running only a single instance of Elasticsearch on our local machine, but in a production-grade setup there will be master nodes, data nodes and client nodes that you will configure as per your requirements.

Logstash

Open the logstash.yml file in the [LOGSTASH_INSTALLATION_DIR]/config folder and add the below properties to it.

node.name: gauravbytes-logstash
path.data: [MOUNTED_HDD_LOCATION]
config.reload.automatic: true
config.reload.interval: 30s

Creating logstash pipeline for parsing application events and logs

There are three parts in a pipeline, i.e. input, filter and output. Below is the pipeline conf for parsing application events and logs.

input {
    beats {
        port => "5044"
    }
}

filter {
   
    grok {
        match => {"message" => "\[%{TIMESTAMP_ISO8601:loggerTime}\] *%{LOGLEVEL:level} *%{DATA:loggerName} *- (?(.|\r|\n)*)"}
    }
 
    if ([fields][type] == "appevents") {
        json {
            source => "event"
            target => "appEvent"
        }
  
        mutate { 
            remove_field => "event"
        }

        date {
            match => [ "[appEvent][eventTime]" , "ISO8601" ]
            target => "@timestamp"
        }
  
        mutate {
            replace => { "[type]" => "app-events" }
        }
    }
    else if ([fields][type] == "businesslogs") {  
        mutate {
            replace => { "[type]" => "app-logs" }
        }
    }
 
    mutate { 
        remove_field => "message"
    }
}
output {
    elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "%{type}-%{+YYYY.MM.dd}"
    }
}

In the input section, we are listening on port 5044 for beats (Filebeat sends data on this port).

In the output section, we are persisting data in Elasticsearch in an index based on the type and date combination.

Let's discuss the filter section in detail.

  • 1) We are using the grok filter plugin to parse plain lines of text into structured data.
    grok {
        match => {"message" => "\[%{TIMESTAMP_ISO8601:loggerTime}\] *%{LOGLEVEL:level} *%{DATA:loggerName} *- (?(.|\r|\n)*)"}
    }
    
  • 2) We are using the json filter plugin to convert the event field to a JSON object and store it in the appEvent field.
    json {
        source => "event"
        target => "appEvent"
    }
    
  • 3) We are using the mutate filter plugin to remove data we don't require.
    mutate { 
        remove_field => "event"
    }
    
    mutate { 
        remove_field => "message"
    }
    
  • 4) We are using the date filter plugin to parse eventTime from the appEvent field as an ISO8601 date and set it as the @timestamp field.
    date {
        match => [ "[appEvent][eventTime]" , "ISO8601" ]
        target => "@timestamp"
    }
    

Filebeat

Open the filebeat.yml file in [FILEBEAT_INSTALLATION_DIR] and add the below configurations.

filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - E:\gauravbytes-log-analyzer\logs\AppEvents.log
  fields:
    type: appevents
  
- type: log
  enabled: true
  paths:
    - E:\gauravbytes-log-analyzer\logs\GauravBytesLogs.log
  fields:
    type: businesslogs
  multiline.pattern: ^\[
  multiline.negate: true
  multiline.match: after

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
setup.template.settings:
  index.number_of_shards: 3

output.logstash:
  hosts: ["localhost:5044"]

In the configurations above, we are defining two different types of Filebeat prospectors; one for application events and the other for application logs. We have also defined that the output should be sent to Logstash. There are many other configurations that you can do by referring to the filebeat.reference.yml file in the Filebeat installation directory.

Kibana

Open kibana.yml in the [KIBANA_INSTALLATION_DIR]/config folder and add the below configuration to it.

elasticsearch.url: "http://localhost:9200"

We have only configured the Elasticsearch URL, but you can also change the Kibana host, port, name and other SSL-related configurations.

Running ELK stack and Filebeat

//running elasticsearch on windows
\bin\elasticsearch.exe

// running logstash
bin\logstash.bat -f config\gauravbytes-config.conf --config.reload.automatic

//running kibana
bin\kibana.bat

//running filebeat
filebeat.exe -e -c filebeat-test.yml -d "publish"

Creating Application Event and Log structure

I have created two classes, AppEvent.java and AppLog.java, which capture information related to application events and logs. Below is the structure of both classes.

//AppEvent.java
public class AppEvent implements BaseEvent<AppEvent> {
    public enum AppEventType {
        LOGIN_SUCCESS, LOGIN_FAILURE, DATA_READ, DATA_WRITE, ERROR;
    }

    private String identifier;
    private String hostAddress;
    private String requestIP;
    private ZonedDateTime eventTime;
    private AppEventType eventType;
    private String apiName;
    private String message;
    private Throwable throwable;
}

//AppLog.java
public class AppLog implements BaseEvent<AppLog> {
    private String apiName;
    private String message;
    private Throwable throwable;
}

Let's generate events and logs

I have created a sample application to generate dummy events and logs. You can check out the full project on GitHub. There is an AppEventGenerator Java file. Run this class with the system argument -DLOG_PATH=[YOUR_LOG_DIR] to generate dummy events. If your log path is not the same as the one defined in filebeat-test.yml, then copy the log files generated by this project to the location defined in filebeat-test.yml. You will soon see the events and logs persisted in Elasticsearch.

Running analytics on application events and logs in Kibana dashboard

Firstly, we need to define an index pattern in Kibana to view the application events and logs. Follow the step-by-step guide below to create an index pattern.

  • Open the Kibana dashboard by opening the URL (http://localhost:5601/).
  • Go to the Management tab (left pane, last option).
  • Click on the Index Patterns link.
  • You will see the already created index patterns, if any. On the left side, you will see the option to create an index pattern. Click on it.
  • Now, define the index pattern and click Next. Choose the time filter field name. I chose the @timestamp field for this. You can select any other timestamp field present in this index, and finally click on the Create index pattern button.

Let's view Kibana dashboard

Once the index pattern is created, click on the Discover tab in the left pane and select the index pattern you created in the previous steps.

You will see a beautiful GUI with a lot of options to mine the data. In the top-most pane, you will see the option to auto-refresh and to choose the time range of data you want to fetch (last 15 minutes, 30 minutes, 1 hour, 1 day and so on); the dashboard will refresh automatically.

The next lane has a search box. You can write queries there to get a more granular view of the data; it uses Apache Lucene's query syntax, for example appEvent.eventType: LOGIN_FAILURE.

You can also define filters to have a more granular view of data.

This is how you can run analytics on your application events and logs using ELK. You can also define complex custom filters and queries and create visualization dashboards. Feel free to explore Kibana's official documentation to use it to its full potential.

Java 8 introduced default and static methods in interfaces. These features allow us to add new functionality to interfaces without breaking the existing contract for implementing classes.

How do we define default and static methods?

A default method has the default keyword and a static method has the static keyword in the method signature.

public interface InterfaceA {
  double someMethodA();

  default double someDefaultMethodB() {
    // some default implementation
    return 0;
  }

  static void someStaticMethodC() {
    // helper method implementation
  }
}

A few important points about default methods

  • You can inherit the default method.
  • You can redeclare the default method, essentially making it abstract again.
  • You can redefine the default method (equivalent to overriding), as shown in the sketch below.
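
The sketch below illustrates these three options against the InterfaceA defined above; the sub-interface names are made up purely for illustration.

interface InheritsDefault extends InterfaceA {
  // 1) inherits someDefaultMethodB() as-is
}

interface RedeclaresDefault extends InterfaceA {
  // 2) redeclaring the default method makes it abstract again
  double someDefaultMethodB();
}

interface RedefinesDefault extends InterfaceA {
  // 3) redefining (overriding) the default method
  @Override
  default double someDefaultMethodB() {
    return 42;
  }
}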

Why do we need default and static methods?

Consider an existing Expression interface with existing implementations like ConstantExpression, BinaryExpression, DivisionExpression and so on. Now, you want to add new functionality to return the signum of the evaluated result and/or to get the signum of an already evaluated value. This can be done with default and static methods without breaking any functionality, as follows.

public interface Expression {
  double evaluate();

  default double signum() {
    return signum(evaluate());
  }

  static double signum(double value) {
    return Math.signum(value);
  }
}

You can find the full code on GitHub.
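
For illustration, here is a minimal usage sketch; the lambda stands in for a ConstantExpression-style implementation and is an assumption made for brevity, not the actual class from the repository.

// A constant expression evaluating to -4.2 (lambda used in place of ConstantExpression)
Expression constant = () -> -4.2;

// Default method: signum of the evaluated result
System.out.println(constant.signum());       // prints -1.0

// Static helper: signum of an already computed value
System.out.println(Expression.signum(16.0)); // prints 1.0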

Default methods and multiple inheritance ambiguity problem

Java supports multiple inheritance of interfaces. Suppose you have two interfaces, InterfaceA and InterfaceB, with the same default method, and your class implements both interfaces.

interface InterfaceA {
  void performA();
  default void doSomeWork() {
  
  }
}

interface InterfaceB {
  void performB();

  default void doSomeWork() {
 
  }
}

class ConcreteC implements InterfaceA, InterfaceB {

}

The above code will fail to compile with an error: unrelated defaults for doSomeWork() from InterfaceA and InterfaceB.

To overcome this problem, you need to override the default method.

class ConcreteC implements InterfaceA, InterfaceB {
  @Override
  public void doSomeWork() {

  }
}

If you don't want to provide your own implementation of the overridden default method but want to reuse one of the inherited ones, that is also possible with the following syntax.

class ConcreteC implements InterfaceA, InterfaceB {
  @Override
  public void doSomeWork() {
    InterfaceB.super.doSomeWork();
  }
}

I hope you find this post informative and useful. Comments are welcome!

Filebeat

Filebeat is a lightweight log shipper. It is installed as an agent, listens to your predefined set of log files and locations, and forwards them to your choice of sink (Logstash, Elasticsearch, a database, etc.).

Installation

deb

curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.3.2-amd64.deb
sudo dpkg -i filebeat-6.3.2-amd64.deb

rpm

curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.3.2-x86_64.rpm
sudo rpm -vi filebeat-6.3.2-x86_64.rpm

mac

curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.3.2-darwin-x86_64.tar.gz
tar xzvf filebeat-6.3.2-darwin-x86_64.tar.gz

docker

docker pull docker.elastic.co/beats/filebeat:6.3.2

Windows

Download Filebeat from the official website and perform the following steps.

1) Extract the zip file to your choice of location. e.g. C:\Program Files.
2) Rename the filebeat--windows directory to Filebeat.
3) Open a PowerShell prompt as an Administrator (right-click the PowerShell icon and select Run As Administrator).
4) From the PowerShell prompt, run the following commands to install Filebeat as a Windows service:

// Command to execute from powershell
cd 'C:\Program Files\Filebeat'
.\install-service-filebeat.ps1

Kibana

Kibana is a visualization dashboard for Elasticsearch. You can choose from many available chart types like graphs, pie, bar, histogram, etc., or view real-time textual data, and gain meaningful analytics.

Installation

Installing Kibana directly from tar files

For Linux installation

wget https://artifacts.elastic.co/downloads/kibana/kibana-6.2.3-linux-x86_64.tar.gz
shasum -a 512 kibana-6.2.3-linux-x86_64.tar.gz 
tar -xzf kibana-6.2.3-linux-x86_64.tar.gz
cd kibana-6.2.3-linux-x86_64/

For Windows installation

// Download Kibana
https://artifacts.elastic.co/downloads/kibana/kibana-6.2.3-windows-x86_64.zip

// running kibana
bin\kibana.bat

Installation from packages

Debian package installation

// Import Elastic PGP key
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

//install https transport module
sudo apt-get install apt-transport-https

//save repository definition
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list

//installation command
sudo apt-get update && sudo apt-get install kibana

RPM package installation

//Download and install public signing key
rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

Add the following in a new .repo file in your /etc/yum.repos.d/ directory

[kibana-6.x]
name=Kibana repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
// Installation command (use the one matching your package manager)
sudo yum install kibana 
sudo dnf install kibana 
sudo zypper install kibana 

Logstash

Logstash is a data processing pipeline which ingests data simultaneously from multiple data sources, transforms it and sends it to your choice of `stash`, i.e. Elasticsearch, Redis, a database, a REST endpoint, etc. For example: ingesting log files, then cleaning and transforming them into machine- and human-readable formats.

There are three components in Logstash, i.e. Inputs, Filters and Outputs.

Inputs

It ingests data of any kind, shape and size. For example: logs, AWS metrics, instance health metrics, etc.

Filters

Logstash filters parse each event, build a structure, enrich the data in the event and also transform it to the desired form. For example: enriching geo-location from an IP using the geoip filter, anonymizing PII in events, transforming unstructured data to structured data using the grok filter, etc.

Outputs

This is the sink layer. There are many output plugins, e.g. Elasticsearch, email, Slack, Datadog, database persistence, etc.

Installing Logstash

As of writing, Logstash (6.2.3) requires Java 8 to run. To check the Java version, run the following command:

java -version

The output on my system is

java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

If Java 8 is not installed, then please download it from the Oracle website and follow the instructions for installation. Also, set the JAVA_HOME variable.

Installing from binaries

You can directly download the binaries from here.

Installing from package repositories

Installation with APT

//ADD PUBLIC SIGNING KEY
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

//add https-transports
sudo apt-get install apt-transport-https

//save the repository definition
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list

//installation command
sudo apt-get update && sudo apt-get install logstash

Installation with YUM

// Download and install the public signing key
rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

Add the following in a new .repo file in your /etc/yum.repos.d/ directory

[logstash-6.x]
name=Elastic repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
// Installation command
sudo yum install logstash

Docker installation

You can follow this link for Docker installation.

What is Elasticsearch?

Elasticsearch is a highly scalable, broadly distributed, open-source full-text search and analytics engine. You can search, store and index big volumes of data in near real-time. It internally uses Apache Lucene for indexing and storing data. Below are a few use cases for it.

  • Product search for an e-commerce website.
  • Collecting application logs and transaction data and analyzing them for trends and anomalies.
  • Indexing instance metrics (health, stats), doing analytics and creating alerts for instance health at regular intervals.
  • Analytics/business-intelligence applications.

Elasticsearch basic concepts

We will be using a few terminologies while talking about Elasticsearch. Let's see the basic building blocks of Elasticsearch.

Near real-time

Elasticsearch is near real-time, meaning there is a slight latency between the indexing of a document and its availability for searching.

Cluster

It is a collection of one or more nodes (servers) that together hold the entire data and provide the ability to index and search the cluster for data.

Node

It is a single server that is part of your cluster. It can store data, and participate in indexing, searching and overall cluster management. A node can have four different flavours, i.e. master, http, data and coordinating/client nodes.

Index

An index is a collection of documents with similar kinds/characteristics. It is identified by a name (all lowercase) and is referred to by that name to perform indexing, search, update and delete operations against its documents.

Document

It is a single unit of information that can be indexed.

Shards and Replicas

A single index can store billions of documents, which can lead to storage taking up terabytes of space. A single server could exceed its limits when storing such massive information or performing search operations on that data. To solve this problem, Elasticsearch subdivides your index into multiple units called shards.

Replication is important primarily to have high availability in case of node/shard failure and to scale out your search throughput. By default, Elasticsearch creates an index with 5 shards and 1 replica, which can be configured at the time of creating the index (via the index.number_of_shards and index.number_of_replicas settings).

Installing Elasticsearch

Elasticsearch requires Java to run. As of writing this article, Elasticsearch 6.2.x requires at least Java 8.

Installing Java 8

// Installing Open JDK
sudo apt-get install openjdk-8-jdk

// Installing Oracle JDK
sudo add-apt-repository -y ppa:webupd8team/java
sudo apt-get update
sudo apt-get -y install oracle-java8-installer

Installing Elasticsearch with tar file

curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.4.tar.gz

tar -xvf elasticsearch-6.2.4.tar.gz

Installing Elasticsearch with package manager

// import the Elasticsearch public GPG key into apt:
wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

//Create the Elasticsearch source list
echo "deb http://packages.elastic.co/elasticsearch/6.x/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch-6.x.list
  
sudo apt-get update
  
sudo apt-get -y install elasticsearch

Configuring Elasticsearch cluster

Configuration file location if you have downloaded the tar file

vi /[YOUR_TAR_LOCATION]/config/elasticsearch.yml

Configuration file location if you used package manager to install Elasticsearch

vi /etc/elasticsearch/elasticsearch.yml

Cluster Name

Use a descriptive name for the cluster. Elasticsearch nodes will use this name to form and join the cluster.

cluster.name: lineofcode-prod

Node name

To uniquely identify a node in the cluster:

node.name: ${HOSTNAME}

Custom attributes to node

Add a rack attribute to a node to logically group the nodes placed in the same data center/physical machine:

node.attr.rack: us-east-1

Network host

The node will bind to this hostname or IP address and advertise this host to other nodes in the cluster.

network.host: [_VPN_HOST_, _local_]

Elasticsearch does not come with authentication and authorization, so it is suggested to never bind the network host property to a public IP address.

Cluster finding settings

To find and join a cluster, a node needs to know at least a few other hostnames or IP addresses. These can easily be set with the discovery.zen.ping.unicast.hosts property (e.g. discovery.zen.ping.unicast.hosts: ["host1", "host2:9301"]).

Changing the http port

You can configure the port on which Elasticsearch is accessible over HTTP with the http.port property (the default is 9200).

Configuring JVM options (Optional for local/test)

You need to tweak JVM options as per your hardware configuration. It is advisable to allocate half of the server's total available memory to the Elasticsearch heap; the rest will be used by Lucene (file system cache) and Elasticsearch threads.

// For example, if your server has 8 GB of RAM, then set the following properties
-Xms4g
-Xmx4g

Also, to avoid a performance hit from swapping, let Elasticsearch lock its memory with the bootstrap.memory_lock: true property.

Elasticsearch uses the concurrent mark and sweep (CMS) GC by default, and you can change it to G1GC with the following configuration.

-XX:-UseParNewGC
-XX:-UseConcMarkSweepGC
-XX:+UseCondCardMark
-XX:MaxGCPauseMillis=200
-XX:+UseG1GC
-XX:GCPauseIntervalMillis=1000
-XX:InitiatingHeapOccupancyPercent=35

Starting Elasticsearch

sudo service elasticsearch restart

TADA! Elasticsearch is up and running on your local machine.

For a production-grade setup, I would recommend visiting the following articles.

Digitalocean guide to setup production elasticsearch

Elasticsearch - Fred Thoughts

We have learnt about what Apache Ignite is, setting up Apache Ignite and a few quick examples in the last few posts. In this post, we will deep dive into the core Apache Ignite classes and discuss the following internals.

  • Core classes
  • Lifecycle events
  • Client and Server mode
  • Thread pools configurations
  • Asynchronous support in Ignite
  • Resource injection

Core classes

Whenever you interact with Apache Ignite in an application, you will always encounter the Ignite interface and the Ignition class. Ignition is the main entry point to create an Ignite node. This class provides various methods to start a grid node in the network topology.

// Starting with default configuration
Ignite igniteWithDefaultConfig = Ignition.start();

// Ignite with Spring configuration xml file
Ignite igniteWithSpringCfgXMLFile = Ignition.start("/path_to_spring_configuration_xml.xml");

// ignite with java based configuration
IgniteConfiguration icfg = ...;
Ignite igniteWithJavaConfiguration = Ignition.start(icfg);

There are also other useful methods in the Ignition class which we will discuss below. The Ignite interface provides control over a node. It has various methods to interact with it as a data grid, service grid, compute grid, scheduler and many more.

Lifecycle events

Apache Ignite provides four lifecycle events, i.e. BEFORE_NODE_START, AFTER_NODE_START, BEFORE_NODE_STOP and AFTER_NODE_STOP, and it provides hooks to tap into these events. You need to implement LifecycleBean and set the implementation in the Ignite configuration, as shown after the listener below.

class IgniteLifecycleEventListener implements LifecycleBean {

    @Override
    public void onLifecycleEvent(LifecycleEventType evt) throws IgniteException {
        String message;
        switch (evt) {
            case BEFORE_NODE_START:
                message = "before_node_start event is called!";
                break;
            case AFTER_NODE_START:
                message = "after_node_start event is called!";
                break;
            case BEFORE_NODE_STOP:
                message = "before_node_stop event is called!";
                break;
            case AFTER_NODE_STOP:
                message = "after_node_stop event is called!";
                break;
            default:
                message = "Unknown event";
                break;
        }
        System.out.println(message);
    }
}
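
A minimal sketch of wiring the listener into the node configuration (Java-based configuration assumed):

IgniteConfiguration cfg = new IgniteConfiguration();
// Register the lifecycle bean so Ignite invokes it on node start/stop events
cfg.setLifecycleBeans(new IgniteLifecycleEventListener());

try (Ignite ignite = Ignition.start(cfg)) {
    // node is running; BEFORE_NODE_START and AFTER_NODE_START have already been printed
}
// on close, BEFORE_NODE_STOP and AFTER_NODE_STOP are printed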

Client and Server mode

An Apache Ignite node can be run in client or server mode. Server nodes participate in computing, caching, the data grid, the service grid, etc., whereas client nodes are the way to interact with server nodes for near-time caching, transactions, computing and service grid functionality. You need to explicitly define the client mode.

IgniteConfiguration.setClientMode(...);

Ignition.setClientMode(...);
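
For example, a minimal sketch of starting a node in client mode via the configuration:

IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setClientMode(true); // this node joins the topology as a client

try (Ignite client = Ignition.start(cfg)) {
    // interact with server nodes: caches, compute, services, ...
}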

Thread pool configurations

System thread pool

It processes all cache-related operations except SQL and some other queries, and also handles compute task cancellations.

IgniteConfiguration.setSystemThreadPoolSize(...);
//By default it has size equals to max(8, total_no_of_cores)

Public thread pool

All compute tasks are received and processed in this thread pool.

IgniteConfiguration.setPublicThreadPoolSize(...);
//By default it has size equals to max(8, total_no_of_cores)

Queries pool

Handles SQL queries and SCAN operations executed across the cluster.

IgniteConfiguration.setQueryThreadPoolSize(...);
//By default it has size equals to max(8, total_no_of_cores)

Services Pool

Handles service-grid calls.

IgniteConfiguration.setServiceThreadPoolSize(...);
//By default it has size equals to max(8, total_no_of_cores)

Striped Pool

Accelerates basic caching operations and transactions by spreading execution over multiple stripes that don't contend with each other.

IgniteConfiguration.setStripedPoolSize(...);
//By default it has size equals to max(8, total_no_of_cores)

Data stream pool

Used in data streaming.

IgniteConfiguration.setDataStreamerThreadPoolSize(...);
//By default it has size equals to max(8, total_no_of_cores)

Custom thread pool

You can define your own custom thread pools. These are used in the compute grid. For example, you may want to run another task from within a compute grid task while avoiding deadlocks. This can be done synchronously with custom thread pools.

IgniteConfiguration icfg = ...;
icfg.setExecutorConfiguration(new ExecutorConfiguration("myCustomThreadPool").setSize(16));
class InternalTask implements IgniteRunnable {
    private static final long serialVersionUID = 5169676352276118235L;
    
    @Override
    public void run() {
        System.out.println("Internal task executed!");
    }
}

class OuterTask implements IgniteRunnable {
    private static final long serialVersionUID = 602712410415356484L;

    @IgniteInstanceResource
    private Ignite ignite;
  
    @Override
    public void run() {
        System.out.println("Ignite Outer task!");
        ignite.compute().withExecutor("myCustomThreadPool").run(new InternalTask());
    }
  
}

// Ignite main example class
IgniteConfiguration icfg = defaultIgniteCfg("custom-thread-pool-grid");
icfg.setExecutorConfiguration(new ExecutorConfiguration("myCustomThreadPool").setSize(16));
  
try (Ignite ignite = Ignition.start(icfg)) {
    ignite.compute().run(new OuterTask());
}

Asynchronous support in Ignite

The Ignite API comes with synchronous and asynchronous support. Asynchronous calls return an IgniteFuture or one of its implementations. You can call the blocking get method to get the value, or add a listener (IgniteInClosure) which will get executed as soon as the IgniteFuture has the result.

IgniteCompute compute = ignite.compute();
IgniteFuture<String> fut = compute.callAsync(() -> "Hello from Callable");
//blocking call
String result = fut.get();
//added listener to the future, which will get executed as soon as the future has the result.
fut.listen(f -> System.out.println(f.get()));

If the IgniteFuture already has the result of the asynchronous operation by the time the IgniteInClosure is passed to the listen or chain method, then it will be executed synchronously in the caller thread. Otherwise, the closure will be executed when the asynchronous operation finishes. The closure will be called in the system thread pool for asynchronous cache-related operations, or in the public thread pool in the case of compute operations. So, it is recommended to avoid calling cache or compute related operations from the closure, to prevent deadlocks due to thread starvation.
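
Continuing from the snippet above, a minimal sketch of chaining a transformation onto the asynchronous call with IgniteFuture.chain:

// Chain a transformation onto the resulting future
IgniteFuture<Integer> lengthFut = fut.chain(f -> f.get().length());

// Blocking call on the chained future
System.out.println("Greeting length: " + lengthFut.get());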

Resource Injection

Ignite supports dependency injection of pre-defined resources which can be used in tasks, jobs, closures or SPIs. It supports both field-based and method-based injection.

IgniteRunnable task = new IgniteRunnable() {
    private static final long serialVersionUID = 787726700536869271L;

    @IgniteInstanceResource
    private transient Ignite ignite;
    
    @Override
    public void run() {
        System.out.println("Hello Gaurav Bytes from: " + ignite.name());
     
    }
};

In the example code above, we have used the @IgniteInstanceResource annotation to inject the current Ignite instance into the IgniteRunnable object. There are other pre-defined resources that you can inject into jobs, tasks, closures and SPIs.

  • @IgniteInstanceResource: injects the current instance of the Ignite API.
  • @CacheNameResource: injects the grid cache name provided by CacheConfiguration.getName().
  • @CacheStoreSessionResource: injects the CacheStoreSession instance.
  • @LoadBalancerResource: injects the ComputeLoadBalancer instance for load balancing.
  • @SpringApplicationContextResource: injects Spring's ApplicationContext.

Apart from these, there are a few other resources like TaskContinuousMapperResource, TaskSessionResource, SpringResource, ServiceResource and JobContextResource.

In this article, we will show a few examples of using Apache Ignite as a compute grid, data grid and service grid, and of executing SQL queries on Apache Ignite. These are basic examples using the basic API available. There will be a few posts in the near future which explain the available APIs in the compute grid, service grid and data grid.

Ignite SQL Example

Apache Ignite comes with JDBC thin driver support to execute SQL queries on the in-memory data grid. In the example below, we will create tables, insert data into tables and get data from tables. I will assume that you are running Apache Ignite on your local environment; otherwise, please read the setup guide for running an Apache Ignite server.

Creating Tables
try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1/");
     Statement stmt = conn.createStatement();) {
    //line 1
    stmt.executeUpdate("CREATE TABLE City (id LONG PRIMARY KEY, name VARCHAR) WITH \"template=replicated\"");

    //line 2
    stmt.executeUpdate("CREATE TABLE Person (id LONG, name VARCHAR, city_id LONG, PRIMARY KEY (id, city_id)) WITH \"backups=1, affinityKey=city_id\"");

    stmt.executeUpdate("CREATE INDEX idx_city_name ON City (name)");

    stmt.executeUpdate("CREATE INDEX idx_person_name ON Person (name)");
}

In line 1, we are creating a City table with CacheMode as REPLICATED, which means it will be replicated across the whole cluster. There are three possible values for CacheMode: LOCAL, REPLICATED and PARTITIONED. We will discuss these later in detail.

In line 2, we are creating the Person table. You might have noticed affinityKey being used. The purpose of affinityKey is to collocate related data together; for example, all Person rows with the same city_id will be stored on the same node.

Inserting data in tables
try (PreparedStatement stmt = conn.prepareStatement("INSERT INTO City (id, name) VALUES (?, ?)")) {

    stmt.setLong(1, 1L);
    stmt.setString(2, "Forest Hill");
    stmt.executeUpdate();

    stmt.setLong(1, 2L);
    stmt.setString(2, "Denver");
    stmt.executeUpdate();

    stmt.setLong(1, 3L);
    stmt.setString(2, "St. Petersburg");
    stmt.executeUpdate();
}

try (PreparedStatement stmt = conn.prepareStatement("INSERT INTO Person (id, name, city_id) VALUES (?, ?, ?)")) {

    stmt.setLong(1, 1L);
    stmt.setString(2, "John Doe");
    stmt.setLong(3, 3L);
    stmt.executeUpdate();

    stmt.setLong(1, 2L);
    stmt.setString(2, "Jane Roe");
    stmt.setLong(3, 2L);
    stmt.executeUpdate();

    stmt.setLong(1, 3L);
    stmt.setString(2, "Mary Major");
    stmt.setLong(3, 1L);
    stmt.executeUpdate();

    stmt.setLong(1, 4L);
    stmt.setString(2, "Richard Miles");
    stmt.setLong(3, 2L);
    stmt.executeUpdate();
}
Querying data from tables
try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1/");
     Statement stmt = conn.createStatement()) {
    try (ResultSet rs = stmt.executeQuery("SELECT p.name, c.name FROM Person p, City c WHERE p.city_id = c.id")) {
        while (rs.next())
            System.out.println(rs.getString(1) + ", " + rs.getString(2));
    }
}

You can find the full example code here.

Ignite Compute Grid Example

In this example, we will use Ignite's compute grid to fetch data.

try (Ignite ignite = Ignition.start(defaultIgniteCfg("cache-reading-compute-engine"))) {
    long cityId = 1;

    ignite.compute().affinityCall("SQL_PUBLIC_CITY", cityId, new IgniteCallable<List<String>>() {
        private static final long serialVersionUID = -131151815825938052L;

        @IgniteInstanceResource
        private Ignite currentIgniteInstance;

        @Override
        public List<String> call() throws Exception {
            List<String> names = new ArrayList<>();
            IgniteCache<BinaryObject, BinaryObject> personCache = currentIgniteInstance.cache("SQL_PUBLIC_PERSON").withKeepBinary();
       
            IgniteBiPredicate<BinaryObject, BinaryObject> filter = (BinaryObject key, BinaryObject value) -> {
                return key.hasField("CITY_ID") && key.<Long>field("CITY_ID") == cityId;
            };

            ScanQuery<BinaryObject, BinaryObject> query = new ScanQuery<>(filter);

            try (QueryCursor<Entry<BinaryObject, BinaryObject>> cursor = personCache.query(query)) {
                Iterator<Entry<BinaryObject, BinaryObject>> itr = cursor.iterator();

                while (itr.hasNext()) {
                    Entry<BinaryObject, BinaryObject> cache = itr.next();
                    names.add(cache.getValue().<String>field("NAME"));
                }

            }
            return names;
         }
    }).forEach(System.out::println);
}

In this example, we are getting the list of persons residing in the same city. We are calling the compute grid on the SQL_PUBLIC_CITY cache with the affinity key cityId and the IgniteCallable task. In the IgniteCallable task, we have @IgniteInstanceResource, which will be injected by the Ignite server running this task.

Ignite Data Grid example

This example shows the usage of Ignite as an in-memory data grid.

try (Ignite ignite = Ignition.start(defaultIgniteCfg("ignite-data-grid"))) {
    IgniteCache personCache = ignite.getOrCreateCache("personCache");
    for (int i = 0; i < 10; i++) {
        personCache.put(i, "Gaurav " + i);
    }
   
    for (int i = 0; i < 10; i++) {
        System.out.println(personCache.get(i));
    }
}
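
The cache above is created with default settings. If you want an explicit CacheMode (as discussed in the SQL example earlier), you can pass a CacheConfiguration instead. A minimal sketch, with the cache name and mode chosen purely for illustration:

// assuming an Ignite instance 'ignite' as in the example above
CacheConfiguration<Integer, String> cacheCfg = new CacheConfiguration<>("personCacheReplicated");
cacheCfg.setCacheMode(CacheMode.REPLICATED); // explicit cache mode

IgniteCache<Integer, String> cache = ignite.getOrCreateCache(cacheCfg);
cache.put(1, "Gaurav 1");
System.out.println(cache.get(1));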

Ignite Service grid example

interface TimeService extends Service {
    public LocalDateTime currentDateTime();
}
 
static class TimeServiceImpl implements TimeService {
    private static final long serialVersionUID = 3977097368864906176L;

    @Override
    public void cancel(ServiceContext ctx) {
        System.out.println("Service is cancelled!");
    }

    @Override
    public void init(ServiceContext ctx) throws Exception {
        System.out.println("Service is initialized!");
    }

    @Override
    public void execute(ServiceContext ctx) throws Exception {
        System.out.println("Service is deployed!");
    }

    @Override
    public LocalDateTime currentDateTime() {
        return LocalDateTime.now();
    }
}

try (Ignite ignite = Ignition.start(defaultIgniteCfg("ignite-service-grid"))) {
    ignite.services().deployClusterSingleton("timeServiceImpl", new TimeServiceImpl());
   
    TimeService timeService = ignite.services().service("timeServiceImpl");
   
    System.out.println("Current time is: " + timeService.currentDateTime());
}
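
Note that ignite.services().service("timeServiceImpl") returns the service instance only on the node where it is actually deployed (for a cluster singleton, a single node). To invoke the service from any node, you can obtain a proxy instead; a minimal sketch using the deployment above:

// Obtain a (possibly remote) proxy for the deployed service; 'false' means non-sticky
TimeService timeServiceProxy = ignite.services().serviceProxy("timeServiceImpl", TimeService.class, false);
System.out.println("Current time via proxy: " + timeServiceProxy.currentDateTime());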

If you want to deploy a service on the grid, it should implement the Service interface. Also, service grid deployments are not zero-deployment: you need to put the compiled jars on the Ignite server instance and then restart the instance as well.

In this post, we will discuss setting up Apache Ignite.

Installation

You can download Apache Ignite from its official site as binaries, sources, Docker or cloud images, or Maven artifacts. There is also third-party support from GridGain.

Steps for binary installation

This is a pretty straightforward installation. Download the binary from the website. You can optionally set the installation path as IGNITE_HOME. To run Ignite as a server, you need to run the below command in a terminal.

/bin/ignite.bat // If it is Windows
/bin/ignite.sh //if it is Linux

The above command will run Ignite with the default configuration file under $IGNITE_HOME/config/default-config.xml. You can pass your own configuration file with the following command:

/bin/ignite.sh config/ignite-config.xml

Steps for building from sources

If you would like to build everything from sources, then follow the steps listed below.

# Unpack the source package
$ unzip -q apache-ignite-{version}-src.zip
$ cd apache-ignite-{version}-src
 
# Build In-Memory Data Fabric release (without LGPL dependencies)
$ mvn clean package -DskipTests
 
# Build In-Memory Data Fabric release (with LGPL dependencies)
$ mvn clean package -DskipTests -Prelease,lgpl
 
# Build In-Memory Hadoop Accelerator release
# (optionally specify version of hadoop to use)
$ mvn clean package -DskipTests -Dignite.edition=hadoop [-Dhadoop.version=X.X.X]

Steps for maven

You just need to add the Maven dependencies to make it work in your project. Ignite has integration support with many other libraries and almost all of them are optional. The only mandatory one is ignite-core. You can add ignite-spring for configuring Ignite with Spring XML-like configuration and ignite-indexing for SQL querying.

<dependency>
    <groupId>org.apache.ignite</groupId>
    <artifactId>ignite-core</artifactId>
    <version>${ignite.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.ignite</groupId>
    <artifactId>ignite-spring</artifactId>
    <version>${ignite.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.ignite</groupId>
    <artifactId>ignite-indexing</artifactId>
    <version>${ignite.version}</version>
</dependency>

You can download the Docker image or cloud AMI from this link.