What are Software design principles?

  • Software design principles represent a set of guidelines that help us avoid a bad design.

  • They are associated with Robert Martin, who gathered them in “Agile Software Development: Principles, Patterns, and Practices".
  • According to Robert Martin, there are 3 important characteristics of a bad design that should be avoided:
    • Rigidity: It is hard to change because every change affects too many other parts of the system.
    • Fragility: When you make a change, unexpected parts of the system break.
    • Immobility: It is hard to reuse in another application because it can't be disentangled from the current application.
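As a rough illustration of rigidity versus flexibility (the Report and Formatter names below are hypothetical, invented purely for this sketch), compare a design where every change to the output format forces an edit to the class itself with one that isolates the variation behind an interface:

```java
// Rigid: RigidReport is hard-wired to one output format; adding a new
// format means editing this class, and every change ripples outward.
class RigidReport {
    String render(String data) { return "<html>" + data + "</html>"; }
}

// Flexible: the varying part lives behind an interface, so new formats
// are added without touching existing, working code.
interface Formatter {
    String format(String data);
}

class HtmlFormatter implements Formatter {
    public String format(String data) { return "<html>" + data + "</html>"; }
}

class Report {
    private final Formatter formatter;

    Report(Formatter formatter) { this.formatter = formatter; }

    String render(String data) { return formatter.format(data); }
}

public class DesignDemo {
    public static void main(String[] args) {
        // swapping formats no longer requires changing Report
        System.out.println(new Report(new HtmlFormatter()).render("hello"));
    }
}
```

The second version is easier to change (less rigid), localizes breakage (less fragile), and Report can be reused with any Formatter (more mobile).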

What does SOLID stand for?

This article is about indexing and searching documents with Apache Lucene version 4.7. Before jumping to the example and explanation, let's see what Apache Lucene is.

Introduction to Apache Lucene

Lucene is a high-performance, scalable information retrieval (IR) library. IR refers to the process of searching for documents, information within documents, or metadata about documents. Lucene lets you add searching capabilities to your application. [ref. Lucene in Action, Second Edition, which covers Apache Lucene v3.0]

The main reason for Lucene's popularity is its simplicity. You don't require in-depth knowledge of the indexing and searching process to get started with Lucene; you can start by learning the handful of classes that actually do the indexing and searching. The latest released version is 4.7, while books are only available for v3.0.

Important note

Lucene is not a ready-to-use application like a file-search program, web-crawler, or search engine. It is a software toolkit or library with which you can build your own search application or libraries. There are many frameworks built on top of the Lucene Core API for searching.

Libraries and Environment used
  • Eclipse Kepler
  • JDK 1.7
  • lucene-core-4.7.2.jar
  • lucene-queryparser-4.7.2.jar
  • lucene-demo-4.7.2.jar
  • lucene-analyzers-common-4.7.2.jar

Indexing with Lucene

Let's jump into the indexing process in Lucene with an example, and then we will explain the classes that are used and their purpose.

1. IndexerTest is the class used to demonstrate indexing.

package lucene.indexer;

import java.io.File;
import java.io.FileFilter;

/**
 * @author Gaurav Rai Mazra
 */
public class IndexerTest {
 
 public static void main(String[] args) throws Exception {
  String indexDir = "index";
  String dataDir = "dir";
  
  long start = System.currentTimeMillis();
  final IndexingHelper indexHelper = new IndexingHelper(indexDir);
  int numIndexed;
 
  try {
   numIndexed = indexHelper.index(dataDir, new TextFilesFilter());
  }
  finally {
   indexHelper.close();
  }
  
  long end = System.currentTimeMillis();
  System.out.println("Indexing " + numIndexed + " files took " + (end - start) + " milliseconds");
 }
}

// class filters only .txt files for indexing
class TextFilesFilter implements FileFilter {
 @Override
 public boolean accept(File pathname) {
  return pathname.getName().toLowerCase().endsWith(".txt");
 }
}

2. The IndexingHelper class shows how the indexing is done.

package lucene.indexer;

import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

/**
 * @author Gaurav Rai Mazra
 */
public class IndexingHelper {
 //class which actually creates and maintains the indexes in the file
 private IndexWriter indexWriter;
 
 public IndexingHelper(String indexDir) throws Exception {
  //To represent actual directory
  Directory directory = FSDirectory.open(new File(indexDir));
  //Holds configuration required in creation of IndexWriter object
  IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_47, new StandardAnalyzer(Version.LUCENE_47));
  indexWriter = new IndexWriter(directory, indexWriterConfig);
 }
 
 public void close() throws IOException {
  indexWriter.close();
 }
 
 // exposed method to index files 
 public int index(String dataDir, FileFilter fileFilter) throws Exception {
  File[] files = new File(dataDir).listFiles();
  for (File f : files)
  {
   if (!f.isDirectory() && !f.isHidden() && f.exists() && f.canRead() && (fileFilter == null || fileFilter.accept(f)))
    indexFile(f);
  } 
  
  return indexWriter.numDocs();
 }
 
 private void indexFile(File f) throws Exception {
  System.out.println("  " + f.getCanonicalPath());
  Document doc = getDocument(f);
  indexWriter.addDocument(doc);
 }

 private Document getDocument(File f) throws Exception {
  // class used by Lucene's IndexWriter and IndexReader to store and retrieve indexed data
  Document document = new Document();
  document.add(new TextField("contents", new FileReader(f)));
  document.add(new StringField("filename", f.getName(), Field.Store.YES));
  document.add(new StringField("fullpath", f.getCanonicalPath(), Field.Store.YES));
  return document;
 }
}

In the IndexingHelper class, we have used the following classes of the Lucene library to index .txt files.

  • IndexWriter class.
  • IndexWriterConfig class.
  • Directory class.
  • FSDirectory class.
  • Document class.

Explanation

1. IndexWriter: It is the central component of the indexing process. This class creates a new index or opens an existing one, and adds, removes, or updates documents in the index. It has one public constructor, which takes a Directory object and an IndexWriterConfig object as parameters.

This class exposes several methods for adding Document objects to the index.

It also exposes methods for deleting documents from the index, as well as informative methods like numDocs(), which returns the total number of documents in the index, including deleted ones whose deletions have not yet been flushed to the files.

2. IndexWriterConfig: It holds the configuration required to create an IndexWriter object. It has one public constructor which takes two parameters: a Version enum value (the Lucene version, for compatibility), and an Analyzer object. Analyzer itself is an abstract class with many implementations such as WhitespaceAnalyzer and StandardAnalyzer, which extract tokens during the analysis process.

3. Directory: The Directory class represents the location of a Lucene index. It is an abstract class with many different concrete implementations, and no single implementation suits every machine architecture. Hence, use the abstract FSDirectory class's static open() method, which returns the best available concrete Directory implementation for your platform.

4. Analyzer: Before any text is indexed, it is passed to an Analyzer, which extracts the tokens that should be indexed and discards the rest.

5. Document: The Document class represents a collection of Fields. It is the chunk of data that we want to index and make retrievable at a later time.

6. Field: Each document has one or more fields, and each field has a name and a corresponding value. Most of the Field class's constructors are deprecated; it is preferable to use the concrete subclasses of Field, such as IntField, LongField, FloatField, DoubleField, BinaryDocValuesField, NumericDocValuesField, SortedDocValuesField, StringField, TextField, and StoredField.

Searching with Lucene

Let's jump to searching with Lucene and then explain the classes used.

package lucene.searcher;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

/**
 * @author Gaurav Rai Mazra
 */
public class SearcherTest {

 public static void main(String[] args) throws IOException, ParseException {
  String indexDir = "index";
  String q = "direwolf";
  
  search(indexDir, q);
 }
 
 //Search in lucene index
 private static void search(String indexDir, String q) throws IOException, ParseException {
  //get a directory to search from
  Directory directory = FSDirectory.open(new File(indexDir));
  // get reader to read directory
  IndexReader indexReader = DirectoryReader.open(directory);
  //create indexSearcher
  IndexSearcher is = new IndexSearcher(indexReader);
  // Create analyzer to analyse documents
  Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47); 
  //create query parser
  QueryParser queryParser = new QueryParser(Version.LUCENE_47, "contents", analyzer);
  //get query
  Query query = queryParser.parse(q);
  
  //Query query1 = new TermQuery(new Term("contents", q));

  long start = System.currentTimeMillis();
  //hit query
  TopDocs hits = is.search(query, 10);
  long end = System.currentTimeMillis();
  
  System.err.println("Found " + hits.totalHits + " document(s) in " + (end-start) + " milliseconds");
  for (ScoreDoc scoreDoc : hits.scoreDocs)
  {
   Document document = is.doc(scoreDoc.doc);
   System.out.println(document.get("fullpath"));
  }
 }
}
Explanation

1. IndexReader: This is an abstract class providing an interface for accessing an index. To get a concrete implementation, the helper class DirectoryReader is used: its open() method takes a Directory reference and returns an IndexReader object.

2. IndexSearcher: IndexSearcher searches the data indexed by IndexWriter. You can think of IndexSearcher as a class which opens the index in read-only mode. It requires an IndexReader instance to construct, and it provides methods for searching and for retrieving documents.

3. QueryParser: This class parses a query string and generates a Query object out of it.

4. Query: This abstract class represents the query to be used in searching. It has many concrete subclasses such as TermQuery, BooleanQuery, and PhraseQuery. It also contains several utility methods, one of which is setBoost(float).

5. TopDocs: It represents the hits returned by the search method of IndexSearcher. It has one public constructor which takes three parameters: int totalHits, ScoreDoc[] scoreDocs, and float maxScore. Each ScoreDoc contains the score and document id of a matching document.

Introduction

The Abstract Factory design pattern is a creational design pattern, also called a factory of factories. It provides a way to encapsulate a group of factories that share a common behaviour or theme, without specifying which underlying concrete objects are created. It is one level of abstraction higher than the Factory pattern: in the Factory pattern we get an instance of one of several sub-classes, whereas in the Abstract Factory pattern we get an instance of one of several factories, which in turn gives us an instance of one of several sub-classes.

Implementation

We are going to create factories for import and export. We have one producer class, ImportExportFactoryProducer, which produces a factory based on type, i.e. whether an instance of ImportFactory or ExportFactory is required; that factory then instantiates the concrete classes for ImportStrategy or ExportStrategy.

1. AbstractFactoryTest class

package com.gauravbytes.test;

import com.gauravbytes._import.ImportStrategy;
import com.gauravbytes.export.ExportStrategy;
import com.gauravbytes.importexport.factory.AbstractImportExportFactory;
import com.gauravbytes.importexport.factory.ImportExportFactoryProducer;
import com.gauravbytes.importexport.factory.ImportExportFactoryTypes;

/**
 * @author Gaurav Rai Mazra
 */
public class AbstractFactoryTest {

 public static void main(String[] args) {
  int factoryType = ImportExportFactoryTypes.TYPE_EXPORT;
  int strategyType = AbstractImportExportFactory.TYPE_EXCEL;
  
  AbstractImportExportFactory ioFactory = ImportExportFactoryProducer.getInstance().getFactory(factoryType);
  
  switch (factoryType) {
   case ImportExportFactoryTypes.TYPE_EXPORT :
    ExportStrategy exportStrategy = ioFactory.getExportStrategy(strategyType);
    exportStrategy.export();
    break;
    
   case ImportExportFactoryTypes.TYPE_IMPORT : 
    ImportStrategy importStrategy = ioFactory.getImportStrategy(strategyType);
    importStrategy.importFile();
    break;
   default:
    break;
  }
 }
}

2. The FactoryProducer class, in our case ImportExportFactoryProducer

package com.gauravbytes.importexport.factory;

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/**
 * This is Factory producer class
 * @author Gaurav Rai Mazra
 *
 */
public class ImportExportFactoryProducer {
 private static ImportExportFactoryProducer factoryInstance = null;
 private static final Lock lock = new ReentrantLock();
 private ImportExportFactoryProducer() {
  
 }
 
 public static ImportExportFactoryProducer getInstance() {
  try {
   lock.lock();
   if (factoryInstance == null)
    factoryInstance = new ImportExportFactoryProducer();
  }
  finally {
   lock.unlock();
  }
  return factoryInstance;
 }
 
 
 public AbstractImportExportFactory getFactory(int factoryType) {
  AbstractImportExportFactory factory = null;
  switch (factoryType) {
   case ImportExportFactoryTypes.TYPE_IMPORT:
    factory = ImportFactory.getInstance(lock);
    break;
    
   case ImportExportFactoryTypes.TYPE_EXPORT:
    factory = ExportFactory.getInstance(lock);
    break;
 
   default:
    break;
  }
  return factory;
 }
}

3. The factory of factories and the concrete factories, in our case AbstractImportExportFactory, ImportFactory, and ExportFactory.

package com.gauravbytes.importexport.factory;

import com.gauravbytes._import.ImportStrategy;
import com.gauravbytes.export.ExportStrategy;

/**
 * This is factory of factory gives the instance for particular strategy
 * @author Gaurav Rai Mazra
 *
 */
public abstract class AbstractImportExportFactory {
 public static final int TYPE_EXCEL = 1;
 public static final int TYPE_PDF = 2;
 public static final int TYPE_PLAIN = 3;
 
 
 public abstract ImportStrategy getImportStrategy(int strategyType);
 public abstract ExportStrategy getExportStrategy(int strategyType);
}
package com.gauravbytes.importexport.factory;

/**
 * @author Gaurav Rai Mazra
 */
public abstract class ImportExportFactoryTypes {
 public static final int TYPE_IMPORT = 1;
 public static final int TYPE_EXPORT = 2;
}
package com.gauravbytes.importexport.factory;

import com.gauravbytes._import.ExcelImport;
import com.gauravbytes._import.ImportStrategy;
import com.gauravbytes._import.PdfImport;
import com.gauravbytes._import.PlainTextImport;
import com.gauravbytes.export.ExportStrategy;
import java.util.concurrent.locks.Lock;

/**
 * Import factory
 * @author Gaurav Rai Mazra
 */
public class ImportFactory extends AbstractImportExportFactory {

 private static ImportFactory importFactory = null;
 private ImportFactory() {
  
 }
 
 public static ImportFactory getInstance(Lock lock) {
  try {
   lock.lock();
   if (importFactory == null)
    importFactory = new ImportFactory();
  }
  finally {
   lock.unlock();
  }
  return importFactory;
 }
 
 @Override
 public ImportStrategy getImportStrategy(int strategyType) {
  ImportStrategy importStrategy = null;
  switch (strategyType) {
   case TYPE_EXCEL:
    importStrategy = new ExcelImport();
    break;
   
   case TYPE_PDF:
    importStrategy = new PdfImport();
    break;
    
   case TYPE_PLAIN:
    importStrategy = new PlainTextImport(); 
    break;
 
   default:
    break;
  }
  return importStrategy;
 }

 @Override
 public ExportStrategy getExportStrategy(int strategyType) {
  return null;
 }

}
package com.gauravbytes.importexport.factory;

import com.gauravbytes._import.ImportStrategy;
import com.gauravbytes.export.ExcelExport;
import com.gauravbytes.export.ExportStrategy;
import com.gauravbytes.export.PdfExport;
import com.gauravbytes.export.PlainTextExport;
import java.util.concurrent.locks.Lock;


/**
 * Factory to get proper strategy object based on its type
 * @author Gaurav Rai Mazra
 */
public class ExportFactory extends AbstractImportExportFactory {

 private static ExportFactory exportFactory = null;
 
 private ExportFactory() {
  
 }
 
 public static ExportFactory getInstance(Lock lock) {
  try {
   lock.lock();
   if (exportFactory == null)
    exportFactory = new ExportFactory();
  }
  finally {
   lock.unlock();
  }
  return exportFactory;
 }
 
 @Override
 public ImportStrategy getImportStrategy(int strategyType) {
  return null;
 }

 @Override
 public ExportStrategy getExportStrategy(int strategyType) {
  ExportStrategy strategy = null;
  switch (strategyType) {
   case TYPE_EXCEL:
    strategy = new ExcelExport();
    break;

   case TYPE_PDF:
    strategy = new PdfExport();
    break;

   case TYPE_PLAIN:
    strategy = new PlainTextExport();
    break;

   default:
    break;
  }
  return strategy;
 }
}

4. ImportStrategy and its concrete classes

package com.gauravbytes._import;

/**
 * @author Gaurav Rai Mazra
 */
public interface ImportStrategy {
 public void importFile();
}
package com.gauravbytes._import;

/**
 * @author Gaurav Rai Mazra
 */
public class ExcelImport implements ImportStrategy {

 @Override
 public void importFile() {
  // Logic to import excel file goes here
 }

}
package com.gauravbytes._import;

/**
 * @author Gaurav Rai Mazra
 */
public class PdfImport implements ImportStrategy {

 @Override
 public void importFile() {
  //Logic to import pdf goes here
 }

}
package com.gauravbytes._import;

/**
 * @author Gaurav Rai Mazra
 */
public class PlainTextImport implements ImportStrategy {
 @Override
 public void importFile() {
  //Logic to import plain text file
 }
}

5. ExportStrategy and its concrete classes

package com.gauravbytes.export;

/**
 * @author Gaurav Rai Mazra
 */
public interface ExportStrategy {
 public void export();
}
package com.gauravbytes.export;

/**
 * @author Gaurav Rai Mazra
 */
public class ExcelExport implements ExportStrategy {

 @Override
 public void export() {
  // Excel export code goes here
 }

}
package com.gauravbytes.export;

/**
 * @author Gaurav Rai Mazra
 */
public class PlainTextExport implements ExportStrategy {
 @Override
 public void export() {
  // Plain text export goes here
 }
}
package com.gauravbytes.export;

/**
 * @author Gaurav Rai Mazra
 */
public class PdfExport implements ExportStrategy {

 @Override
 public void export() {
  //Pdf export Goes here
 }

}

What is Serialization?

Serialization is the process of encoding an object into a byte stream; the reverse is called deserialization. It is a platform-independent process, which means you can serialize an object in one JVM, transport it over the network and/or store it in a filesystem, and then deserialize it in another (or the same) JVM.

A class needs to implement the marker interface Serializable in order to make its objects eligible for serialization.

Fields with transient and/or static modifiers are not serialized by the regular serialization process.
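A quick sketch of that rule (the Session class and its fields are made up for illustration): after a round trip through serialization, a transient field comes back as its type's default value, here null for a String.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Session implements Serializable {
    String user = "alice";
    transient String token = "secret"; // transient: skipped by serialization
}

public class TransientDemo {
    // serialize then deserialize entirely in memory
    static Session roundTrip(Session s) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(s);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(buf.toByteArray()))) {
            return (Session) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Session copy = roundTrip(new Session());
        // user survives the round trip; token does not
        System.out.println(copy.user + " " + copy.token); // prints: alice null
    }
}
```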

ObjectInputStream and ObjectOutputStream are high-level stream classes which have methods to serialize and deserialize Java objects.

The ObjectOutputStream class has many write methods, but the one that is usually used is:

public final void writeObject(Object obj) throws IOException

The ObjectInputStream class has many read methods, but the one that is usually used is:

public final Object readObject() throws IOException, ClassNotFoundException

Basic Serialization Example

Let's first define a Dog POJO and then use ObjectOutputStream and ObjectInputStream for serialization and deserialization.

public class Dog implements Serializable {
  private static final long serialVersionUID = 8661314562327474362L;

  private int height;
  private String name;

  // getters/ setters/ toString/ constructors
}

public class SimpleSerializationExample {
  public static void main(String[] args) {
    Dog dog = new Dog(50, "Titan"); // create dog object with height 50 and name Titan 
    System.out.println("Before Serialization");
    System.out.println(dog);
    try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream("dog.ser"))) {
      oos.writeObject(dog);// serialize the dog object
    }
    catch (IOException ioEx) { /* Don't Swallow exception in real projects */ }
   
    dog = null; // let clear old dog object reference
    try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream("dog.ser"))) {
      dog = (Dog) ois.readObject();// deserialize dog object
    }
    catch (IOException | ClassCastException | ClassNotFoundException ex) { /* Don't Swallow exception in real projects */ }

    System.out.println("After Serialization");
    System.out.println(dog);
  }
}
A point to remember: if the class does not implement the Serializable interface, then the writeObject() method of ObjectOutputStream will throw the run-time exception java.io.NotSerializableException.

The above class was pretty straightforward, with two simple instance members. But in the real world there could be cases where:

Case 1: the super-class is not serializable but the sub-class is.

Case 2: your class has a composed object of another class which doesn't implement the Serializable interface.

Case 1: the super-class is not serializable but the sub-class is

There could be a situation where you are extending some class which is not serializable but you want the sub-class to be serializable. This is only possible if your super-class has:

  • a default no-args constructor, and
  • safe ways to initialize its shared fields.
If the super-class doesn't implement the Serializable interface, its fields are not written to the stream; during deserialization, constructors are run starting from the first non-serializable class in the hierarchy up through all of its super-classes, so the inherited fields are re-initialized to whatever those constructors set. For a serializable class, by contrast, no constructor runs at deserialization time.
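The behaviour described above can be verified with a small sketch (the Animal and Pet classes are made-up names, not from the article): the non-serializable super-class's no-args constructor runs again during deserialization, resetting its fields, while the serializable sub-class's saved state is restored from the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Animal { // deliberately NOT Serializable
    int age = 1;

    Animal() {
        // required no-args constructor; runs again on deserialization
    }
}

class Pet extends Animal implements Serializable {
    String name = "Rex";
}

public class SuperclassDemo {
    static Pet roundTrip(Pet p) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(p);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(buf.toByteArray()))) {
            return (Pet) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Pet p = new Pet();
        p.age = 7; // inherited from non-serializable Animal: will NOT be saved
        Pet copy = roundTrip(p);
        // name was serialized; age was reset to 1 by Animal's constructor
        System.out.println(copy.name + " " + copy.age); // prints: Rex 1
    }
}
```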

Case 2: your class has a composed object of another class which doesn't support serialization

The only solution is to mark that field as transient and perform custom serialization for it.

writeObject and readObject to rescue for case 1 and case 2

Java serialization has a special mechanism just for this: a pair of private methods you can implement in your serializable class, which are invoked automatically during serialization and deserialization. Their signatures are:

private void writeObject(ObjectOutputStream os) throws IOException {
 //Your custom code for serialization goes here
}

private void readObject(ObjectInputStream is) throws IOException, ClassNotFoundException {
 //Your custom code for deserialization goes here
}

These methods let you step into the middle of serialization and deserialization.

Let's move to example.

public class Collar {
  private int size;
  // int-arg constructor/ getters/ setters/ toString
}

public class Dog implements Serializable {

  private static final long serialVersionUID = 6870143058315212650L;

  private int height;
  private String name;
  private transient Collar myCollar;
  
  //getters/ setters/ toString
  private void writeObject(ObjectOutputStream os) throws IOException {
    os.defaultWriteObject(); // let the default serialization happen first
    os.writeInt(myCollar.getSize()); // then save the custom value
  }

  private void readObject(ObjectInputStream is) throws IOException, ClassNotFoundException {
    is.defaultReadObject(); // let the default deserialization happen first
    myCollar = new Collar(is.readInt()); // then read the custom value back
  }
}

public class SimpleSerializationExample {
  public static void main(String[] args) {
    Dog dog = new Dog(1, "BuBu", new Collar(5)); // create dog with height 1, name BuBu, and a collar of size 5
    System.out.println("Before Serialization");
    System.out.println(dog);
    try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream("dog_collar.ser"))) {
      oos.writeObject(dog);// serialize the dog object
    }
    catch (IOException ioEx) {
      /* Don't Swallow exception in real projects */
    }

    dog = null; // let clear old dog object reference
    try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream("dog_collar.ser"))) {
      dog = (Dog) ois.readObject();// deserialize dog object
    }
    catch (IOException | ClassCastException | ClassNotFoundException e) {
      /* Don't Swallow exception in real projects */
    }
   
    System.out.println("After Serialization");
    System.out.println(dog.toString());
  }
}
Explanation
  • The Dog class declares private writeObject and readObject methods.
  • writeObject first invokes defaultWriteObject() on the ObjectOutputStream, telling the JVM to perform the normal serialization, and then writes the custom value (the collar size) to the stream.
  • readObject first invokes defaultReadObject() on the ObjectInputStream to handle the normal deserialization, and then reconstructs the transient Collar from the custom value.
If we customize serialization and deserialization using writeObject() and readObject(), then we must read values back in readObject() in the same order we wrote them in writeObject(), because the stream stores them in the sequence they were written. If we don't, we may end up with a half-baked or wrongly deserialized object.
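A minimal sketch of that ordering rule, using raw values rather than the Dog example: values come back from an ObjectInputStream only in the exact order they were written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class OrderDemo {
    static String roundTrip() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeInt(42);       // written first
            oos.writeUTF("collar"); // written second
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(buf.toByteArray()))) {
            int size = ois.readInt();   // must be read first
            String tag = ois.readUTF(); // and this one second
            return size + " " + tag;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip()); // prints: 42 collar
    }
}
```

Swapping the two reads (readUTF() before readInt()) would not return the original values; the stream is strictly sequential.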