thejavajar

{ java, groovy, flex, python, ruby }


Posts Tagged ‘Hazelcast’

Intro to Hazelcast’s Distributed Query

When you decide to incorporate a distributed data grid as part of your application architecture, a product’s scalability, reliability, cost and performance are key considerations that will help you make your decision. Another key consideration will be the accessibility of the data. One nice feature of Hazelcast that I have been working with lately is distributed queries. In simple terms, distributed queries provide an API and syntax that allow a developer to query for entries that exist in a Hazelcast distributed map. Let’s look at a very simple example.

In the demo project (link at the bottom) I have one object, a test case and the Hazelcast 1.8.4 jar file as a project dependency. Below is the class that will be put into a distributed map, ReportData. Once we have a distributed map that is full of ReportData entries, we can use Hazelcast’s distributed query to find our ReportData entries.

package org.axiomaticit.model;

import java.io.Serializable;
import java.util.Date;

public class ReportData implements Serializable {

	private static final long serialVersionUID = 2789198967473633902L;
	private Long id;
	private Boolean active;
	private String reportName;
	private String value;
	private Date startDate;
	private Date endDate;

	public ReportData(Long id, Boolean active, String reportName, String value, Date startDate, Date endDate) {
		this.id = id;
		this.active = active;
		this.reportName = reportName;
		this.value = value;
		this.startDate = startDate;
		this.endDate = endDate;
	}

	// all the getters and setters
}

Nothing too complex in the code above. It is just an object that implements Serializable and contains attributes of a few different types (Long, Boolean, String and Date). This class will work nicely to help demonstrate Hazelcast’s distributed query API and syntax. I omitted the getters and setters for brevity.

// get a "ReportData" distributed map (IMap extends java.util.Map and adds values(Predicate))
IMap<Long, ReportData> reportDataMap = Hazelcast.getMap("ReportData");

// create a ReportData object
ReportData reportData = new ReportData(...);

// put it into our Hazelcast Distributed Map
reportDataMap.put(reportData.getId(), reportData);

In the test code, I created ~50,000 ReportData objects using a for loop and put them into the “ReportData” distributed map. I used the index, 0..50,000, for the ReportData’s id and the reportName is set to “Report ” + index. I did a few other things, so we could have a few different dates represented in our map’s entries. Check out the demo project for more detail.

// "map" is the IMap<Long, ReportData> obtained from Hazelcast.getMap("ReportData")
Set<ReportData> reportDataSet = (Set<ReportData>) map.values(new SqlPredicate("active AND id > 990 AND reportName = 'Report 995'"));

The above code queries the distributed map for all ReportData objects where active is equal to true, the id is greater than 990 and the reportName is equal to “Report 995”.

Below, reportDataSet will contain all ReportData objects where active is equal to true and the id is greater than 49985.

Set<ReportData> reportDataSet = (Set<ReportData>) map.values(new SqlPredicate("active AND id > 49985"));
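To make the semantics of the two SQL predicates above concrete, here is a minimal single-JVM sketch that applies the same conditions to a plain java.util.Map using streams. It is not Hazelcast code. The names LocalQuerySketch and Report are hypothetical, Report carries only the queried fields, and the sketch assumes every entry is active, which the demo project may not do.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class LocalQuerySketch {

    // Hypothetical, trimmed-down stand-in for ReportData (only the queried fields).
    static class Report {
        final long id;
        final boolean active;
        final String reportName;

        Report(long id, boolean active, String reportName) {
            this.id = id;
            this.active = active;
            this.reportName = reportName;
        }
    }

    // Builds a local map shaped like the demo data: ids 0..count-1, names "Report <id>".
    // Assumption: every entry is active.
    static Map<Long, Report> populate(int count) {
        Map<Long, Report> map = new HashMap<>();
        for (long i = 0; i < count; i++) {
            map.put(i, new Report(i, true, "Report " + i));
        }
        return map;
    }

    // Local equivalent of: active AND id > 990 AND reportName = 'Report 995'
    static List<Report> firstQuery(Map<Long, Report> map) {
        return map.values().stream()
                .filter(r -> r.active && r.id > 990 && r.reportName.equals("Report 995"))
                .collect(Collectors.toList());
    }

    // Local equivalent of: active AND id > 49985
    static List<Report> secondQuery(Map<Long, Report> map) {
        return map.values().stream()
                .filter(r -> r.active && r.id > 49985)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<Long, Report> map = populate(50_000);
        System.out.println("first query matches:  " + firstQuery(map).size());
        System.out.println("second query matches: " + secondQuery(map).size());
    }
}
```

The difference with the real thing is where the filter runs: this sketch scans one local map, whereas a distributed query is evaluated against entries spread across the cluster.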

Below, we have a case where we are building the predicate programmatically using the EntryObject to fetch all ReportData where the id is greater than 49900 and the endDate attribute of ReportData is between two dates, startDate and endDate. I included the code below to show how I am creating a few dates to use in the predicate that eventually gets passed into the map.values(predicate) method.

// Note: Calendar months are zero-based, so 3 (Calendar.APRIL) means April, not March.
Calendar calendar1 = Calendar.getInstance();
calendar1.set(2010, Calendar.APRIL, 1);
Calendar calendar2 = Calendar.getInstance();
// April has only 30 days; a lenient Calendar rolls day 31 over to May 1.
calendar2.set(2010, Calendar.APRIL, 31);

Date startDate = calendar1.getTime();
Date endDate = calendar2.getTime();

EntryObject e = new PredicateBuilder().getEntryObject();
Predicate predicate = e.get("id").greaterThan(Long.valueOf(49900L)).and(e.get("endDate").between(startDate, endDate));

Set<ReportData> reportDataSet = (Set<ReportData>) map.values(predicate);
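One detail in the date setup is easy to miss: java.util.Calendar months are zero-based, and a lenient calendar silently rolls an out-of-range day into the next month. The small sketch below (the class name CalendarMonthSketch is just for illustration) shows what the two set(...) calls with month 3 resolve to. It uses a GregorianCalendar and clears the time of day, which the snippet above does not do.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

class CalendarMonthSketch {

    // Returns {month, dayOfMonth} after setting the fields on a lenient Gregorian calendar.
    static int[] resolve(int year, int zeroBasedMonth, int day) {
        Calendar calendar = new GregorianCalendar();
        calendar.clear();                          // drop the current time of day
        calendar.set(year, zeroBasedMonth, day);
        return new int[] { calendar.get(Calendar.MONTH), calendar.get(Calendar.DAY_OF_MONTH) };
    }

    public static void main(String[] args) {
        int[] start = resolve(2010, 3, 1);
        int[] end = resolve(2010, 3, 31);
        // Month values are zero-based: 3 is April, 4 is May.
        System.out.println("start: month=" + start[0] + " day=" + start[1]);
        System.out.println("end:   month=" + end[0] + " day=" + end[1]);
    }
}
```

So the range used in the predicate runs from April 1 to May 1, 2010, which may or may not be what you want when you copy this pattern.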

Getting data from your Hazelcast distributed map using the distributed query API and query syntax is pretty straightforward. Most of these queries ran for about 500 milliseconds to 2 seconds in my IDE. The power and performance come from the ability to query objects or map entries that are in memory rather than always relying on a round trip to your RDBMS. Distributed queries are an important feature that makes Hazelcast a great tool for offsetting the workload of your RDBMS. With Hazelcast and a good knowledge of your enterprise data, you can implement a simple and effective solution that will easily scale to as many Hazelcast nodes as your hardware can support. The demo project can be downloaded here. For more information, check out Hazelcast’s website or visit the project’s home at Google Code.

Spring 3, AspectJ and Hazelcast – Cache Advance

I have been working with Java and related technologies at multiple companies since 2004. Most of the major business problems that I have encountered revolve around working with relatively small data objects and relatively small data stores (less than 50GB). One commonality in the development environment at each of these companies, other than Java, has been some form of legacy data store. In most cases, the legacy data store was not originally designed to support all of the various applications that now depend on it. In some cases, performance issues would arise that were most likely due to overutilization.

One approach to alleviating utilization issues on legacy resources is data caching. With data caching we can use available memory to keep our data objects closer to our running application. We can take advantage of technologies like Hazelcast, a data distribution platform for Java, to provide support for distributed data caching. In particular, this example focuses on Hazelcast’s distributed Map to manage our in-memory caching. Because Hazelcast is easily integrated with most Web applications (just include the Hazelcast jar and XML file), the overhead is minimal. When we take advantage of Aspect Oriented Programming (AOP), with the help of Spring and AspectJ, we can leave our current code in place and implement our distributed caching strategy with minimal code changes.

Let’s look at a simple example where we are loading and saving objects in a simple Data Access Object (DAO). Below is PersistentObject, the persistent data object we are going to use in this example. Note that this object implements Serializable, which is required if we want to put it into Hazelcast’s distributed Map (it is also a good idea for applications that utilize session replication).

import java.io.Serializable;
import java.util.List;

public class PersistentObject implements Serializable {

	private static final long serialVersionUID = 7317128953496320993L;

	private Long id;
	private String field1;
	private String field2;
	private String field3;
	private List<String> fieldList1;

	public Long getId() {
		return id;
	}
	public void setId(Long id) {
		this.id = id;
	}
}

Here is our simple interface for the DAO. Yes, this interface is ridiculously simple, but it does what we need it to do for this example.

public interface DataAccessObject {

	public PersistentObject fetch(Long id);

	public PersistentObject save(PersistentObject persistentObject);
}

Here is the implementation of the DataAccessObject interface. Again, it is really simple; in fact, I left out the meat of the implementation for brevity, but it will work for this example. Each of these methods would usually have some JDBC or ORM related code. The key here is that our DAO code will not change when we refactor to utilize the distributed data cache, because the caching will be implemented with AspectJ and Spring.

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class DataAccessObjectImpl implements DataAccessObject {

	private static final Log log = LogFactory.getLog(DataAccessObjectImpl.class);

	@Override
	public PersistentObject fetch(Long id) {

		log.info("***** Fetch from data store!");

		// do some work to get a PersistentObject from data store

		return new PersistentObject();
	}

	@Override
	public PersistentObject save(PersistentObject persistentObject) {

		log.info("***** Save to the data store!");

		// do some work to save a PersistentObject to the data store

		return persistentObject;
	}
}

The method below, “getFromHazelcast”, exists in the DataAccessAspect class. It is “Around” advice that gets executed whenever any method named “fetch” is executed. The purpose of this advice and its pointcut is to allow us to intercept the call to the “fetch” method in the DAO and possibly reduce “read” calls to our data store. In this method, we get the Long “id” argument from the called “fetch” method, get our distributed Map, “persistentObjects”, from Hazelcast and try to return a PersistentObject from it. If the object is not found in the distributed Map, we let the “fetch” method handle the work as originally designed.

@Around("execution(* fetch(..))")
public Object getFromHazelcast(ProceedingJoinPoint pjp) throws Throwable {

	// get method args
	Object[] args = pjp.getArgs();

	// get the id
	Long id = (Long) args[0];

	// check Hazelcast distributed map
	Map<Long, PersistentObject> persistentObjectMap = Hazelcast.getMap("persistentObjects");
	PersistentObject persistentObject = persistentObjectMap.get(id);

	// if the persistentObject is not null
	if(persistentObject != null) {
		log.info("***** Found it in Hazelcast distributed map!");
		return persistentObject;
	}

	// continue with the fetch method that was originally called if PersistentObject was not found
	return pjp.proceed();
}
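If you want to see the same interception idea without Spring, AspectJ or Hazelcast on the classpath, a JDK dynamic proxy can play the role of the around advice. The sketch below is only an analogue under stated assumptions: ReadInterceptSketch, Dao and CountingDao are hypothetical names, values are Strings instead of PersistentObject, and a ConcurrentHashMap stands in for the distributed map.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class ReadInterceptSketch {

    // Hypothetical stand-in for the DataAccessObject interface.
    interface Dao {
        String fetch(Long id);
    }

    // Fake DAO that counts how often the "data store" is actually hit.
    static class CountingDao implements Dao {
        final AtomicInteger storeReads = new AtomicInteger();

        @Override
        public String fetch(Long id) {
            storeReads.incrementAndGet();
            return "object-" + id;
        }
    }

    // Wraps a Dao so that fetch(..) consults the cache before the real method runs.
    static Dao withCache(Dao target, Map<Long, String> cache) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("fetch")) {
                String cached = cache.get((Long) args[0]);
                if (cached != null) {
                    return cached;                   // cache hit: skip the data store
                }
            }
            try {
                return method.invoke(target, args);  // same role as pjp.proceed()
            } catch (InvocationTargetException e) {
                throw e.getCause();                  // rethrow the target's own exception
            }
        };
        return (Dao) Proxy.newProxyInstance(
                Dao.class.getClassLoader(), new Class<?>[] { Dao.class }, handler);
    }

    public static void main(String[] args) {
        Map<Long, String> cache = new ConcurrentHashMap<>();
        cache.put(1L, "cached-1");
        CountingDao real = new CountingDao();
        Dao dao = withCache(real, cache);
        System.out.println(dao.fetch(1L));
        System.out.println(dao.fetch(2L));
        System.out.println("data store reads: " + real.storeReads.get());
    }
}
```

Like the advice above, the sketch does not populate the cache on a miss; entries only get there through some other path, which in this post is the “save” advice.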

The method below, “putIntoHazelcast”, also exists in the DataAccessAspect class. It is “AfterReturning” advice that gets executed whenever any method named “save” returns. As each PersistentObject is persisted to the data store in the “save” method, the “putIntoHazelcast” method will insert or update the PersistentObject in the distributed Map “persistentObjects”. This way we have our most recent PersistentObject versions available in the distributed Map. If we just keep inserting and updating every PersistentObject in the distributed Map, we will eventually have to look into the Map’s eviction policy to keep the more relevant application data in our cache, unless we have abundant memory.

@AfterReturning(pointcut="execution(* save(..))", returning="retVal")
public void putIntoHazelcast(Object retVal) throws Throwable {

	// get the PersistentObject
	PersistentObject persistentObject = (PersistentObject) retVal;

	// get the Hazelcast distributed map
	Map<Long, PersistentObject> persistentObjectMap = Hazelcast.getMap("persistentObjects");

	// put the PersistentObject into the Hazelcast distributed map
	log.info("***** Put this PersistentObject instance into the Hazelcast distributed map!");
	persistentObjectMap.put(persistentObject.getId(), persistentObject);
}
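The eviction concern mentioned above is easier to reason about with a tiny local model. The sketch below is not how Hazelcast evicts entries (that is configured on the map, not coded by hand); it is a plain-JDK illustration of a least-recently-used bound using LinkedHashMap, with the hypothetical class name BoundedCacheSketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A size-bounded map that drops the least recently used entry when it grows too large.
class BoundedCacheSketch<K, V> extends LinkedHashMap<K, V> {

    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedCacheSketch(int maxEntries) {
        super(16, 0.75f, true);   // accessOrder = true: iteration order is least to most recently used
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each put; returning true removes the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCacheSketch<Long, String> cache = new BoundedCacheSketch<>(2);
        cache.put(1L, "a");
        cache.put(2L, "b");
        cache.get(1L);            // touch key 1 so key 2 becomes the eldest
        cache.put(3L, "c");       // exceeds the bound, so the eldest entry is dropped
        System.out.println(cache.keySet());
    }
}
```

The point of the sketch is the trade-off: once the cache is bounded, a “fetch” for an evicted id falls through to the data store again, so the bound should reflect which objects your application actually reads.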

I have also included a snippet from the Spring application-context.xml file that provides a simple way to get AspectJ working in the Spring container.

     <aop:aspectj-autoproxy proxy-target-class="true"/>

     <bean id="dataAccessAspect" class="org.axiomaticit.aspect.DataAccessAspect"/>

     <bean id="dataAccessObject" class="org.axiomaticit.dao.DataAccessObjectImpl"/>

This is a simple example of how Spring, AspectJ and Hazelcast can work together to help reduce “read” calls to a data store. Imagine reducing one application’s “read” executions against a legacy data store while improving read performance metrics. This example doesn’t really answer all questions and concerns that will arise when implementing and utilizing a Hazelcast distributed data cache with Spring and AspectJ, but I think it shows that these technologies can help lower resource utilization and increase performance. Here is a link to the demo project.

Hazelcast Groovyness

Data distribution is a pretty cool topic. Recently, I have been working with Hazelcast, which is an open source clustering and data distribution platform for Java. Well, I really like what I have seen so far and I figured why not have some fun with Hazelcast and Groovy.

I started by adding the Hazelcast 1.7.1 jar to $GROOVY_HOME/lib. Hazelcast, at an introductory level, provides distributed implementations of java.util { Queue, List, Set, Map }. I can run a Groovy script on multiple JVMs and share a Map of customers across all of the instances. For example:

def customersMap = Hazelcast.getMap("customers")

Now, I have an instance of Map and I can add values using Hazelcast’s distributed id generator:

def idGen = Hazelcast.getIdGenerator("customer-ids")
def id = idGen.newId()
customersMap.put(id, "Customer $id")

So, that was pretty simple, right? Here is the entire Groovy script HazelcastGroovynessAdd.groovy:

import com.hazelcast.core.Hazelcast
import com.hazelcast.core.IdGenerator

def customersMap = Hazelcast.getMap("customers")
def idGen = Hazelcast.getIdGenerator("customer-ids")
def id = idGen.newId()
customersMap.put(id, "Customer $id")

I can open up a few different command prompts and enter:

> groovy HazelcastGroovynessAdd.groovy

Now, the customers Map has a few customers in it and our Groovy scripts are still running. Let’s add a com.hazelcast.core.EntryListener to the customers Map so we can detect a com.hazelcast.core.EntryEvent. Here is HazelcastGroovyness.groovy:

import com.hazelcast.core.Hazelcast
import com.hazelcast.core.EntryListener
import com.hazelcast.core.EntryEvent

def listener = [
	entryAdded: { EntryEvent ev ->
		println "key $ev.key was added with value $ev.value to $ev.name"
		Hazelcast.getMap("customers").values().each {
			println it
		}
	},
	entryUpdated: { EntryEvent ev -> },
	entryRemoved: { EntryEvent ev -> },
	entryEvicted: { EntryEvent ev -> }
] as EntryListener

def customersMap = Hazelcast.getMap("customers")
customersMap.addEntryListener(listener, true)

In the above code, we define listener, which implements com.hazelcast.core.EntryListener. I now start up HazelcastGroovyness.groovy at one or more new command prompts:

> groovy HazelcastGroovyness.groovy

We can go back to our original HazelcastGroovynessAdd.groovy script and open (re-open) a few more command prompts and run the script that adds customers to the Map. Now in each running instance of HazelcastGroovyness.groovy we see something like:

key 2000001 was added with value Customer 2000001 to c:customers
Customer 2000001
Customer 1000001
Customer 1

Hazelcast is a very cool, easy-to-use technology that provides distributed data with a few lines of code, especially with Groovy. More information can be found at Hazelcast’s website and at the project site at Google Code.