Friday, May 07, 2010

A Couple Minutes With Non-Stop Ehcache

Sometimes you just want an SLA (Service Level Agreement) when dealing with a component like a cache. You want to know that no matter what goes wrong (disk failures, deadlocks, a network outage, a locked-up database in the write-through case), your threads won't be blocked for longer than your SLA. In the security world this is analogous to layered security, where one is protected from breaches at multiple layers. Here, you are protected from hangs at multiple layers.

Non-Stop Ehcache

In comes NonStopCache. This is a decorator for Ehcache that lets you specify an SLA. When using this decorator, no operation is allowed to take longer than the SLA provided. All operations are isolated from the cache via a thread pool, so the calling thread is protected from a hang in the underlying cache. NonStopCache can be configured in several ways: you can choose whether to serve stale data, perform no-ops, or throw exceptions when the SLA is violated, which covers most of the cases one needs.
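To make the thread-pool isolation idea concrete, here is a minimal sketch in plain Java. It is not the NonStopCache implementation and uses no Ehcache classes; TimeLimitedStore, simulatedDelayMillis and the fallback parameter are hypothetical names made up for illustration. The lookup runs on a pool thread, and the caller waits for at most the configured timeout.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical illustration only; not the NonStopCache implementation. */
final class TimeLimitedStore {
    private final Map<String, String> backing = new ConcurrentHashMap<>();
    // Daemon threads, so a hung operation cannot keep the JVM alive.
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "time-limited-store");
        t.setDaemon(true);
        return t;
    });
    private final long timeoutMillis;
    // Simulated stall, standing in for a hung network or disk call.
    volatile long simulatedDelayMillis = 0;

    TimeLimitedStore(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void put(String key, String value) {
        backing.put(key, value);
    }

    /** Returns the stored value, or fallback if the lookup exceeds the time limit. */
    String get(String key, String fallback) {
        Future<String> future = pool.submit(() -> {
            Thread.sleep(simulatedDelayMillis);
            return backing.get(key);
        });
        try {
            // The caller blocks here for at most timeoutMillis.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stalled worker
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

With a 200 ms limit and a simulated 2 second stall, get("k", "fallback") hands back "fallback" after roughly 200 ms instead of blocking for the full stall. What to return on timeout is the configurable part in the real decorator.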

Getting Started

Here's what I did to get started using clustered Ehcache and NonStopCache:

  • Run the program below (its source code is in "The Code" section):
java -cp .:ehcache-core-2.1.0-beta.jar:slf4j-api-1.5.8.jar:slf4j-jdk14-1.5.8.jar:ehcache-terracotta-2.1.0-beta.jar:ehcache-nonstopcache-1.0.0-beta.jar MyFirstNonStopEhcacheSample

NOTE: Kill the Terracotta server when the printed output instructs you to.

Let's review the output:

Regular cache. No Decorator
The size of the cache is: 0
After put the size is: 1
Here are the keys:
Done with cache.

Sleeping, Stop your server
Disconnected NonStop with noop cache.
The size of the cache is: 0
After put the size is: 0
Here are the keys:
Done with cache.

Disconnected NonStop with local reads cache.
The size of the cache is: 1
After put the size is: 1
Here are the keys:
Done with cache.

Disconnected NonStop with exception cache.
Exception in thread "main" net.sf.ehcache.constructs.nonstop.NonStopCacheException: getKeys timed out
at net.sf.ehcache.constructs.nonstop.behavior.ExceptionOnTimeoutBehavior.getKeys(
at net.sf.ehcache.constructs.nonstop.behavior.ClusterOfflineBehavior.getKeys(
at net.sf.ehcache.constructs.nonstop.NonStopCache.getKeys(
at MyFirstNonStopEhcacheSample.addToCacheAndPrint(
at MyFirstNonStopEhcacheSample.<init>(
at MyFirstNonStopEhcacheSample.main(

What Just Happened?

First, data is put through an ordinary, undecorated cache. This happens before the server is killed and proceeds without incident. The remaining rounds of operations are performed with the server down.

The first disconnected round uses a decorator configured as noop, so all operations are ignored and the size of the cache appears as zero. This is useful when you would rather skip caching altogether if the cache isn't available.

The next round is local reads. This lets you use the values that are available locally while the cache is unavailable. It is especially nice for read-only or mostly read-only caches that fit in memory and need to be always available.

After that we go to the exception version. If you have a cache of important coherent data but you don't want threads blocked, this setting is for you. It throws an exception out of the cache operation, so the calling thread is released and your container can keep going, either by showing an error page or by getting the data from elsewhere.
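The three behaviours can be summarized in a small, self-contained sketch. This is an illustration of the idea, not Ehcache code: OfflineBehavior and OfflineView are hypothetical names, and the sketch assumes that writes are silently dropped in both the noop and the local-reads modes while the cluster is unreachable.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical names; what a decorated cache does while the cluster is unreachable. */
enum OfflineBehavior { NOOP, LOCAL_READS, EXCEPTION }

final class OfflineView {
    private final Map<String, String> localCopy; // entries already held in this JVM
    private final OfflineBehavior behavior;

    OfflineView(Map<String, String> localCopy, OfflineBehavior behavior) {
        this.localCopy = new HashMap<>(localCopy);
        this.behavior = behavior;
    }

    void put(String key, String value) {
        if (behavior == OfflineBehavior.EXCEPTION) {
            throw new IllegalStateException("put timed out");
        }
        // NOOP and LOCAL_READS: assumed to drop the write, since it cannot reach the cluster.
    }

    int size() {
        switch (behavior) {
            case NOOP:
                return 0;                 // pretend the cache is empty
            case LOCAL_READS:
                return localCopy.size();  // answer from what is held locally
            default:
                throw new IllegalStateException("getSize timed out");
        }
    }
}
```

Starting from one locally held entry, the noop view reports a size of 0 before and after a put, the local-reads view reports 1 both times, and the exception view throws on the first call, which lines up with the three disconnected rounds in the output above.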

Some interesting points here:
  • These decorators are all being used on the same cache. This way you can make the behavior specific to the user of the cache. It gives tremendous flexibility.
  • You'll notice this little sample flies through despite the timeout being set to 13 seconds. This is because it's in fail fast mode. In this configurable mode, if the cache knows it can't communicate, it returns the failure case immediately. If that's not what you want, you can instead configure it not to fail fast and to wait the full timeout no matter what.
  • I did this work in config, but the same setup can be done in code.
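The fail fast point can be sketched in a few lines of plain Java. This is an assumption-laden illustration rather than how NonStopCache does it: FailFastGate, markOffline and awaitOnline are hypothetical names, and the polling loop is just the simplest way to show the difference between the two modes.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical illustration of fail fast versus waiting out the full timeout. */
final class FailFastGate {
    private final AtomicBoolean online = new AtomicBoolean(true);
    private final boolean failFast;
    private final long timeoutMillis;

    FailFastGate(boolean failFast, long timeoutMillis) {
        this.failFast = failFast;
        this.timeoutMillis = timeoutMillis;
    }

    void markOffline() { online.set(false); }

    void markOnline() { online.set(true); }

    /** Returns true if the caller may proceed, false if it should apply its timeout behaviour. */
    boolean awaitOnline() {
        if (online.get()) {
            return true;
        }
        if (failFast) {
            return false; // already known to be down: do not wait at all
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() < deadline) {
            if (online.get()) {
                return true; // connection came back within the timeout
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

With failFast set to true and a 13 second timeout, awaitOnline() returns false right away once the gate is marked offline; with failFast set to false it only gives up after the full timeout has elapsed.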

The Code

And the Config file ehcachenonstop.xml:


Matthias Matook said...

Hi Steve,

Thanks for the example, but the ehcachenonstop.xml isn't correct.

I fixed the upper case issues and closed the open XML tags (XML didn't validate).

Aside from this, it works.
The fixed version is below.

Cheers Matthias

--- fixed ehcachenonstop.xml ---

Steve Harris said...


I copied and pasted the file from what I was doing, but maybe something got messed up? Can you e-mail me your changes so I can compare? My e-mail address is just steve at the company I work at.

Anonymous said...

Thanks for the blog.

From article,
"You'll notice this little sample flies through despite the timeout being set to 13 seconds. This is because it's in fail fast mode. "

How do I set up "fail fast mode"?

Thanks, g

Steve Harris said...

I believe setting "immediateTimeout = true" on the NonStopCache decorator should do the trick.

Anonymous said...

Thanks Steve.

Also, do you know why, when I call get() on the cache below without stopping the server, and with only this one thread calling get(),

BlockingCache blockingCache =
new BlockingCache(exceptionCache);

it gives this:
net.sf.ehcache.constructs.nonstop.NonStopCacheException: getQuite for key - 'czDVZVm1QjCXsdZSvbYlmA99' timed out
at net.sf.ehcache.constructs.nonstop.behavior.ExceptionOnTimeoutBehavior.getQuiet(
at net.sf.ehcache.constructs.nonstop.behavior.ExecutorBehavior.getQuiet(
at net.sf.ehcache.constructs.nonstop.NonStopCache.getQuiet(
at net.sf.ehcache.constructs.blocking.BlockingCache.get(
at net.sf.ehcache.constructs.blocking.BlockingCache.get(
at com.barra.cp.bol.cache.NonStopCacheTest.testNonStopCache(

I would expect it to return null, since the value is not in the cache and the server was not stopped.

Thanks, -g

Steve Harris said...

I don't think BlockingCache really works so well with NonStopCache right now. You can work around this with explicit locking and try/finally.
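
As a rough idea of what that workaround looks like, here is a sketch using only java.util.concurrent, with a plain map standing in for the cache. LockedLoader and getOrLoad are hypothetical names and nothing here is Ehcache API. The lock is taken only on a miss and is always released in the finally block, so an exception from the loader cannot leave other threads blocked.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

/** Hypothetical stand-in for "explicit locking and try/finally" around a cache miss. */
final class LockedLoader {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final ReentrantLock lock = new ReentrantLock();

    String getOrLoad(String key, Function<String, String> loader) {
        String value = cache.get(key);
        if (value != null) {
            return value; // hit: no locking needed
        }
        lock.lock();
        try {
            value = cache.get(key); // re-check: another thread may have loaded it
            if (value == null) {
                value = loader.apply(key);
                if (value != null) {
                    cache.put(key, value);
                }
            }
            return value;
        } finally {
            lock.unlock(); // always released, even if the loader throws
        }
    }
}
```

Calling getOrLoad twice for the same key invokes the loader once; the second call is served from the map. A single lock keeps the sketch short, while a real setup would more likely lock per key.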