Monday, November 29, 2010

Quartz Scheduler 2.0 Beta 1 Welcomes New Fluent API and "Where"

Quartz Scheduler, the most widely used Java scheduler, is getting some major improvements for 2.0. I'm going to talk a little about what to expect, but you can also read the release notes here.

Goals

We had two major goals when planning for Quartz 2.0 began:
  • Simplify and modernize the Quartz API.
  • Improve the Quartz experience when leveraging a cluster.
Quartz 2.0 API

To improve the usability and readability of Quartz 2.0 over past versions, we spent a lot of time evaluating its parts and how a user interacts with them. We found that too much knowledge of those parts had to be understood at construction time. As a result we moved to a "Fluent API/DSL" approach. The best way to get a feel for the kind of improvement this brings is to compare the samples from 1.8.4 to the same ones translated to 2.0 Beta 1.

In the simple case of Example 1 from the Quartz kit you can see the basic philosophy change:

// Quartz 1.8.4 Example 1
// First we must get a reference to a scheduler
SchedulerFactory sf = new StdSchedulerFactory();
Scheduler sched = sf.getScheduler();
log.info("------- Initialization Complete -----------");
log.info("------- Scheduling Jobs -------------------");
// compute a time that is on the next round minute
Date runTime = TriggerUtils.getEvenMinuteDate(new Date());
// define the job and tie it to our HelloJob class
JobDetail job = new JobDetail("job1", "group1", HelloJob.class);
// Trigger the job to run on the next round minute
SimpleTrigger trigger =
    new SimpleTrigger("trigger1", "group1", runTime);
// Tell quartz to schedule the job using our trigger
sched.scheduleJob(job, trigger);

// Quartz 2.0 Example 1
SchedulerFactory sf = new StdSchedulerFactory();
Scheduler sched = sf.getScheduler();
log.info("------- Initialization Complete -----------");
// compute a time that is on the next round minute
Date runTime = DateBuilder.evenMinuteDate(new Date());
log.info("------- Scheduling Job -------------------");
// define the job and tie it to our HelloJob class
JobDetail job = newJob(HelloJob.class)
    .withIdentity("job1", "group1")
    .build();
// Trigger the job to run on the next round minute
Trigger trigger = newTrigger()
    .withIdentity("trigger1", "group1")
    .startAt(runTime)
    .build();
// Tell quartz to schedule the job using our trigger
sched.scheduleJob(job, trigger);
log.info(job.getKey() + " will run at: " + runTime);
// Start up the scheduler (nothing can actually run until the
// scheduler has been started)
sched.start();

Improvements include:

  • The date/time related methods have been moved off the Trigger and Job classes into a date-building class called "DateBuilder".
  • We've removed the need to know which Job and Trigger classes you need, and instead infer them from the builder methods you call (the static imports those builders come from are sketched below).
  • The construction now reads more like a sentence: new job with identity "job1", "group1"; new trigger with identity "trigger1", "group1", starting at runTime.
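For reference, the chained newJob()/newTrigger() calls in the 2.0 snippets rely on static builder imports. A minimal sketch of the imports these samples assume (the exact package layout could still shift before the final release):

// Static builder imports assumed by the Quartz 2.0 snippets in this post
import static org.quartz.JobBuilder.newJob;
import static org.quartz.TriggerBuilder.newTrigger;
import static org.quartz.SimpleScheduleBuilder.simpleSchedule;
import static org.quartz.CronScheduleBuilder.cronSchedule;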
This is pretty subtle in a simple case like Example 1 but gets more obvious as the cases get more complex. Let's look at Example 2.

// Quartz 1.8.4 Example 2
// job3 will run 11 times (run once and repeat 10 more times)
// job3 will repeat every 10 seconds (10000 ms)
job = new JobDetail("job3", "group1", SimpleJob.class);
trigger = new SimpleTrigger("trigger3", "group1", "job3", "group1",
    new Date(ts), null, 10, 10000L);
ft = sched.scheduleJob(job, trigger);
// the same job (job3) will be scheduled by another trigger
// this time it will only run every 70 seconds (70000 ms)
trigger = new SimpleTrigger("trigger3", "group2", "job3", "group1",
    new Date(ts), null, 2, 70000L);
ft = sched.scheduleJob(trigger);

// Quartz 2.0 Example 2
// job3 will run 11 times (run once and repeat 10 more times)
// job3 will repeat every 10 seconds
job = newJob(SimpleJob.class)
    .withIdentity("job3", "group1")
    .build();
trigger = newTrigger()
    .withIdentity("trigger3", "group1")
    .startAt(startTime)
    .withSchedule(simpleSchedule()
        .withIntervalInSeconds(10)
        .withRepeatCount(10))
    .build();
ft = sched.scheduleJob(job, trigger);
// the same job (job3) will be scheduled by another trigger
// this time it will only repeat twice at a 70 second interval
trigger = newTrigger()
    .withIdentity("trigger3", "group2")
    .startAt(startTime)
    .withSchedule(simpleSchedule()
        .withIntervalInSeconds(70)
        .withRepeatCount(2))
    .forJob(job)
    .build();
ft = sched.scheduleJob(trigger);

In this case, notice how the 1.8.4 constructors grow with no real indication of what each parameter means.

Now let's look at an example where you have to choose a specific Trigger type.

// Quartz 1.8.4 Example 3 snippet
// job 7 will run every 30 seconds but only on Weekends (Saturday and
// Sunday)
job = new JobDetail("job7", "group1", SimpleJob.class);
trigger = new CronTrigger("trigger7", "group1", "job7", "group1",
    "0,30 * * ? * SAT,SUN");
sched.addJob(job, true);
ft = sched.scheduleJob(trigger);

// Quartz 2.0 Example 3 snippet
// job 7 will run every 30 seconds but only on Weekends (Saturday and Sunday)
job = newJob(SimpleJob.class)
    .withIdentity("job7", "group1")
    .build();
trigger = newTrigger()
    .withIdentity("trigger7", "group1")
    .withSchedule(cronSchedule("0,30 * * ? * SAT,SUN"))
    .build();
ft = sched.scheduleJob(job, trigger);

This example shows how in Quartz 1.8.4 you had to know to pick a different Trigger class, while in 2.0 the choice is abstracted away behind the schedule builder you hand to withSchedule().
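To see what that abstraction buys you, here is a hedged sketch (not from the kit samples) of the same trigger with the cron schedule swapped for a simple 30-second schedule; only the argument to withSchedule() changes:

// Hypothetical variation: the same trigger with a simple
// 30-second repeating schedule instead of a cron schedule
trigger = newTrigger()
    .withIdentity("trigger7", "group1")
    .withSchedule(simpleSchedule()
        .withIntervalInSeconds(30)
        .repeatForever())
    .build();
ft = sched.scheduleJob(job, trigger);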

It's worth going through the samples yourself and making any suggestions you may have. There's still plenty of time to get your API suggestions in via the Quartz Forum!

Quartz "Where" And Other Improvements

Clustered Quartz has gotten two major improvements in 2.0: improved performance as node count increases, and Quartz "Where".

I wrote a blog about this a while back, but now you can play with it. Alex Snaps has written an excellent blog with code samples on the topic. Check it out to really dig in.


Ehcache 2.4 Beta 1 Welcomes Search, Local Transactions and more...

Ehcache 2.4 Beta 1 has been released. We've spent the last few years working hard to make Ehcache the best possible solution for users with caching needs, and 2.4 Beta 1 is another big step forward. The beta includes some significant new features and improvements that you can read about here. This blog is a few notes on getting started with those features, and a few comments on why they are important.

Ehcache Search

This is one of the most requested features for Ehcache. I can't count how many times I have had a map or cache and just wanted to quickly and easily search for an entry. Sure, one can iterate the cache and use Java's matchers, but that's:
  • A bit of annoying coding
  • Only practical for unclustered caches
Ehcache 2.4 Beta 1 has a new search interface in core Ehcache that works throughout our scale continuum. We used the "Fluent Interface/DSL" style that we've been moving towards over the last couple of years in both Ehcache and Quartz Scheduler. This style of interface makes an API easy to use and read almost like English. Here's a taste of what it looks like.
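A minimal sketch against the beta API, assuming a cache whose configuration declares a searchable attribute named "age" (the attribute name and values here are illustrative, and the API may still change before the final release):

import net.sf.ehcache.search.Attribute;
import net.sf.ehcache.search.Result;
import net.sf.ehcache.search.Results;

// Assumes the cache config declares a searchable "age" attribute
Attribute<Integer> age = cache.getSearchAttribute("age");

Results results = cache.createQuery()
    .addCriteria(age.gt(30)) // entries where age > 30
    .includeKeys()           // return the matching keys
    .execute();

for (Result result : results.all()) {
    System.out.println("Matched key: " + result.getKey());
}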

To learn more about it, check out this self-contained sample app and the documentation. Give us feedback early and often on the Ehcache Forums, as the API is still under development and can still be improved.

Local Transactions

Back in Ehcache 2.0 we added JTA support. This was helpful, but we found that for some use cases people wanted a couple of things:
  • Transactions without a JTA transaction manager
  • More speed
We improved the speed of our XA-compliant JTA support across the board, and we also added the concept of local transactions. Local transactions provide optimistic concurrency and atomicity across caches. They are like XA transactions, except that they aren't fully tolerant of crashes in resources outside of Ehcache (i.e., a cache-plus-database setup doesn't get all of XA's guarantees in node-failure cases, though the guarantees do hold across Ehcache caches), they don't require a transaction coordinator (though they can be used with one), and they tend to be less resource intensive (read: faster). A sketch of what this looks like follows.
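A minimal sketch of a local transaction in the 2.4 beta, assuming an ehcache.xml that defines a cache named "txCache" with transactionalMode="local" (the cache name and values are illustrative):

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.TransactionController;

// Assumes ehcache.xml defines "txCache" with transactionalMode="local"
CacheManager manager = CacheManager.create();
Cache cache = manager.getCache("txCache");

TransactionController tx = manager.getTransactionController();
tx.begin();
try {
    cache.put(new Element("key", "value"));
    tx.commit();   // changes become visible atomically
} catch (RuntimeException e) {
    tx.rollback(); // nothing is applied on failure
    throw e;
}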


Improved Enterprise Disk Store

In 2.3 we added an Enterprise version of unclustered Ehcache core. That version added support for very large caches through the seamless BigMemory add-on (one line of config). We also, somewhat more quietly, added the beginnings of a new Enterprise disk store that no longer stores keys in memory and can handle larger restartable caches.

2.4 Beta 1 takes this to another level. The disk store is now far more efficient, better leverages SSDs, and can grow to very large sizes without any additional heap usage or fragmentation. You can exercise the improvements using the Ehcache Pounder.

Other Improvements

While not everything planned for the final release was done in time for Beta 1, some significant improvements are in:
  • NonStopCache is now built in. Rather than adding a jar and configuring a wrapper to get non-stop behavior in clustered deployments, it is now part of the product core and can be turned on via configuration.
  • Search now works clustered. The new search API is backed by the Terracotta tier. This is still early and we have a lot of performance and HA work to do here; that said, it is testable and usable, so give it a try.
  • The explicit locking module is now in the core kit.
  • Rejoin now works with non-stop (you can disconnect from a cluster and reconnect to it without restarting).

For those interested in scheduling, we have also released Quartz Scheduler 2.0 Beta 1. More will be coming in a month or so when we do Beta 2 for all products.

Monday, November 15, 2010

Direct Buffer Access Is Slow, Really?

There are lots of claims in the blogosphere that direct memory buffers are slow. We have been working with them a lot and hadn't seen that slowness, but I'm more of a try-it-and-see kind of guy, so I did just that.

I cooked up a quick single-threaded test to compare off-heap direct memory buffers to on-heap byte buffers. What I found is that, at least on my notebook, direct is between 2 and 5 percent slower than on-heap. Even on my notebook I was writing and reading 2 GB/second in small chunks at random parts of the byte buffer (every operation in the test does one write and one read).

The nice thing about direct buffers is that they occupy no heap, so the data stored in them is hidden away from the JVM garbage collector. Of course, whether that 2 to 5 percent matters depends on how much data your app is trying to crank through them, so as always you need to look at your use case's latency, throughput, and SLA goals and code accordingly.

While more testing can and should be done (and has been), here is a place for people to start.

Here are the code and results from my 1.6 GHz notebook. I only spent a few minutes on it, so suggestions to improve it are welcome:

Type: ONHEAP Took: 8978 to write and read: 10737418368
Type: DIRECT Took: 9223 to write and read: 10737418368
Type: ONHEAP Took: 8827 to write and read: 10737418368
Type: DIRECT Took: 9283 to write and read: 10737418368
Type: ONHEAP Took: 8813 to write and read: 10737418368
Type: DIRECT Took: 9604 to write and read: 10737418368


import java.nio.ByteBuffer;
import java.util.Random;

public class BufferPerformanceComparison {

    public final static int MEGA_BYTE = 1024 * 1024;
    public final static long GIGA_BYTE = MEGA_BYTE * 1024L;

    private final ByteBuffer heapByteBuffer;
    private final ByteBuffer offHeapByteBuffer;
    private final long dataToLoad;
    private final int bufferSize;
    private final Random random = new Random();

    private final byte[] base = "I've heard direct buffers are really slow. Are they right or are they wrong. Because if they are right I want to be careful using them and if they are wrong I want to leverage them when I can."
            .getBytes();

    public BufferPerformanceComparison(int bufferSize, long dataToLoad) {
        this.heapByteBuffer = ByteBuffer.allocate(bufferSize);
        this.offHeapByteBuffer = ByteBuffer.allocateDirect(bufferSize);
        this.dataToLoad = dataToLoad;
        this.bufferSize = bufferSize;
    }

    public void start() {
        // Round 0 is a warmup; its results are not printed.
        for (int round = 0; round < 4; round++) {
            boolean isWarmup = round == 0;
            loadByteBuffer(heapByteBuffer, dataToLoad, !isWarmup, "ONHEAP");
            loadByteBuffer(offHeapByteBuffer, dataToLoad, !isWarmup, "DIRECT");
        }
    }

    private void loadByteBuffer(ByteBuffer byteBuffer, long dataToLoad,
            boolean printResults, String type) {
        long bytesWritten = 0;
        long start = System.currentTimeMillis();
        // Pick random offsets, leaving room for a full copy of the base array
        int writePosition = random.nextInt(bufferSize - (base.length + 1));
        int readPosition = random.nextInt(bufferSize - (base.length + 1));
        byte[] readArray = new byte[base.length];
        int count = 0;
        while (bytesWritten < dataToLoad) {
            count++;
            // Move the write position every 10 operations
            if (count % 10 == 0) {
                writePosition = random.nextInt(bufferSize - (base.length + 1));
            }
            bytesWritten += base.length;
            // Each pass does one write and one read of base.length bytes
            byteBuffer.position(writePosition);
            byteBuffer.put(base);
            byteBuffer.position(readPosition);
            byteBuffer.get(readArray, 0, base.length);
        }
        if (printResults)
            System.out.println("Type: " + type + " Took: "
                    + (System.currentTimeMillis() - start)
                    + " to write and read: " + bytesWritten);
    }

    public static void main(String[] args) throws Exception {
        new BufferPerformanceComparison(50 * MEGA_BYTE, 10L * GIGA_BYTE)
                .start();
    }
}

Friday, November 05, 2010

A Couple Minutes With Ehcache Search...

Welcome To Ehcache Search

Now that we are releasing our GC-busting Ehcache BigMemory product, we are starting the process of gathering feedback on the new Ehcache Search API. While Ehcache BigMemory solves your latency, throughput, and SLA issues, Ehcache Search gives you the ability to handle a user's data retrieval needs in a simple and efficient way.

What is it?

With today's explosion in data and in hardware memory sizes, in-memory storage and caching have become an important part of everyone's software. Getting at the data in that cache in a flexible, simple, and fast way has become an important area of focus. Things like finding entries that match criteria and performing aggregations and calculations (e.g., average, sum, max, min) need to be easy and efficient.

That's where the new Ehcache Search API comes in. It provides fast, efficient searches and aggregations across the Ehcache "Snap in, Speed up, Scale out" continuum (from single-node performance through a scaled-out architecture). For example, an aggregation can be expressed as shown below.
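A sketch against the in-progress API, assuming a cache whose configuration declares a searchable attribute named "age" (the attribute and helper names are assumptions, so treat this as illustrative rather than final):

import net.sf.ehcache.search.Attribute;
import net.sf.ehcache.search.Results;
import net.sf.ehcache.search.aggregator.Aggregators;

// Aggregation sketch: average over a searchable "age" attribute
Attribute<Integer> age = cache.getSearchAttribute("age");

Results results = cache.createQuery()
    .includeAggregator(Aggregators.average(age))
    .execute();

// The single result row carries the aggregator output
Object averageAge = results.all().get(0).getAggregatorResults().get(0);
System.out.println("Average age: " + averageAge);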

Please try out the API (it's checked into Ehcache core's trunk) by grabbing this self-contained, buildable sample and the open source Ehcache nightly jar I've put up on GitHub.


It's simple to use and 100 percent open source. We are very excited about feedback, and plenty of time still exists for making changes and improvements. Please ask lots of questions and give lots of feedback.

To try it you can just unpack the kit:

tar -xzvf ehcache-search-sample-0.0.1-SNAPSHOT-distribution.tar.gz

and run it:

sh run.sh

The source code is included in the kit and can be edited in this directory:

src/main/java/org/sharrissf/sample/

It can be rebuilt if you have Maven:

mvn clean install

and rerun using the run.sh command.

You can even package it up yourself by using

mvn clean assembly:assembly

I'm looking forward to hearing from the community. It will be interesting to see how projects like Grails, Hibernate, Liferay, and ColdFusion leverage cache search. Those uses will help set the direction for years to come.