Wednesday, December 30, 2009

From Performance to Scale - A Software Story... Part I

When building most applications, whether it's JEE, Spring, Grails or plain old Tomcat, there is a series of common tools in play. HTTP sessions manage the state of the web tier, an ORM talks to the database, a cache holds intermediate application state, and a scheduler makes things happen at a certain time, place or interval. Out of the box it is quite easy to get started with these components. However, as an application makes its way into production it often needs improved performance, usually followed by HA and then scale-out.

This blog is about a journey. It's about taking an application from performance all the way to scale-out without rewrites and using the components that you already have.

In the beginning it's just the Application

The good news is that whether you're a JBoss person, a Spring person, a Grails person or a Tomcat person, you're really using the same key components. Almost everyone in these environments is using Hibernate as an ORM, the HTTP Session spec for web state, Ehcache for caching/performance and Quartz for scheduling. So while there is no official standard, the global consensus is pretty clear. If it were me I would start from Grails (which is built on Spring, Ehcache, Quartz and Hibernate), because it doesn't require the two weeks of picking, deeply understanding and piecing together components that most other approaches do.

Round 1, Performance

So you've written your single node application using the usual components. Turns out, it's slow. Now what? Well the first thing you need to do is understand why it's slow at a high level. I usually check the following obvious things:

1) CPU bound - Check CPU stats on all machines (including the DB). These days you also need to figure out whether a single core is maxed out, which could mean the app needs more parallelism.

2) GC Bound - Use verbose GC or your favorite tool

3) Database bottleneck - A number of tools and tricks exist to see if your database is the bottleneck. My favorite ways are (a small sketch of the Hibernate statistics approach follows this list):
- Keep track of and monitor query times and/or look at the Hibernate statistics
- Take thread dumps. If all threads are blocked on DB calls something is wrong
- Check resource utilization on the database machine

4) I also like to isolate operations and see which ones are slow or create slowness. This can tell you a lot about where to focus.
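If you go the Hibernate statistics route, here is a minimal sketch of the kind of numbers worth watching. It assumes Hibernate 3.x and that sessionFactory is your application's SessionFactory:

import org.hibernate.SessionFactory;
import org.hibernate.stat.Statistics;

public class HibernateStatsProbe {

    // Dumps a few counters that hint at whether the database is the bottleneck.
    public static void dump(SessionFactory sessionFactory) {
        Statistics stats = sessionFactory.getStatistics();
        stats.setStatisticsEnabled(true); // or set hibernate.generate_statistics=true

        System.out.println("Queries executed:        " + stats.getQueryExecutionCount());
        System.out.println("Slowest query time (ms): " + stats.getQueryExecutionMaxTime());
        System.out.println("Slowest query:           " + stats.getQueryExecutionMaxTimeQueryString());
        System.out.println("2nd level cache hits:    " + stats.getSecondLevelCacheHitCount());
        System.out.println("2nd level cache misses:  " + stats.getSecondLevelCacheMissCount());
    }
}

If the slowest queries dominate your response times, or the cache miss count dwarfs the hit count, the database is a good place to focus.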

Ok, so now you know why things are slow and you're in the same boat as about 90 percent of the world: the database is the number one bottleneck.

Tuning the ORM

Many start by generating the ORM schema using tools. This usually produces a completely normalized database that requires expensive queries. At this stage of development it's almost certain that this is where your performance problems are. First rule out the usual mistakes, like a missing or badly configured JDBC connection pool (use C3P0 and make sure you're actually seeing parallelism; many of the others are either useless or broken). The next thing to try is Hibernate second-level caching. If your application is read heavy this can have an amazing impact on performance. Entity caching is generally the way to start. The most common gotcha is having the cache invalidated by custom SQL: it is a little-known fact that executing custom SQL through Hibernate clears the cache, making it useless. If you are going to do query caching you MUST READ THIS BLOG.
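For reference, here is a minimal sketch of turning on second-level entity caching with Ehcache as the provider. It uses Hibernate 3.x annotations; the Product entity is just a placeholder, and the exact provider/region-factory property depends on your Hibernate and Ehcache versions:

import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.SessionFactory;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;
import org.hibernate.cfg.AnnotationConfiguration;

public class SecondLevelCacheSample {

    @Entity
    @Cache(usage = CacheConcurrencyStrategy.READ_WRITE) // cache this entity in the 2nd level cache
    public static class Product {
        @Id
        private Long id;
        private String name;
    }

    public static SessionFactory buildSessionFactory() {
        AnnotationConfiguration cfg = new AnnotationConfiguration();
        cfg.addAnnotatedClass(Product.class);
        cfg.setProperty("hibernate.cache.use_second_level_cache", "true");
        // Ehcache-backed cache provider shipped with Ehcache
        cfg.setProperty("hibernate.cache.provider_class", "net.sf.ehcache.hibernate.EhCacheProvider");
        return cfg.buildSessionFactory();
    }
}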

Now you've got Hibernate second-level caching turned on and you've tuned it. The database isn't the bottleneck anymore, but you're seeing a lot of GC, and thread dumps show contention on Hibernate itself, especially when displaying medium to large tables of data on the screen.

Plus, it turns out that updating the database on every intermediate operation is pounding the database's locks, CPU and disk. It's also made my schema really unwieldy. What do I do now?

Stepping up to Caching

If you still have performance problems after second-level caching, it is usually time to look at application-level caching. This can be extremely helpful for performance. If you're seeing:

a) A lot of garbage creation around calculating results
b) CPU usage or contention while computing or retrieving information
c) Latency from I/O in retrieving information from disk, a web service or a DB
d) Pounding your database with fine-grained updates of information that you only need for a short time

You'll likely want to start caching (see the sketch after this list). Some things to cache include:

1) Heavily used reference data from the database - states, zip codes, product IDs, etc.
2) Intermediate state - If a series of operations occurs but all you care about is the end result, cache until the end result is reached and then put it in the DB.
3) heavily reused calculated values - "Like totally searching for Britney Spears."
4) Temporary values that are only needed for a set amount of time - user sessions or data that is needed at a certain time of day.
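Here's a minimal sketch of the cache-aside pattern with Ehcache for case 1 above. The cache name "referenceData" and the expensiveLookup() method are placeholders for whatever your app actually loads, and the cache itself is assumed to be configured in ehcache.xml (e.g. with a max size and TTL):

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class ReferenceDataCache {

    private final Cache cache;

    public ReferenceDataCache(CacheManager cacheManager) {
        this.cache = cacheManager.getCache("referenceData");
    }

    public Object lookup(String key) {
        Element hit = cache.get(key);
        if (hit != null) {
            return hit.getObjectValue(); // served from memory, no DB round trip
        }
        Object value = expensiveLookup(key); // e.g. a database query or web service call
        cache.put(new Element(key, value));
        return value;
    }

    private Object expensiveLookup(String key) {
        // placeholder for the real, slow lookup
        return "value-for-" + key;
    }
}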

Hopefully, between the Hibernate caching and tuning and the application data caching, you've now got most of your performance issues under control.

Your application probably does some scheduling and uses HTTP sessions, but these rarely create performance problems on one node.


Part II - Now that it's performing, what about availability?
Part III - One node isn't enough anymore I need scale-out

Wednesday, December 16, 2009

Clustered Quartz Scheduler In 5 Minutes

Need a fast clustered/persistent Quartz Job Scheduler? Quartz Scheduler, the ubiquitous job scheduler built into Spring, JBoss and Grails can be configured to provide those features in under 5 minutes in this brief tutorial.

A Brief Digression Into The Why

Why do I need a persistent scaled out job scheduler? The main use cases for a clustered/persistent job scheduler are:
  • HA - You need to be able to restart your application without losing scheduled jobs
  • Scale Out - Your application now needs more than one node to handle the load it receives and you want your scheduled jobs to distribute across nodes.
  • You are using a database to persist and/or scale your scheduler and you are seeing DB load/locking issues, and/or you find it too difficult to set up or maintain.

Steps:

1) Download Terracotta 3.2 (Which includes Quartz Scheduler) http://www.terracotta.org/dl/oss-download-catalog

2) Put the following jars in your class path (all included in the quartz-1.7.0 directory of the Terracotta kit from above):

quartz-1.7.0.jar - regular quartz jar
quartz-terracotta-1.0.0.jar - Terracotta clustered store

3) Whip up some scheduler code:
import java.util.Properties;

import org.quartz.Job;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.TriggerUtils;
import org.quartz.impl.StdSchedulerFactory;

public class QuartzSample {

    public void startJobs() throws Exception {
        Properties props = new Properties();
        props.load(QuartzSample.class.getClassLoader().getResourceAsStream("org/quartz/quartz.properties"));

        // **** Begin required Terracotta props
        props.setProperty(StdSchedulerFactory.PROP_JOB_STORE_CLASS, "org.terracotta.quartz.TerracottaJobStore");
        props.setProperty("org.quartz.jobStore.tcConfigUrl", "localhost:9510");
        // **** End required Terracotta props

        StdSchedulerFactory factory = new StdSchedulerFactory(props);
        Scheduler scheduler = factory.getScheduler();
        scheduler.start();
        if (scheduler.getJobDetail("myJob", "myGroup") == null) {
            System.out.println("Scheduling Job!");
            JobDetail jobDetail = new JobDetail("myJob", "myGroup", DumbJob.class);
            Trigger trigger = TriggerUtils.makeSecondlyTrigger(5);
            trigger.setName("myTrigger");
            scheduler.scheduleJob(jobDetail, trigger);
        } else {
            System.out.println("Job Already Scheduled!");
        }
    }

    public static class DumbJob implements Job {

        @Override
        public void execute(JobExecutionContext context) throws JobExecutionException {
            System.out.println("Works baby");
        }
    }

    public static void main(String[] args) throws Exception {
        new QuartzSample().startJobs();
    }
}
** NOTE: Notice the two lines of properties that set things up for clustering with Terracotta in the sample. That's the only difference from single node unclustered Quartz.

4) Start the Terracotta server by running

./start-tc-server.sh

in the bin directory of the Terracotta kit

5) Run the sample code above and watch it run the job every 5 seconds. Then kill the sample app and restart it. The app will tell you that the job is already scheduled and the job will continue.

Conclusion

Two lines of configuration and a server take you from the ubiquitous job scheduler built into Spring, JBoss and Grails to scale-out and persistence.

Have fun!

Friday, November 06, 2009

Welcome James House and The Quartz Community

I'm excited to be welcoming James House and Quartz to the Terracotta and Ehcache Fold. The Terracotta dev and field teams have long believed that scheduling and coordination are hugely important parts of applications from single node to scaled out architecture. In Java that leads you to one place. Quartz! We believe James and Quartz are an excellent fit with our community and suite of products that are useful from single node to the cloud.

Quartz Scheduler is both a best-of-breed and a ubiquitous product. We are getting right to work on contributing. The first step along the path is the Mavenization and Hudsonification of the Quartz project, which has already been completed! We also have a beta version of Quartz Terracotta Express edition ready. It's an HA/durability/scale-out version of Quartz that requires no DB and is so simple it can be leveraged in minutes by existing Quartz users. We look forward to working with James to create the most usable and useful enterprise-class open source scheduler available, and the enterprise-class support product to go with it. We have lots of great feature ideas and we look forward to the journey ahead.


Monday, October 26, 2009

What To Do If Your iTunes Gets iHacked...

Please make sure you watch your iTunes receipts/charges. In the last week or so someone somehow started using my iTunes account to purchase music illegally. Yep, I got iHACKED! Fortunately I only got 2 bills worth around 75 bucks but if I hadn't been paying attention it could have been much worse. So make sure you read those statements!

If this happens to you immediately change your password and remove your credit card information from the iTunes site/app. You do this by:
  1. Clicking on the store menu while in iTunes and then selecting the "View my account" menu item. From there click "Edit" on the main screen where it says "Payment Information" and select credit card type "None".
  2. Next change your password. In the same "View My Account" screen click "Edit Account Info." That will take you to a screen where you can change your password.

Ok, now that you've protected yourself you need to get your money back. Unlike with Amazon or Zappos, contacting support and reporting the fraud DOES NOT lead to a refund. They will direct you to your credit card company to dispute the charge. I did this and it appears to have been successful (I found it disappointing that Apple didn't just take care of this for me).

As a side note, when shopping online make sure you have a credit card company that takes responsibility for stuff. Amex has always been great and I'm sure other good companies exist as well.

UPDATE:
Looks like I'm not alone here. This blog reads like I could have written it.

5 Hints You're Using A Map When You Should Be Using a Cache?

When developing software in Java one almost always ends up with a few maps that contain keyed data, whether it's username -> conversational state, state code -> full state name, or cached data from a database. At what point do you move from a basic Map (or one of its fancier variants like LinkedHashMap subclasses or ConcurrentHashMap) to an open source, lightweight cache like Ehcache?

Here are 5 quick things to look for:

5) You've built your own Map loader framework for bootstrapping and/or read-triggered loading
4) You need to be able to visualize and/or control your Map via JMX or a console. For example, you want to watch the hit rate of your Map or track its size.

3) You're hacking in "overflow to disk" functionality and/or persistence for your Map in order to handle memory pressure and/or restartability.

2) You're hacking in special behavior to cluster Maps when you scale out. This includes things like writing your own invalidation and/or replication solution over JMS or RMI.

1) You find yourself implementing your own eviction strategies.


Avoid The Slippery Slope:

It's a slippery slope. First you add in one feature, then another, and the next thing you know you've reinvented the cache wheel. There's no point in doing that. There are great caches out there that are Apache-licensed, lightweight and have the features above and more.
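To make that concrete, here's a minimal sketch of what a few of those hand-rolled features look like when you let Ehcache do the work. The cache name and the size/TTL numbers are made up for the example:

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class MapToCacheSample {

    public static void main(String[] args) {
        CacheManager cacheManager = CacheManager.create();

        // Bounded size, eviction, overflow to disk and time-based expiry -
        // the features you would otherwise hack onto a Map yourself.
        Cache userState = new Cache("userState",
                10000,   // max entries held in memory before evicting/overflowing
                true,    // overflow to disk under memory pressure
                false,   // not eternal - entries can expire
                1800,    // time to live (seconds)
                600);    // time to idle (seconds)
        cacheManager.addCache(userState);

        userState.put(new Element("user-42", "conversational state"));
        Element hit = userState.get("user-42");
        System.out.println("Cached value: " + (hit == null ? null : hit.getObjectValue()));

        // Hit/miss counts are tracked for you and can be exposed via JMX.
        System.out.println("Hits so far: " + userState.getStatistics().getCacheHits());

        cacheManager.shutdown();
    }
}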


To learn more about Ehcache check it out at ehcache.org »

Friday, October 23, 2009

Excellent Blog On Code Smells...

This is a really good/short blog that highlights some code smells that everyone should look out for. It's not a complete diary or anything but when I read it I felt like it could have been written by me.


He also has a follow on which I agree with.


Improved Web Browsing By Controlling Flash

I have a few pet annoyances when surfing the web.

* I find it disruptive to have audio play when I hit a web page (though I usually keep my sound off)
* I don't like when video plays automatically when I hit a web page.
* I don't like when my computer heats up and the battery drains when I'm not doing anything, just because I left a web page/tab open.
* Some pages that look rather slim take a disproportionately long time to load (there are lots of reasons for this, but Flash seems to be one of them)

Turns out that I was able to mostly solve those problems by using one of the many Flash control plugins. I surf on Safari for the most part so I went with ClickToFlash. Firefox has Flashblock, which I haven't tried.

The way it works is that it shows you placeholder frames where the Flash would normally be. If you want to see what's there, just click on the box and it loads the real Flash. It has many other nice features around content, but the important one is the one I described.

I have to say, when I installed this thing I was absolutely amazed by how many things that looked like regular ads and images were actually Flash. You will be astounded. You have to wonder what these companies are doing with Flash when they're just showing a static image. I didn't take any official benchmarks, but after installing I noticed an increase in battery life and a decrease in heat on my computer. This experience makes me actually believe (I didn't really at first) Apple's battery/CPU excuse for not supporting Flash on the iPhone.

If you're like me and only want Flash when you want Flash, try it out.


Friday, October 02, 2009

Distributed Coherent EhCache In less than 5 Minutes...

Need a fast clustered/persistent cache? Ehcache, the ubiquitous cache built into Spring, JBoss and Grails can be configured to provide those features in under 5 minutes using this brief tutorial.

A Brief Digression Into The Why

Why do I need a persistent scaled out cache? The main use cases for a clustered/persistent cache are:
  • I'm using Hibernate and it's pounding the database, or it's just too slow. Use a coherent second-level cache to deflect load off the database and reduce latency without getting stale data.
  • I have a bunch of intermediate data that doesn't belong in the database and/or is expensive to store in the database, but that I want to keep in memory. The problem is that if a node goes down, or if someone asks for the data from another node, the data is lost.
  • I'm already caching, but I have to load data over and over again into every node even though hot data for one node is hot for all (known as the 1/n effect). If the data is cached for one node it should be cached for all.

    Steps:

    1) Download the latest Ehcache www.ehcache.org

    2) Put the following jars in your class path (all included in the ehcache kit):
    ehcache-core.jar - Core Ehcache
    ehcache-terracotta.jar - Terracotta clustering
    slf4j-api-1.5.8.jar - Logging API Used by Ehcache
    slf4j-jdk14-1.5.8.jar - Implementation of the Logging API

3) Whip up some cache code:

package org.sharrissf.samples;

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class MyFirstEhcacheSample {

    CacheManager cacheManager = new CacheManager("src/ehcache.xml");

    public MyFirstEhcacheSample() {
        Cache cache = cacheManager.getCache("testCache");
        int cacheSize = cache.getKeys().size();
        cache.put(new Element("" + cacheSize, cacheSize));
        for (Object key : cache.getKeys()) {
            System.out.println("Key:" + key);
        }
    }

    public static void main(String[] args) throws Exception {
        new MyFirstEhcacheSample();
    }
}



    4) Whip up some quick config

     <?xml version="1.0" encoding="UTF-8"?>

    <ehcache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="ehcache.xsd">

    <terracottaConfig url="localhost:9510" />

    <defaultCache />

    <cache name="testCache" eternal="true">
    <terracotta clustered="true"/>
    </cache>

    </ehcache>



    5) Download Terracotta

6) Start the Terracotta server by running ./start-tc-server.sh in the bin directory of the Terracotta kit

    Now just run that Java snippet a few times and see your cache grow.

    Tuesday, August 18, 2009

    Welcome EHCache Community

    I'm excited to be welcoming Greg Luck and the EHCache community to the Terracotta family. EHCache is an extremely useful/usable product and nearly ubiquitous in the caching space. Greg has spent years solving the important real world problems associated with building highly performant applications. The Terracotta Dev team is very much looking forward to helping accelerate EHCache's development as well as provide the best possible integration with the Terracotta product family.

EHCache will remain under the Apache 2 license and we have created the beginnings of a new website at www.ehcache.org. Greg will continue to drive EHCache's vision and direction, as well as being highly involved in its development. He will also be instrumental in helping Terracotta define and build out our caching strategy as a whole. His vision, as well as the EHCache community's help, is essential in allowing us to take these products to the next level together.

    We see a great future of product offerings for your desktop app, on your servers and in your cloud solving the scale/performance problems of today, tomorrow and beyond.

    Wednesday, August 12, 2009

    Distributed Data Structures: ConcurrentDistributedMap

    Concurrent Distributed Data Structures?

    Many challenges exist when developing a high scale multi-node application. Our goal at Terracotta is to take on those challenges in ways that remove them from the plate of those architecting and developing applications and place them squarely on our shoulders.

In order to accomplish such a lofty goal we first had to create some core pieces of infrastructure on which many higher-order abstractions could be built. One such "piece" is our ConcurrentDistributedMap. This data structure is a fundamental piece of our Distributed Cache, our Hibernate product and our Web Sessions product, and is also available for use in custom solutions for those using Terracotta as a platform.


    Challenges and Tradeoffs

    Developing a data structure that is Distributed as well as Concurrent and Coherent has very different trade-offs from developing for a single JVM. If one took a standard concurrent data structure like ConcurrentHashMap and just clustered it "as is" one would likely run into performance and memory efficiency issues. Even a really cool concurrent data structure like Cliff Click's Non Blocking Hash Map would not do well if one used the algorithms without thought in a coherent cluster.

    The challenge is that the trade-offs change when you add the latency of a network and data locality in the middle of the game. In normal concurrent data structures you care about:

    - How long you hold locks
    - How much is locked while you hold it.
    - CPU usage
    - Memory Usage and Object creation

    In the clustered case you add the following:

Lock locality - Is the lock you need already held on the local machine, or do you need to go get it over the network? If you need to go get it, how long does that take? A little of the "how long does it take to get the lock" question exists on a multi-CPU single machine, but not nearly to the same degree.

Data locality - Is the data I need already local, or do I need to go get it? If I need to get it, how long does that take?

    Data change rate - How much clustered data am I changing and how long does it take to send it around? Also, do I send it around?

    Data size - In a clustered world one often uses data structures that don't fit entirely in a single node. One has to take pains to control the size and amount of the data in each JVM for efficiency.

    There are other implementation specific/point in time issues like number of locks and their cost but those can mostly be optimized away at the platform level.


    Single JVM ConcurrentHashMap

ConcurrentHashMap adds concurrency by collecting groups of entries into segments. Those segments are grouped together both from a lock perspective (they share a lock) and from a physical-space perspective (all entries in a segment are generally in one collection). In a single JVM the only risk of sharing a lock between the entries is contention on look-ups that otherwise run at in-memory speed. This is a very effective way to handle large numbers of threads making highly contended gets and puts to the map. If one runs into contention with this kind of data structure one can just up the number of segments in the Map.
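For example, the standard ConcurrentHashMap constructor lets you set the segment count up front via the concurrencyLevel argument (the numbers below are just illustrative):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SegmentTuningSample {

    public static void main(String[] args) {
        // initialCapacity, loadFactor, concurrencyLevel (roughly the number of
        // internal segments, i.e. the expected number of concurrent writers)
        ConcurrentMap<String, String> states = new ConcurrentHashMap<String, String>(1024, 0.75f, 64);

        states.put("CA", "California");
        System.out.println(states.get("CA"));
    }
}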


    Concurrent Map In A Clustered World

In a clustered world, problems occur with a data structure like this. First, getting a lock or an object can be either in-memory speed or many times in-memory speed, depending on whether it has recently been accessed locally. In some cases this is no problem and in some cases it's pretty bad. It's also a space issue. If a segment is brought in as a whole, and its entries are in that segment strictly because of their hashCode, then the natural partitioning of the app's usage won't help save space by only loading the entries needed locally. Instead it will load the needed objects and anything else in their segments. This eliminates the benefits of any natural or forced locality that occurs in a multi-node application.


    Use-Case Analysis

In order to highlight some of the pros and cons of CHM (ConcurrentHashMap) I'm going to vet it against a few use-cases.

    Use-case 1 - An 8 node app sharing a clustered ConcurrentHashMap

    All the data in the map is read only and it's used in all nodes evenly and the data fits entirely in a single JVM's heap.

GOOD NEWS! You will be fine with a regular clustered ConcurrentHashMap. Let's look at why.

    1) All data will be loaded everywhere so unnecessary faulting (the act of pulling a data item into a node) won't be happening
    2) All locks will be read locks and will be local everywhere so your latency will be nice and low (Due to greedy locks)
    3) Won't have contention on the segments because reads are pretty much concurrent

    Use-case 2 - The same as use-case 1 but now the map data is bigger than memory and you have a sticky load balancer.

    Some good and some bad:

1) Since data is batched into segments by hash code, and your load balancer hashes on something completely different from what your map hashes on, you will end up loading data into each node that is not needed there. This is a result of the ConcurrentHashMap segmenting strategy.

    2) Locks will still be fine because it's all read and read locks are very concurrent so segment contention won't be an issue.

So the memory manager may be doing unnecessary work, and whether you will be in trouble depends on how big the ConcurrentHashMap is.

    Use-case 3 - Same as use-case 2 with the exception that now we are doing 50 percent writes. Something similar to caching conversations.

    1) Still have the above problem of loading unneeded batches
    2) But now, due to the writes, you are also maintaining the state of the objects that have unnecessarily poor locality in all the nodes where they don't belong.
3) Now you have a locking problem. While writing an entry to a segment you are blocking people in other nodes from reading or writing to that segment, adding some serious latency. Plus, the locks are getting pulled around to different nodes, because even though your load balancer provides locality, it is on a different dimension than that of the internals of the map and is therefore not helpful.

    Reviewing the problems highlighted by use case 3:

    - Lock hopping leading to slow lock retrieval
    - Lock contention due to grouping of multiple unrelated entries with locks.
    - Faulting and Memory wasting due to unfortunate segmenting of data
    - Broadcasting of changes or invalidations to nodes that shouldn't care


    What did we do?

We built a specialty, highly concurrent map tuned for distribution and the above challenges, called ConcurrentDistributedMap.


    Locking:
    Instead of breaking things down into segments for locking we lock on the individual keys in the map. This gives the same correctness guarantees while giving the maximum concurrency. This drastically reduces lock hopping and contention and provides in-memory lock speeds most of the time.
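As a rough single-JVM illustration of the per-key idea (this is just a sketch of locking on individual keys instead of on shared segments, not Terracotta's actual implementation):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

public class PerKeyLockingSketch<K, V> {

    private final ConcurrentMap<K, V> data = new ConcurrentHashMap<K, V>();
    private final ConcurrentMap<K, ReentrantLock> locks = new ConcurrentHashMap<K, ReentrantLock>();

    // One lock per key: writers to different keys never contend with each other,
    // unlike segment locking where unrelated keys can share a lock.
    private ReentrantLock lockFor(K key) {
        ReentrantLock lock = locks.get(key);
        if (lock == null) {
            ReentrantLock newLock = new ReentrantLock();
            ReentrantLock existing = locks.putIfAbsent(key, newLock);
            lock = (existing == null) ? newLock : existing;
        }
        return lock;
    }

    public void put(K key, V value) {
        ReentrantLock lock = lockFor(key);
        lock.lock();
        try {
            data.put(key, value);
        } finally {
            lock.unlock();
        }
    }

    public V get(K key) {
        return data.get(key);
    }
}

In the clustered case the win is bigger than on a single JVM, because a key's lock can stay greedily held on the node that actually uses that key.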


    Segmenting:
    The segments go away completely. Key Value pairs are managed on an individual basis so no unnecessary faulting occurs.


    Broadcasting and invalidation:
The above, plus an efficient memory manager, means that values are only faulted into nodes where they are used. Since those values aren't in all nodes anymore, invalidation and/or broadcasting of changes for those entries is no longer needed.

    This data structure takes excellent advantage of any natural partitioning that may occur at the application level.


    Summary

    Building a fast, coherent, concurrent, distributed data-structure requires thinking about an extended set of concerns. However, if one pays attention to the issues it is possible to create a highly useful solution. To learn more check out the ConcurrentDistributedMap described above.


    Additional Reading:




    For more information on Terracotta's distributed data structures one can always look here:


    Tuesday, February 24, 2009

    Micro Review for Safari 4 Beta...

I don't know what they are doing as far as tricks, but HOLY COW THIS THING IS FAST! I love the new Top Sites pane (like Chrome); I don't care about the new tabs thing, but some may. I thought Safari 3 was really fast; this puts it to shame from a perceived-speed point of view. Cover Flow for bookmarks is a nice touch, and the smart guessing in the field where you enter the URL seems to be much better.

    Download it now from Apple.

    Friday, February 06, 2009

    Maven Is Cooler than you think...

I'm sure I'm not the only one who has heard people curse Maven. But Maven is cooler than you think. Back in the day, when I wanted to start a project I always had to get a whole bunch of gunk set up before I even wrote a line of code, especially when trying a new framework or tool. Today I was whipping up a new project for a simple micro-benchmark on some Terracotta stuff and it reminded me why Maven really can be quite awesome. It took me 10 minutes and about 7 steps. The next time around I won't need to do the installs, and then it's 4 steps.

    These were the steps I took to get started:

    1) Install Maven 

    2) Used the Pojo Archetype to create the build and test environment for my project.
    - Creates a Mavenized directory structure ready for build, test, run etc. Hooks up to Terracotta maven plugin as well.
    - make sure you replace the group id and project id in the command line.

    updated - with the latest eclipse plugin this is unnecessary
X 3) In my new project directory type: "mvn eclipse:m2eclipse"
    - This takes your Maven project and readies it for eclipse

4) Install the Maven Eclipse Plugin (I already had eclipse installed)
- Makes dealing with Maven from eclipse much easier

5) Install the Terracotta Eclipse Plugin
- Makes dealing with Terracotta from eclipse much easier

    6) File-> Import-> Maven projects and import your project into eclipse
    - Loads up the project directory created from the archetype into Eclipse
    7) Select the project and hit Terracotta->Add Terracotta Nature

    What you end up with here is a complete project setup ready to be built and tested from both Eclipse and the command line using Maven.

    Literally took me about 10 minutes to get started. Notice what you didn't have to do.

    1) Didn't have to build a pom.xml or other kind of build file
2) Didn't have to download or install Terracotta or any of its pieces
    3) Didn't have to think about your directory structures, where you want to put your tests, how you want to run those tests
    4) Didn't have to figure out how to do all this stuff in Eclipse or the commandline

    Sure, Maven can be challenging at times, but in cases like this, when the vendors have things setup for you, it can be a huge time saver.

    update:
Looks like we've reduced the number of steps to 6 the first time and 3 after that. If we take the guy's idea about auto-applying the Terracotta Nature in the archetype we could reduce it to 5 and 2.