Wednesday, April 20, 2011

Local Caching++

Ok, so you've built your application in Java. You've used all the usual tools: Tomcat, Spring, Ehcache, Quartz, etc. Or maybe you went the JRuby, Grails or Scala route. You test your new application, or hand it off to run in production, and it's too slow. This is just a single node application at this point. It services 20-100 users. It's churning and burning the database, creating and recreating the same Web Pages, Users and other relevant data. You want to start caching locally to solve your latency and throughput problems. Then, upon a more detailed look at your application, you get scared.

You find your application has:
  • 40 DB tables in Hibernate that can be cached
  • A web cache
  • A user session cache
Then it hits you. Caching is easy, but cache tuning is hard.

What Makes Cache Tuning Hard?

In conversations with hundreds of cache users, a small handful of hard-to-work-through challenges keep coming up when applying caching to an application:

  • Hibernate/Lots of caches - When using Hibernate you often end up with as many as 100 tables in your DB. How do you balance a fixed amount of resources (Heap/BigMemory) across 100 caches?
  • Indirect knobs/Bytes vs Count/TTL - In local Java caching the control points are almost always measured in number of entries and time to live. But wait a minute! When I start the JVM I don't say how many objects the heap can hold and for how long. I say how many bytes of memory the heap can use.
  • Who Tunes and When? - At some companies the desire is to have the "Application Administrator" do the tuning. At others it's the "Developer." They have different understandings of the application. The developer can tune by knowledge of the application. The app admin can only tune based on what's happening when the application is running.
These are the challenges we are working to solve in the next version of Ehcache. While it's early days on the dev side we would love feedback on our approaches. You can get a sense of how it's going to work from the doc on ehcache.org.
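
To make the "indirect knob" problem concrete, here is a minimal sketch in plain Java. It is not Ehcache code: the class and method names are hypothetical, and it counts only the value payload, ignoring object headers and map overhead. It shows two caches with the same entry limit using very different amounts of memory.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CountVsBytes {
    /** Sums value payload bytes only; real on-heap size is larger (headers, keys, map overhead). */
    static long payloadBytes(Map<String, byte[]> cache) {
        long total = 0;
        for (byte[] value : cache.values()) {
            total += value.length;
        }
        return total;
    }

    /** Fills a map with the given number of entries, each holding valueSize bytes. */
    static Map<String, byte[]> fill(int entries, int valueSize) {
        Map<String, byte[]> cache = new LinkedHashMap<>();
        for (int i = 0; i < entries; i++) {
            cache.put("key-" + i, new byte[valueSize]);
        }
        return cache;
    }

    public static void main(String[] args) {
        int maxEntries = 1_000; // the same count-based limit for both caches
        long small = payloadBytes(fill(maxEntries, 200));    // assumed size of a small record
        long large = payloadBytes(fill(maxEntries, 20_000)); // assumed size of a rendered page
        System.out.println("small values: " + small + " bytes");
        System.out.println("large values: " + large + " bytes");
    }
}
```

With these assumed value sizes, the same limit of 1,000 entries covers 200,000 bytes of payload in one cache and 20,000,000 in the other, which is why an entry count says little about heap usage.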

What We Are Building

Greg, the dev team and I have spent a bunch of time pondering the above problems over the last few years. We felt that two key improvements to how people tune could address most of them (and a few more items cover the rest):

  • Tune from the top - Define max resource usage for the whole cache manager and then optionally define it for the individual caches underneath it as needed. So if you have a hundred caches you can start with, "Give these 100 caches N amounts of Heap/OffHeap." Then monitor and see if any specific caches need special attention.
  • Tune the constrained resource, Bytes - TTL is a cache freshness concern, not a resource management concern. Max entry count does not directly map to available heap resources. So we are adding "bytes" based tuning. This eliminates the error-prone process of trying to control resources by TTL/TTI/count and hoping you get it right. Instead you say, "I want to allow caching to use 30 percent of heap." We take it from there.
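
As a rough illustration of the two ideas above, here is a sketch in plain Java. It is not the Ehcache API or its implementation: `TopDownBudget`, `percentOfHeap` and `plan` are hypothetical names, and the even split among caches without their own setting is a simplifying assumption (a real cache manager could share the pool dynamically instead). It sets one byte budget at the top, expressed as a percentage of heap, and lets individual caches override it only where needed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopDownBudget {
    /** Turns "N percent of heap" into bytes, based on the JVM's maximum heap size. */
    static long percentOfHeap(int percent) {
        return Runtime.getRuntime().maxMemory() / 100 * percent;
    }

    /**
     * Assigns each cache a byte budget. Caches named in overrides keep their explicit
     * budget; all other caches split the rest of the manager budget evenly.
     */
    static Map<String, Long> plan(long managerBytes, List<String> cacheNames,
                                  Map<String, Long> overrides) {
        long reserved = 0;
        int sharing = 0;
        for (String name : cacheNames) {
            Long explicit = overrides.get(name);
            if (explicit != null) {
                reserved += explicit;
            } else {
                sharing++;
            }
        }
        if (reserved > managerBytes) {
            throw new IllegalArgumentException("per-cache budgets exceed the manager budget");
        }
        long share = sharing == 0 ? 0 : (managerBytes - reserved) / sharing;
        Map<String, Long> budgets = new LinkedHashMap<>();
        for (String name : cacheNames) {
            budgets.put(name, overrides.getOrDefault(name, share));
        }
        return budgets;
    }

    public static void main(String[] args) {
        long managerBytes = percentOfHeap(30); // "allow caching to use 30 percent of heap"
        List<String> caches = List.of("users", "webPages", "sessions", "orders");
        // Only one cache gets special attention; the cache names are made up for the example.
        Map<String, Long> overrides = Map.of("webPages", managerBytes / 2);
        plan(managerBytes, caches, overrides)
            .forEach((name, bytes) -> System.out.println(name + " -> " + bytes + " bytes"));
    }
}
```

The point of the design is that the only number you must choose up front is the top-level budget. Per-cache overrides are added later, after monitoring shows that a specific cache needs special attention.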

Wrapping Up

With those two key improvements, a developer or admin is now directly turning the knob (size of cache in bytes) that maps to the resources available in the JVM. They can do it at a global level or a local level as needed, avoiding hard-to-tune individual cache constraints.

This will work with all of our JVM level cache tiers (onHeap, BigMemory, Disk).

When you put those features together with other items coming in the next major release, like entry and cache pinning and a snapshotting bootstrapper for warming, we feel this will be a very powerful release.

Please check out the new docs and give us feedback by commenting on this blog or posting to the Ehcache forums.

Help us make Ehcache as easy to use and powerful as we possibly can.

4 comments:

Javin Paul said...

Great post mate, you have indeed covered the topic in detail. Caching is indeed a topic which requires a deeper understanding than many other programming concepts, and you did a good job of explaining it.

Javin
How Garbage collection works in Java

Marimuthu Udayakumar said...

Really very helpful topics..


Thanks,
Marimuthu Udayakumar

Kevin S said...

How do you know how much memory an object in EhCache is consuming without serialization? Do we have to implement an API of some sort? I don't see a way to specify object size in the cache configuration.

Steve Harris said...

When you put a Cache Entry into the cache we evaluate its size. Serialization wouldn't be helpful here because an object's serialized size is not the same as its on-heap size. It's actually a fairly complex thing to do well in Java and probably the subject of another blog (we actually have 3 mechanisms we choose between depending on JVM and version).

Give it a whirl and let us know how it works for you.