Thursday, March 12, 2009

Terracotta release 3.0 Part 1

I'm starting to get that itch of excitement that only comes with the end of a release. The Terracotta 3.0 release includes performance improvements, bug fixes, greater platform support, the 2nd Gen Server Array, a new cluster topology API, application groups, improved tools and well over 200 resolved issues. I picked two quick features to blog about first, and I'll be following up with more on Terracotta 3.0 soon.

2nd Generation Server Array:

This thing rocks. The 2nd Gen Server Array stripes and mirrors one's objects across multiple servers, much like RAID. The user doesn't know or care where the bytes go. The difference is that this is part of your Java heap being persistently scaled over an array. I was watching a test yesterday that showed 25,000 fully persistent WRITE transactions per second. This wasn't some rosy best-case scenario testing only local reads for marketing. These transactions were 2k in size, fully persistent, and striped over 8 servers. The load it can handle has scaled linearly so far. The only thing one changes to use the Server Array instead of a single server is listing the new server groups in the configuration file.

Below is a sample config for a mirrored and striped array. The topology is defined between lines 54 and 77 as mirror groups and stripes. Everything else will be the same as using regular Terracotta.

1
2 <?xml version="1.0" encoding="UTF-8"?>
3
4 <tc:tc-config xmlns:tc="http://www.terracotta.org/config">
5 <tc-properties>
6 <property name="l2.healthcheck.l1.ping.interval" value="10000"/>
7 <property name="l2.healthcheck.l1.ping.idletime" value="10000"/>
8 <property name="l2.healthcheck.l1.ping.probes" value="20"/>
9 </tc-properties>
10 <system>
11 <configuration-model>production</configuration-model>
12 </system>
13 <servers>
14 <server host="eng01" name="server1">
15 <logs>%(user.home)/local1/tc-runtime/test-app/server-logs</logs>
16 <statistics>%(user.home)/local1/tc-runtime/test-app/server-stats</statistics>
17 <dso-port>9510</dso-port>
18 <jmx-port>9520</jmx-port>
19 <l2-group-port>9530</l2-group-port>
20 <data>%(user.home)/remote/tc-runtime/test-app/server-data</data>
21 <dso>
22 <client-reconnect-window>100</client-reconnect-window>
23 <persistence>
24 <mode>permanent-store</mode>
25 </persistence>
26 <garbage-collection>
27 <enabled>true</enabled>
28 <verbose>true</verbose>
29 <interval>180</interval>
30 </garbage-collection>
31 </dso>
32 </server>
33 <server host="eng02" name="server2">
34 <logs>%(user.home)/local1/tc-runtime/test-app/server-logs</logs>
35 <statistics>%(user.home)/local1/tc-runtime/test-app/server-stats</statistics>
36 <dso-port>9510</dso-port>
37 <jmx-port>9520</jmx-port>
38 <l2-group-port>9530</l2-group-port>
39 <data>%(user.home)/remote/tc-runtime/test-app/server-data</data>
40 <dso>
41 <client-reconnect-window>100</client-reconnect-window>
42 <persistence>
43 <mode>permanent-store</mode>
44 </persistence>
45 <garbage-collection>
46 <enabled>true</enabled>
47 <verbose>true</verbose>
48 <interval>180</interval>
49 </garbage-collection>
50 </dso>
51 </server>
52
53
54 <mirror-groups>
55 <mirror-group>
56 <members>
57 <member>server1</member>
58 </members>
59 <ha>
60 <mode>networked-active-passive</mode>
61 <networked-active-passive>
62 <election-time>1</election-time>
63 </networked-active-passive>
64 </ha>
65 </mirror-group>
66 <mirror-group>
67 <members>
68 <member>server2</member>
69 </members>
70 <ha>
71 <mode>networked-active-passive</mode>
72 <networked-active-passive>
73 <election-time>1</election-time>
74 </networked-active-passive>
75 </ha>
76 </mirror-group>
77 </mirror-groups>
78 <ha>
79 <mode>networked-active-passive</mode>
80 <networked-active-passive>
81 <election-time>1</election-time>
82 </networked-active-passive>
83 </ha>
84 </servers>
85 </tc:tc-config>

Terracotta Topology API

In Terracotta 3.0 we have added a Terracotta Topology API. It is used to monitor and act on
things like nodes joining and leaving, "who am I?", and "is this object local?". In the Terracotta config
one needs to specify where to inject the DsoCluster topology object:

<application>
    <dso>
        <injected-instances>
            <injected-field>
                <field-name>org.sharrissf.sample.ClusterAPISample.cluster</field-name>
            </injected-field>
        </injected-instances>
    </dso>
</application>

Below is some sample code that takes advantage of the Topology API to find out what nodes are entering
and leaving the cluster and what my node ID is. For a fully working example of how to use this API check out the
chatter demo in the kit.

package org.sharrissf.sample;

import com.tc.cluster.DsoCluster;
import com.tc.cluster.DsoClusterEvent;
import com.tc.cluster.DsoClusterListener;

public class ClusterAPISample implements DsoClusterListener {

    // Injected by Terracotta, as named in <injected-instances> in the Terracotta config
    private DsoCluster cluster;

    public ClusterAPISample() {
        System.out.println("***** Hello, I'm node: " + cluster.getCurrentNode());
        cluster.addClusterListener(this);
        try {
            // Stagger the exits so that one node outlives the other
            if ("ClientID[0]".equals(cluster.getCurrentNode().getId()))
                Thread.sleep(2000);
            else
                Thread.sleep(10000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    public void nodeJoined(DsoClusterEvent event) {
        System.out.println("***** NODE JOINED: " + event.getNode());
    }

    public void nodeLeft(DsoClusterEvent event) {
        System.out.println("***** NODE LEFT: " + event.getNode());
    }

    public void operationsDisabled(DsoClusterEvent event) {
        System.out.println("***** OPERATIONS DISABLED: " + event.getNode());
    }

    public void operationsEnabled(DsoClusterEvent event) {
        System.out.println("***** OPERATIONS ENABLED: " + event.getNode());
    }

    public final static void main(String[] args) throws Exception {
        new ClusterAPISample();
    }
}

I put the sleeps in so that, if you run this twice, one node is still around to print a message when the other node leaves.

I'll highlight some of the other new stuff in a follow-up post.

Tuesday, February 24, 2009

Micro Review for Safari 4 Beta...

I don't know what tricks they are using, but HOLY COW THIS THING IS FAST! Love the new Top Sites pane (like Chrome). Don't care about the new tabs thing, but some may. I thought Safari 3 was really fast; this puts it to shame from a perceived-speed point of view. Cover Flow for bookmarks is a nice touch, and the smart guessing feature in the field where you enter the URL seems to be much better.

Download it now from Apple.

Friday, February 06, 2009

Maven Is Cooler than you think...

I'm sure I'm not the only one who has heard people curse Maven. But Maven is cooler than you think. Back in the day, when I wanted to start a project I always had to get a whole bunch of gunk set up before I even wrote a line of code, especially when trying a new framework or tool. Today I was whipping up a new project for a simple micro-benchmark on some Terracotta stuff and it reminded me why Maven really can be quite awesome. It took me 10 minutes and about 7 steps. The next time around I won't need to do the installs, and then it's 4 steps.

These were the steps I took to get started:

1) Install Maven

2) Use the Pojo Archetype to create the build and test environment for my project.
- Creates a Mavenized directory structure ready for build, test, run etc. Hooks up to the Terracotta Maven plugin as well.
- Make sure you replace the group id and project id in the command line.

updated - with the latest Eclipse plugin this is unnecessary
X 3) In my new project directory type: "mvn eclipse:m2eclipse"
- This takes your Maven project and readies it for Eclipse

4) Install the Maven Eclipse Plugin (I already had Eclipse installed)
- Makes dealing with Maven from Eclipse much easier

5) Install the Terracotta Eclipse Plugin
- Makes dealing with Terracotta from Eclipse much easier

6) File -> Import -> Maven projects and import your project into Eclipse
- Loads up the project directory created from the archetype into Eclipse

7) Select the project and hit Terracotta -> Add Terracotta Nature

What you end up with here is a complete project setup ready to be built and tested from both Eclipse and the command line using Maven.

Literally took me about 10 minutes to get started. Notice what you didn't have to do.

1) Didn't have to build a pom.xml or other kind of build file
2) Didn't have to download or install Terracotta or any of its pieces
3) Didn't have to think about your directory structures, where you want to put your tests, or how you want to run those tests
4) Didn't have to figure out how to do all this stuff in Eclipse or on the command line

Sure, Maven can be challenging at times, but in cases like this, when the vendors have things set up for you, it can be a huge time saver.

update:
Looks like we've reduced the number of steps to 6 the first time and 3 after that. If we take the guy's idea about auto-applying the Terracotta Nature in the archetype, we could reduce it to 5 and 2.

Monday, November 10, 2008

Web 2.0 Scaled Out Reference App...

It has struck me that not many good, scaled out, start to finish, Open Source Java reference applications exist for developers to learn from. Something that can demonstrate a modern and diverse stack of software handling large amounts of users.

Well, check out the Examinator Web Reference App. Its goal is to demonstrate and document everything from "Build and test" to "Deployment" for a high-scale, realistic web application. It strives to be well written and as simple as possible while still being a good demonstration of best practices. It also strives to document all the relevant pieces so others can learn from it. This is quite a challenge, and I'm sure disagreement will exist on some of the choices. I think it's an excellent start and the right set of goals.

Here is a taste of the stack and tools:
  • Spring Webflow
  • Spring MVC
  • Sitemesh
  • Spring
  • Spring Security
  • Freemarker
  • Terracotta
  • JPA
  • Hibernate
  • Apache DBCP
  • MySQL
  • Apache mod_proxy
  • Tomcat
  • Jetty
  • Maven
  • Eclipse (+WTP)
  • Cargo
  • JUnit
  • HttpUnit
  • Selenium
  • Crosscheck
Click here to learn more about the choices and why they were made.

It's pretty good now and I'm pretty sure it's going to get better and better over time so check in early and often. I hope others can learn as much from reading about it and hacking on it as we did writing it.


Wednesday, September 24, 2008

JVM Wish List

I'm sitting at the JVM Summit at Sun for the next few days. It is an interesting group of about 80 people, mostly language implementers who target the JVM. Then there are these two guys (me and Tim) who work on clustering of the JVM. Seems that our wish list for improvements to the JVM is a bit different from what others here are looking for. I asked one of the speakers about the proxify stuff below and he shot me down like I was Dick Cheney's pal on a quail hunt. Anyway, what the heck, here's my list.

Proxify - In Terracotta we dynamically swap objects in and out of the JVM and back to a server in order to create a Virtual Heap. We do this in a way that maintains Object Identity. We rely on a bevy of tricks to do this, but it would be much more efficient if we could point to an object (or more than one object) and say proxify (or if a become: call existed we could use that as well, see Smalltalk for more info). All references to the original object would now be pointing to a lightweight proxy. If someone touches the thing, we would then inflate it back to being a full-fledged object.
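
To make the "inflate on first touch" idea concrete, here is a rough sketch using java.lang.reflect.Proxy from the standard library. It is only an illustration: the names (LazyProxySketch, Account, proxify) are made up for this example and are not Terracotta code. It also falls short of the wish above, since a library-level proxy only works through interfaces and cannot swap itself in for references that already point at the real object. That gap is exactly what JVM support would close.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

// Hypothetical illustration only: none of these names come from Terracotta.
final class LazyProxySketch {

    /** Stand-in for a shared object type; an assumption for this example. */
    interface Account {
        long balance();
    }

    /** Counts how many times a real object has been faulted in. */
    static int inflations = 0;

    /**
     * Returns a lightweight stand-in that only builds the real object
     * the first time a method is called on it.
     */
    @SuppressWarnings("unchecked")
    static <T> T proxify(Class<T> type, Supplier<T> loader) {
        InvocationHandler handler = new InvocationHandler() {
            private T target; // stays null until someone touches the proxy

            @Override
            public synchronized Object invoke(Object proxy, Method method, Object[] args)
                    throws Throwable {
                if (target == null) {
                    target = loader.get(); // "inflate" on first touch
                    inflations++;
                }
                return method.invoke(target, args);
            }
        };
        return (T) Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler);
    }

    public static void main(String[] args) {
        Account account = proxify(Account.class, () -> () -> 42L);
        System.out.println("inflations before touch: " + inflations);
        System.out.println("balance: " + account.balance());
        System.out.println("inflations after touch: " + inflations);
    }
}
```

The loader here is just a lambda; in a clustered setting it would stand for whatever fetches the object's state back from the server.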

Array Instrumentation - Currently we instrument arrays in Java by instrumenting the classes that reference the array. This is a bit messy and expensive and forces us to instrument classes that we might not otherwise need to. It also forces us to do more magic than I like to associate a shadow object with the array. Would be great if we could make this go away.

Native Method Replacement - When we want to muck with things like System.arraycopy we can't just wrap and replace, because native methods get unhappy with that approach. So we have to replace at the caller. It would be nice if we didn't have to do that because, much like with arrays, this is messy and forces us to instrument extra classes.

JVM support for meta-level monitoring of objects and code - This is not a new idea, but it would be simpler and more efficient if we could ask the JVM to call us back when things like field changes, lock acquires, and field accesses occur on objects we care about. Would also be cool if we could associate meta-data with an instance (shadow objects) and ask for stats on a live instance (for our memory manager to make really good decisions on when to proxify). We do all these things now, but they are complicated and more expensive than they need to be because we do them at the JRE level.

Good Hot swapping - If we can't have any of the above, good Hot swapping would also make things easier. We could decide what to instrument on the fly to simplify config a little bit and it would enable some optimizations that are very difficult to do now.

Solve the Int Size Problem - Currently arrays and all collections in Java are sized by an integer. As JVMs go more and more 64-bit and people start creating collections larger than a couple billion in size, it sure would be nice if they didn't bump up against this limitation.
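
For illustration, here is the kind of workaround the int limit pushes you toward today: a long-indexed array stitched together from ordinary int-indexed chunks. This is a minimal sketch with a made-up class name (ChunkedLongArray), not something from Terracotta or the JDK, and the chunk size is a parameter only so that the example can be exercised with small numbers.

```java
// Hypothetical sketch: a long-indexed array built from int-indexed chunks.
final class ChunkedLongArray {
    private final long[][] chunks;
    private final int chunkSize;
    private final long length;

    ChunkedLongArray(long length, int chunkSize) {
        if (length < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("length must be >= 0 and chunkSize > 0");
        }
        this.length = length;
        this.chunkSize = chunkSize;
        // Round up so a partial last chunk still gets allocated.
        long chunkCount = length / chunkSize + (length % chunkSize == 0 ? 0 : 1);
        chunks = new long[Math.toIntExact(chunkCount)][];
        for (int i = 0; i < chunks.length; i++) {
            long remaining = length - (long) i * chunkSize;
            chunks[i] = new long[(int) Math.min(chunkSize, remaining)];
        }
    }

    long get(long index) {
        checkIndex(index);
        return chunks[(int) (index / chunkSize)][(int) (index % chunkSize)];
    }

    void set(long index, long value) {
        checkIndex(index);
        chunks[(int) (index / chunkSize)][(int) (index % chunkSize)] = value;
    }

    long length() {
        return length;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }
}
```

Every access pays for a divide, a modulo and an extra indirection, and none of the standard collections or System.arraycopy understand the structure, which is why native support would be nicer.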

That's it, my quick and dirty brainstormed wish list. Maybe some JVM fairy God Mother will some day grant me three wishes and bring some of these to reality.




Thursday, May 08, 2008

Who's Serious About Search...

I was chatting with someone from Yahoo last night and I mentioned that while I use Yahoo as my home page and e-mail I almost always search on Google. I never gave it much thought but a reason must exist for this to be true. I remember a LONG time ago Google had better search results and that's why I used it. I suspect that little difference exists now. So, why?

My theory is that it has little to do with the search itself and everything to do with two things:

1) Accessibility - By far the biggest reason I use Google is that it is just flat out the easiest search to get to. It is in my tool bar on both Safari and Firefox on my desk and in my phone. Yes, I know that the others are in there too, but... does anyone actually change from the default?

Question:
If Yahoo and others are serious about search how can they hand this slot that is SO important to Google?

A second part of accessibility is more subtle. Go to www.google.com and www.yahoo.com. On Google's site search is what you see. It is the only thing on the page aside from some fine print. This screams "I care about search". On Yahoo's site the page is so busy you barely notice search.

2) Marketing - Google managed to get people to refer to searching as Googling! Need I say more? Does anyone even know of another name for a Band-Aid?

A couple of side points. I like Yahoo's home page (old, not new) a lot. I actually like Yahoo's e-mail better than Google's. It is entirely possible that they have intentionally taken a more balanced approach to the web and not focused on being number one in search. I have no idea. But... if someone wants to take on Google in a serious way, be RIGHT THERE when I want you. On my iPhone, on my desktop, in my browser. I don't "think" about search. It's a tool, and I mostly grab the one near my hand that fits the general problem I'm trying to solve.

Tuesday, May 06, 2008

An Optimization For Garbage Collectors...

For the last few days I have been thinking a lot about GC as Terracotta moves towards our first major rewrite of that subsystem. Lots of relatively large changes have been bouncing around in my head as I read papers, blogs and talk to people. Maybe I'll blog about those later but one pretty simple one occurred to me. I have a theory that most Shared Objects are actually only directly referenced by one parent object (I haven't run stats on this yet so I might be full of it). I started from wondering whether we could take advantage of this to improve the efficiency of GC. Here is what I came up with:

  • We can keep a Set of Object IDs for objects that have only one direct reference to them. We have an implementation of a compressed Set of IDs, so this can be quite space efficient.
  • If an Object gets a second reference to it, then it is removed from that Set.
  • If that one reference is removed and the object is not otherwise reachable (in the Terracotta world, from a client; in a non-Terracotta world, from the stack), then the object is garbage.
  • If an Object has no references but is reachable from the stack or is still on a client, then add it to the no-refs Set. When those two things are no longer true it can be marked as garbage, and if the object is re-referenced it can be accounted for properly.
  • You can also recurse through the objects that the new garbage object referenced, doing the same check.
One might be wondering, "Does Steve think he just invented reference counting?" Nope, I don't, and I haven't decided if this idea is any better than just having a first phase of garbage collection based solely on reference counting. I'm just theorizing that it might be. I don't even know that I invented this shortcut. Most real-world GCs are hybrids of multiple approaches that best fit the set of restrictions and limitations faced in the environment. This one seems like it could improve things significantly in real-world apps and potentially drastically reduce the required frequency of full GCs in our world without adding too much overhead.
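
The bookkeeping in the list above can be sketched roughly as follows. This is a toy model, not Terracotta code: the class and method names are made up, plain HashSets stand in for the compressed ID set, and "pinned" is my shorthand for "still on a client or reachable from a stack". Objects that pick up a second reference simply fall out of the fast path and are left for a full GC.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the single-reference shortcut; not Terracotta code.
final class SingleRefTracker {
    private final Set<Long> singleRef = new HashSet<>(); // exactly one direct reference
    private final Set<Long> noRefs = new HashSet<>();    // no references, but still pinned
    private final Set<Long> pinned = new HashSet<>();    // on a client or reachable from a stack
    private final Map<Long, List<Long>> children = new HashMap<>(); // outgoing references
    private final Set<Long> garbage = new HashSet<>();

    /** A new object starts unreferenced but pinned by whoever created it. */
    void create(long id) {
        pinned.add(id);
        noRefs.add(id);
    }

    void addReference(long parent, long child) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
        if (noRefs.remove(child)) {
            singleRef.add(child);    // first reference
        } else {
            singleRef.remove(child); // second reference: leave it to the full GC
        }
    }

    void removeReference(long parent, long child) {
        List<Long> refs = children.get(parent);
        if (refs != null && refs.remove(Long.valueOf(child))) {
            release(child);
        }
    }

    /** The object is no longer on any client or stack. */
    void unpin(long id) {
        pinned.remove(id);
        if (noRefs.remove(id)) {
            collect(id);
        }
    }

    private void release(long child) {
        if (!singleRef.remove(child)) {
            return; // reference count unknown: only a full GC can decide
        }
        if (pinned.contains(child)) {
            noRefs.add(child); // wait until it is unpinned or re-referenced
        } else {
            collect(child);
        }
    }

    /** Marks an object as garbage and applies the same check to what it referenced. */
    private void collect(long id) {
        garbage.add(id);
        List<Long> refs = children.remove(id);
        if (refs != null) {
            for (long child : refs) {
                release(child);
            }
        }
    }

    Set<Long> garbage() {
        return garbage;
    }
}
```

Like plain reference counting, this never reclaims cycles, so it can only ever be a cheap first pass in front of a real collector.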

Anyway, blast away :-)