Monday, November 10, 2008

Web 2.0 Scaled Out Reference App...

It has struck me that not many good, scaled out, start to finish, Open Source Java reference applications exist for developers to learn from. Something that can demonstrate a modern and diverse stack of software handling large amounts of users.

Well check out the Examinator Web Reference App. It's goal is to demonstrate and document everything from "Build and test" to "Deployment" for a high scale, realistic, web application. It strives to be well written and as simple as possible while still being a good demonstration of best practices. It also strives to document all the relivant pieces so others can learn from it. This is quite a challenge and I'm sure dissagreement will exist on some of the choices. I think it's an excellent start and the right set of goals.

Here is a taste of the stack and tools:
  • Spring Webflow
  • Spring MVC
  • Sitemesh
  • Spring
  • Spring Security
  • Freemarker
  • Terracotta
  • JPA
  • Hibernate
  • Apache DBCP
  • MySQL
  • Apache mod_proxy
  • Tomcat
  • Jetty
  • Maven
  • Eclipse (+WTP)
  • Cargo
  • JUnit
  • HttpUnit
  • Selenium
  • Crosscheck
Click here to learn more about the choices and why they were made.

It's pretty good now and I'm pretty sure it's going to get better and better over time so check in early and often. I hope others can learn as much from reading about it and hacking on it as we did writing it.

Wednesday, September 24, 2008

JVM Wish List

I'm sitting at the JVM Summit at Sun for the next few days. It is an interesting group of about 80 people mostly represented by language implementers who target the JVM. Then their are these two guys (me and tim) who work on clustering of the JVM. Seems that our wish list for improvements to the JVM is a bit different than what others here are looking for. I asked one of the speakers about the proxify stuff below and he shot me down like I was Dick Cheney's pal on a quail hunt. Anyway, what the heck, here's my list.

Proxify - In Terracotta we dynamically swap objects in and out of the JVM and back to a server in order to create a Virtual Heap. We do this in a way that maintains Object Identity. We rely on a bevy of tricks to do this but it would be much more efficient if we could point to an object (or more than one object) and say proxify (or if a become: call existed we could use that as well, see Smalltalk for more info). All references to the original object would now be pointing to a light weight proxy. If someone touches the thing we would then inflate it back to being a full fledged object.

Array Instrumentation - Currently we instrument arrays in java by instrumenting the classes that reference the array. This is a bit messy and expensive and forces us to instrument classes that we might not otherwise need to. It also forces us to do more magic than I like to associate a shadow object with the array. Would be great if we could make this go away.

Native Method Replacement - When we want to muck with things like System.arraycopy we can't just wrap and replace because native methods get unhappy with that approach. So we have to replace at the caller. Would be nice if we didn't have to do that as much like arrays this is messy and forces us to instrument extra classes.

JVM support for meta-level monitoring of objects and code - This is not a new idea but it would be simpler and more efficient if we could ask the jvm to callback on us when things like field changes, lock acquires, field accesses occur on objects we care about. Would also be cool if we could associate meta-data with an instance (shadow objects) and ask for stats on a live instance (for our memory manager to make really good decisions on when to proxify). We do all these things now but they are complicated and more expensive then they need to be because we do them at the JRE level.

Good Hot swapping - If we can't have any of the above, good Hot swapping would also make things easier. We could decide what to instrument on the fly to simplify config a little bit and it would enable some optimizations that are very difficult to do now.

Solve the Int Size Problem - Currently arrays and all collections in Java are sized by an integer. As JVM's go more and more 64bit and people start creating collections larger than a couple billion in size it sure would be nice if they didn't bump up against this limitation.

That's it, my quick and dirty brainstormed wish list. Maybe some JVM fairy God Mother will some day grant me three wishes and bring some of these to reality.

Thursday, May 08, 2008

Who's Serious About Search...

I was chatting with someone from Yahoo last night and I mentioned that while I use Yahoo as my home page and e-mail I almost always search on Google. I never gave it much thought but a reason must exist for this to be true. I remember a LONG time ago Google had better search results and that's why I used it. I suspect that little difference exists now. So, why?

My theory is that it has little to do with the search itself and everything to do with two things:

1) Accessibility - By far the biggest reason I use Google is because it is just flat out the easiest search to get to. It is in my tool bar on both safari and firefox on my desk and in my phone. Yes I know that the others are in there too but... Does anyone actually change from the default?

If Yahoo and others are serious about search how can they hand this slot that is SO important to Google?

A second part of accessibility is more subtle. Go to and On Google's site search is what you see. It is the only thing on the page aside from some fine print. This screams "I care about search". On Yahoo's site the page is so busy you barely notice search.

2) Marketing - Google managed to get people to refer to searching as Googling! Need I say more. Does anyone even know of another name for a Band-Aid?

A couple of side points. I like yahoo's home page (old not new) a lot. I actually like yahoo's e-mail better than Googles. It is entirely possible that they have intentionally taken a more balanced approach to the web and not focused on being number one in search. I have no idea. But... if someone wants to take on Google in a serious way, be RIGHT THERE when I want you. On my IPhone, on my desktop, in my browser. I don't "think" about search. It's a tool and I mostly grab the one near my hand that fits the general problem I'm trying to solve.

Tuesday, May 06, 2008

An Optimization For Garbage Collectors...

For the last few days I have been thinking a lot about GC as Terracotta moves towards our first major rewrite of that subsystem. Lots of relatively large changes have been bouncing around in my head as I read papers, blogs and talk to people. Maybe I'll blog about those later but one pretty simple one occurred to me. I have a theory that most Shared Objects are actually only directly referenced by one parent object (I haven't run stats on this yet so I might be full of it). I started from wondering whether we could take advantage of this to improve the efficiency of GC. Here is what I came up with:

  • We can keep a Set of Object ID's for objects that only have one direct reference to it. We have an implementation of a compressed Set of ID's so this can be quite space efficient.
  • If an Object gets a second reference to it then it is removed from that Set
  • If that one reference is removed and in the Terracotta world that object is not reachable from a client or in a non-terracotta world it is not reachable from the stack then the object is garbage.
  • If an Object has no references but is reachable from the stack or is still on a client then add it to the no-refs Set so that when those two things are no longer true they can be marked as garbage or if the object is re-referenced it can be accounted for properly.
  • You can also recurse through the objects that the new garbage object referenced doing the same check.
One might be wondering, "Does steve think he just invented reference counting?" Nope, I don't and I haven't decided if this idea is any better than just having a first phase of garbage collection based solely on reference counting. I'm just theorizing that it might be. I don't even know that I invented this shortcut. Most real world GC's are hybrids of multiple approaches that best fit the set of restrictions and limitations faced in the environment. This one seems like it could improve things significantly in real world apps and potentially drastically reduce the required frequency of full GC's in our world without adding too much overhead.

Anyway, blast away :-)

Thursday, April 24, 2008

More Advice To A Young Developer...

I read Alex Miller's blog post on "Advice To A Young Developer" and it got me thinking. What general advice would I give to young developers. I came up with a few things. Hopefully they will help somebody.

1) Become your best by NOT being the best - This may seem obvious but the best way to be your best is to put your self in positions where your surrounded by people who are better than you on at least one dimension. This leads to point number 2.

2) Listen more than you talk - Use a bit of insecurity and a bit of wanting to be the best to drive you to suck the knowledge and experience from the people around you. Even the people you may view as less than you on some dimension or another will still teach you things if you listen. While your talking you are NOT listening.

3) Be Stupid, Stupid! - Sometimes being smart is a smart person's worst enemy. Don't ever use your intelligence as a crutch. Do research, talk to people, listen to people, read code, read books. In short allow yourself to evolve or risk not knowing how bad you are at things that you could have been great at. Watch out if you find yourself doing things from first principles all the time.

4) Slow Down - Of course a developer should work ones ass off. Almost all of the good ones do. But allow yourself time to think. Step back and look at what you are doing. In software it is very easy to rat hole on the wrong thing so take brakes, think, and get back to it. No more than 50 percent of your time should be spent typing (I actually think for an experienced dev it should be more like 25 percent).

5) Be A Tester - This is really for all Software Engineers, but... Be a tester, learn to write great automated tests that exercise at the unit, component, and system level. NOTE: This doesn't mean write tons and tons of tests. It means learn to write good ones.

6) Read your code - Software is complicated. Go back and read your code. You'll find all kinds of interesting things

7) Refactor - Refactor your code for readability, testability and maintainability. A myth exists that doing something in a messy poorly organized way is somehow quicker than doing it in a clean way. This is flat out untrue. I've actually timed myself. It takes longer to do things poorly. You are not saving any time by leaving bad code around. If it's a state machine write it as a clean well factored state machine. If you see a ton of nested if's extract methods, use null objects, what ever is needed.

8) Don't Guess - I first started with don't pre-optimize but I figured I would go one step further. Don't Guess! Write tests to show a perf problem before fixing them, don't guess about what features will be needed in the future, don't guess about whether your change fixes a bug. PROVE IT!

9) Trust Me - If you find yourself trying to win an argument by saying "trust me, I've done this or that before", or "I just know" then STOP. If you can't explain your point then you probably don't really have one so either figure out what the point really is or just admit your wrong.

10) Wishful thinking - Any decent list always has 10 items in it right? Anyway program by wishful thinking. Keeps you focused.

Reading back on this list it is completely all over the place. I have like 50 more ideas but this blog is already kind of preachy and pedantic so I'll stop :-). Hope someone can find some benefit from it.

Anyone else have ideas?

Wednesday, January 30, 2008

The New "My Yahoo" Homepage

I have used as my home page for a long time. Here are my comments on the new version they are pushing lately.

- I'm not thrilled with the giant square advertisement in the upper left and most important space on the page. You can move it to the right side with the layout control but it still isn't my favorite. It's giant, square and flashy and for me a bit annoying as compared to the old layout.

- The left side blocks are too wide. I like to glance at that stuff but it doesn't have a lot of text and tends to not need the extra width. It is about 50 percent wider than it was in the old version.

- I like the idea of the personal assistant thing. It has mail, stocks and some less useful stuff in a very useful spot. My problem with it is that it grows on hover. Things that take major actions on hover kind of bug me. Mostly they just make me do the action by accident and add no real value.

- I'm not sure if it's a bigger font or more white space but it feels like I have to do a lot more scrolling to see my rss feeds.

Fear not, it's not all bad, the personalization stuff while lacking enough control is visually pleasing and trivial to use.

Monday, January 28, 2008

DZone Suggestion, More Comments

I like DZone a lot. I read it regularly and post to it when practical. I like the rising links, I like the popular links, I even like the voting system. However, one thing that would make DZone just a touch better is to encourage people to comment on blogs when they vote. I personally learn from both the positive and negative feedback and anything that can be done to encourage that feedback is welcome. So, my suggestion is to pop up an optional "Add comment" field with proceed and cancel options after any vote on DZone. Anyone is free to not comment at that point but if someone has something on the tip of their tongue they might be more likely to say it. Our experience at Terracotta in building our .org website tells us that even subtle changes can have a broad impact on what people do. I would be interested to see this A/B tested and see if it increases comment volume.

Thursday, January 24, 2008

Why I love and hate statics in Java

I was chatting with some fellow geeks earlier this evening and it occurred to me that I've said to people that they should almost ALWAYS use static and also told people they should almost NEVER use static. Am I schizophrenic, a hypocrite, or just dumb. Maybe all three but it has nothing to do with this blog. I'm talking about two different language usages of the static reserved word.

USAGE 1, where the love is:
Inner classes. I hate non-static inner classes. IMHO non-static inner-class is unnecessary syntactic sugar that leads to hard to read code and subtle bugs. For those who don't know, non-static inner classes maintain a hidden instance variable holding a parent instance. It then uses specially generated methods to give access to the parent's private fields and auto-magically calls methods on the parent if no local method of the used name exists. I've seen this lead to memory leaks (people passing around instances of inner classes and not realizing that they are keeping around parents), all kinds of confusing issues with methods of the same name in inner and outer classes and variable problems of the like. On the occasions I use inner classes I almost always go with the static kind.

USAGE 2, no love here:
Static variables. With the exception of constants I have a strong dislike of static variables. Why you ask? When used to create various versions of singletons it leads to messy hidden code dependencies . It also makes it hard to do mock object stuff for testing, creates hidden initialization stuff and makes it difficult to create multiple environments in a single JVM. Just darn inflexible for no gain. I would go into details but this has been covered quite nicely here

Anyway, in summary, STATIC inner classes good, STATIC variables bad.
Goodnight ...