Thursday, May 08, 2008

Who's Serious About Search...

I was chatting with someone from Yahoo last night and I mentioned that while I use Yahoo as my home page and for e-mail, I almost always search on Google. I never gave it much thought, but there must be a reason. I remember a LONG time ago Google had better search results and that's why I used it. I suspect that little difference exists now. So, why?

My theory is that it has little to do with the search itself and everything to do with two things:

1) Accessibility - By far the biggest reason I use Google is that it is just flat out the easiest search to get to. It is in my toolbar in both Safari and Firefox on my desktop and on my phone. Yes, I know that the others are in there too but... Does anyone actually change from the default?

If Yahoo and others are serious about search, how can they hand a slot that is SO important to Google?

A second part of accessibility is more subtle. Go to Google's and Yahoo's home pages and compare. On Google's site, search is what you see. It is the only thing on the page aside from some fine print. This screams "I care about search". On Yahoo's site the page is so busy you barely notice search.

2) Marketing - Google managed to get people to refer to searching as Googling! Need I say more? Does anyone even know of another name for a Band-Aid?

A couple of side points. I like Yahoo's home page (old, not new) a lot. I actually like Yahoo's e-mail better than Google's. It is entirely possible that they have intentionally taken a more balanced approach to the web and not focused on being number one in search. I have no idea. But... if someone wants to take on Google in a serious way, be RIGHT THERE when I want you. On my iPhone, on my desktop, in my browser. I don't "think" about search. It's a tool and I mostly grab the one near my hand that fits the general problem I'm trying to solve.

Tuesday, May 06, 2008

An Optimization For Garbage Collectors...

For the last few days I have been thinking a lot about GC as Terracotta moves towards our first major rewrite of that subsystem. Lots of relatively large changes have been bouncing around in my head as I read papers and blogs and talk to people. Maybe I'll blog about those later, but one pretty simple one occurred to me. I have a theory that most Shared Objects are actually only directly referenced by one parent object (I haven't run stats on this yet so I might be full of it). I started wondering whether we could take advantage of this to improve the efficiency of GC. Here is what I came up with:

  • We can keep a Set of Object IDs for objects that have exactly one direct reference to them. We have an implementation of a compressed Set of IDs so this can be quite space efficient.
  • If an Object gets a second reference to it, then it is removed from that Set.
  • If that one reference is removed and the object is not otherwise reachable (in the Terracotta world, not still on a client; in a non-Terracotta world, not reachable from the stack), then the object is garbage.
  • If an Object has no references but is reachable from the stack or is still on a client, then add it to a second, no-refs Set. When it is no longer on a client or reachable from the stack it can be marked as garbage, and if the object is re-referenced it can be accounted for properly.
  • You can also recurse through the objects that the new garbage object referenced, doing the same check.
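
Here is a minimal sketch of the bullets above, in Python rather than anything resembling real Terracotta code. Everything in it is an assumption made for illustration: the `SingleRefTracker` class, its method names, the plain `set` standing in for the compressed Set of IDs, and the `pinned` set standing in for "still on a client or reachable from the stack". Objects that pick up a second reference are simply no longer tracked and are left for a full GC.

```python
class SingleRefTracker:
    """Hypothetical sketch of the single-reference shortcut (not Terracotta's API)."""

    def __init__(self):
        self.single_ref = set()  # IDs with exactly one direct reference
        self.no_refs = set()     # IDs with zero references that are still pinned
        self.pinned = set()      # stand-in for "on a client / reachable from the stack"
        self.children = {}       # object ID -> list of IDs it references
        self.garbage = []        # IDs found to be garbage, in discovery order

    def create(self, oid):
        # A new object starts out pinned (e.g. held by a client) with no references.
        self.children[oid] = []
        self.pinned.add(oid)
        self.no_refs.add(oid)

    def add_reference(self, parent, child):
        self.children[parent].append(child)
        if child in self.no_refs:
            # First reference: move from the no-refs Set to the single-ref Set.
            self.no_refs.remove(child)
            self.single_ref.add(child)
        else:
            # Second (or later) reference: stop tracking, leave it to full GC.
            self.single_ref.discard(child)

    def remove_reference(self, parent, child):
        self.children[parent].remove(child)
        self._reference_dropped(child)

    def unpin(self, oid):
        # The object left the client / stack; if nothing references it, it is garbage.
        self.pinned.discard(oid)
        if oid in self.no_refs:
            self.no_refs.remove(oid)
            self._collect(oid)

    def _reference_dropped(self, oid):
        if oid not in self.single_ref:
            return  # untracked (multiply referenced): nothing can be concluded here
        self.single_ref.remove(oid)
        if oid in self.pinned:
            self.no_refs.add(oid)  # no references, but still reachable for now
        else:
            self._collect(oid)

    def _collect(self, oid):
        self.garbage.append(oid)
        # Recurse: every reference held by the garbage object is now dropped too.
        for child in self.children.pop(oid, []):
            self._reference_dropped(child)


tracker = SingleRefTracker()
for oid in ("root", "a", "b", "c"):
    tracker.create(oid)
tracker.add_reference("root", "a")
tracker.add_reference("a", "b")
tracker.add_reference("root", "c")
tracker.add_reference("a", "c")  # second reference: "c" is no longer tracked
for oid in ("a", "b", "c"):
    tracker.unpin(oid)
tracker.remove_reference("root", "a")
print(tracker.garbage)
```

In this run "a" and "b" are found to be garbage without a full collection, while "c" is left alone because it had two references. Note that the sketch never re-adds an object to the single-ref Set once it drops back to one reference, and like plain reference counting it cannot reclaim cycles, so a full GC is still needed behind it.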

One might be wondering, "Does Steve think he just invented reference counting?" Nope, I don't, and I haven't decided if this idea is any better than just having a first phase of garbage collection based solely on reference counting. I'm just theorizing that it might be. I don't even know that I invented this shortcut. Most real world GCs are hybrids of multiple approaches that best fit the set of restrictions and limitations faced in the environment. This one seems like it could improve things significantly in real world apps and potentially drastically reduce the required frequency of full GCs in our world without adding too much overhead.

Anyway, blast away :-)