Friday, September 21, 2007

Goodbye Randy Pausch

Don't need to be a geek to learn from this one. One of the best CS professors I ever had is going to pass in the next month or so at the way too young age of 46. He has inoperable pancreatic cancer. For those who had the luck to meet or learn from him, and for those who have not, this is his must-see last lecture.

The Lecture
WSJ piece on him

Also check out some of his projects:

Alice
A bunch of his papers

My best to his family and friends and may his legacy live on. I only wish I could take one more class with you at the helm.

Monday, September 10, 2007

3.5 Rules in Distributed Algorithm Design

In highly concurrent distributed computing there are all kinds of algorithms one can learn. One can read books like Concurrent and Distributed Programming in Java and Distributed Systems: Principles and Paradigms. One can learn how to write scalable servers by reading things like these papers on SEDA. But when creating a distributed algorithm there are 3 simple goals/rules you need to follow. While these rules were developed with the Terracotta approach in mind, they are mostly applicable to any distributed computing approach:

  • Use algorithms that allow for maximum concurrency
    • Seems obvious, but design your algorithm to allow maximum concurrency. In a single JVM concurrency is important. In a distributed environment, where lock acquire and release is almost certainly more expensive, it is that much more important.
    • Use read locks where possible. If you can have multiple readers look at something, duh, don't stop them.
    • Try strategies like lock striping, locking on fine-grained objects, etc. (this is some of what ConcurrentHashMap does); see the striping sketch after this list.
  • Minimize chatter
    • Many really cool concurrent algorithms don't work as well for distributed computing because of the amount of chatter (data that needs to be communicated between nodes) they require.
    • Algorithms that require too much cross-node bookkeeping are a problem. Networks are slow and relatively thin pipes. Chattiness plays into that weakness and can also eat CPU; see the batching sketch after this list.
  • Take advantage of locality of reference
    • Use algorithms that partition well (see the partitioning sketch after this list).
    • Use algorithms that can mostly or entirely act on local data.
    • Scale-out architectures don't have all the data everywhere. They rely on various levels of caching, both near and far, for optimal performance. When accessing data, try to hit data that lives on the same node where possible.
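
To make rule 1 concrete, here is a minimal sketch of read locks plus lock striping in plain Java. It is not Terracotta-specific and the StripedCache class is made up for illustration; the point is that readers never block each other and writers only contend when they hit the same stripe.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Hypothetical striped cache: readers share a read lock, and writes to
    // different stripes never contend with each other.
    public class StripedCache<K, V> {
        private static final int STRIPES = 16;

        private final ReadWriteLock[] locks = new ReadWriteLock[STRIPES];
        private final Map<K, V>[] segments;

        @SuppressWarnings("unchecked")
        public StripedCache() {
            segments = (Map<K, V>[]) new HashMap[STRIPES];
            for (int i = 0; i < STRIPES; i++) {
                locks[i] = new ReentrantReadWriteLock();
                segments[i] = new HashMap<K, V>();
            }
        }

        // Strip the sign bit so the stripe index is always non-negative.
        private int stripeFor(Object key) {
            return (key.hashCode() & 0x7fffffff) % STRIPES;
        }

        public V get(K key) {
            int s = stripeFor(key);
            locks[s].readLock().lock();   // many readers can hold this at once
            try {
                return segments[s].get(key);
            } finally {
                locks[s].readLock().unlock();
            }
        }

        public void put(K key, V value) {
            int s = stripeFor(key);
            locks[s].writeLock().lock();  // exclusive, but only for this one stripe
            try {
                segments[s].put(key, value);
            } finally {
                locks[s].writeLock().unlock();
            }
        }
    }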
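
For rule 2, one common way to cut chatter is batching: queue updates locally and ship them in one message instead of one per change. The sketch below is purely illustrative; the Transport interface is a made-up stand-in for whatever actually moves bytes between nodes.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical publisher that trades per-update messages for one
    // network send per batch.
    public class BatchingPublisher<T> {

        /** Stand-in for whatever actually ships data across the wire. */
        public interface Transport<E> {
            void send(List<E> batch);
        }

        private final Transport<T> transport;
        private final int maxBatchSize;
        private final List<T> pending = new ArrayList<T>();

        public BatchingPublisher(Transport<T> transport, int maxBatchSize) {
            this.transport = transport;
            this.maxBatchSize = maxBatchSize;
        }

        /** Queue an update locally; only touch the network when the batch fills up. */
        public synchronized void publish(T update) {
            pending.add(update);
            if (pending.size() >= maxBatchSize) {
                flush();
            }
        }

        /** Push everything queued so far in a single message. */
        public synchronized void flush() {
            if (!pending.isEmpty()) {
                transport.send(new ArrayList<T>(pending));
                pending.clear();
            }
        }
    }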
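
And for rule 3, a toy sketch of key-based partitioning: every node computes the same owner for a key, so work can be routed to where the data already lives instead of bouncing around the cluster. The node names and the simple modulo scheme are illustrative assumptions, not how any particular product does it.

    // Hypothetical partitioner so most operations act on local data.
    public class Partitioner {
        private final String[] nodes;

        public Partitioner(String... nodes) {
            this.nodes = nodes;
        }

        /** Every node computes the same owner for a key; no cross-node lookup needed. */
        public String ownerOf(Object key) {
            int h = key.hashCode() & 0x7fffffff;   // strip the sign bit
            return nodes[h % nodes.length];
        }

        public static void main(String[] args) {
            Partitioner p = new Partitioner("node-a", "node-b", "node-c");
            // Work on "order:42" gets routed to (and cached on) its owning node,
            // so repeated access stays local.
            System.out.println("order:42 lives on " + p.ownerOf("order:42"));
        }
    }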
Rule 3.5 exists as well: remember not to guess at the performance of something. Test it, time it, and automate as much of those tests and timings as possible.
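
A minimal timing sketch for rule 3.5, just to show the shape of "measure, don't guess." The workload below is a placeholder; in practice you'd wrap the real operation you care about and wire this into an automated test so the numbers get collected on every run.

    // Crude timing harness: warm up first so JIT compilation doesn't
    // dominate the measurement, then time the real iterations.
    public class TimingHarness {

        public static long timeMillis(Runnable work, int iterations) {
            for (int i = 0; i < iterations; i++) {
                work.run();                       // warm-up pass
            }
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                work.run();                       // measured pass
            }
            return (System.nanoTime() - start) / 1000000;
        }

        public static void main(String[] args) {
            long elapsed = timeMillis(new Runnable() {
                public void run() {
                    // Placeholder workload; substitute the operation under test.
                    Math.sqrt(12345.678);
                }
            }, 1000000);
            System.out.println("1,000,000 iterations took " + elapsed + " ms");
        }
    }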

This has been a public service announcement.