- Use algorithms that allow for maximum concurrency
- Seems obvious, but design your algorithms to allow maximum concurrency. Concurrency matters in a single JVM; in a distributed environment, where acquiring and releasing a lock is almost certainly more expensive, it matters that much more.
- Use read locks where possible. If multiple readers can safely look at something then, duh, don't stop them (see the read-lock sketch after this list).
- Try strategies like lock striping and locking on fine-grained objects (this is some of what concurrent hash maps do); there is a striping sketch after this list.
- Minimize chatter
- Many really cool concurrent algorithms are a poor fit for distributed computing because of the amount of chatter (data that has to be communicated between nodes) they generate.
- Algorithms that require too much cross-node bookkeeping are a problem. Networks are slow and relatively thin pipes; chattiness plays into that weakness and can also eat CPU. One common way to cut chatter is to batch updates, as in the sketch after this list.
- Take advantage of locality of reference
- Use algorithms that partition well
- Use algorithms that can mostly or entirely act on local data.
- Scale-out architectures don't keep all the data everywhere. They rely on various levels of caching, both near and far, for optimal performance. When hitting data, try to hit data that lives on the same node where possible (see the partitioning sketch after this list).
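To make the read-lock point concrete, here is a minimal sketch in Java of a read-mostly cache guarded by a ReentrantReadWriteLock. The class and method names are illustrative, not taken from any particular product; the point is simply that readers share the lock and only writers exclude everyone else.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Read-mostly cache: many readers proceed concurrently,
// writers take the exclusive lock only when mutating.
public class ReadMostlyCache<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public V get(K key) {
        lock.readLock().lock();      // shared: readers don't block each other
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void put(K key, V value) {
        lock.writeLock().lock();     // exclusive: blocks readers and writers
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```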
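And here is a minimal striping sketch, again with illustrative names and an arbitrary stripe count of 16: keys hash to one of N stripes, so threads (or nodes) touching different keys rarely contend on the same lock. This is the same basic idea concurrent hash maps use internally.

```java
// Striped counter: contention is spread across 16 independent locks
// instead of funneling every update through one global lock.
public class StripedCounter {
    private static final int STRIPES = 16;               // illustrative stripe count
    private final Object[] locks = new Object[STRIPES];
    private final long[] counts = new long[STRIPES];

    public StripedCounter() {
        for (int i = 0; i < STRIPES; i++) locks[i] = new Object();
    }

    private int stripeFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % STRIPES;  // map key -> stripe
    }

    public void increment(Object key) {
        int s = stripeFor(key);
        synchronized (locks[s]) {                        // lock only this stripe
            counts[s]++;
        }
    }

    public long total() {
        long sum = 0;
        for (int i = 0; i < STRIPES; i++) {
            synchronized (locks[i]) {
                sum += counts[i];
            }
        }
        return sum;
    }
}
```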
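On chatter, one simple sketch of the batching idea: buffer updates locally and push them across the network in bulk instead of one message per update. The Transport interface below is hypothetical, standing in for whatever mechanism actually moves data between nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Batching publisher: trades a little latency for far fewer network
// round trips, which is usually the right trade on a thin, slow pipe.
public class BatchingPublisher<T> {

    // Hypothetical transport abstraction: one call sends many updates.
    public interface Transport<U> {
        void sendBatch(List<U> batch);
    }

    private final Transport<T> transport;
    private final List<T> buffer = new ArrayList<>();
    private final int batchSize;

    public BatchingPublisher(Transport<T> transport, int batchSize) {
        this.transport = transport;
        this.batchSize = batchSize;
    }

    public synchronized void publish(T update) {
        buffer.add(update);
        if (buffer.size() >= batchSize) {
            flush();                         // one round trip for the whole batch
        }
    }

    public synchronized void flush() {
        if (!buffer.isEmpty()) {
            transport.sendBatch(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```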
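Finally, a minimal partitioning sketch for the locality point. The assumption here (node count, hash-based ownership) is illustrative: every node can compute the owner of a key with no coordination, so work on a key can be routed to the node that already holds the data instead of dragging the data to the work.

```java
// Partitioner: each key is owned by exactly one node, chosen by hashing,
// so a node can act on its own partition without asking anyone else.
public class Partitioner {
    private final int nodeCount;

    public Partitioner(int nodeCount) {
        this.nodeCount = nodeCount;
    }

    // Deterministic ownership: every node computes the same answer locally.
    public int ownerOf(Object key) {
        return (key.hashCode() & 0x7fffffff) % nodeCount;
    }

    // If the key isn't local, forward the request to its owner rather than
    // pulling remote data over the network to this node.
    public boolean isLocal(Object key, int myNodeId) {
        return ownerOf(key) == myNodeId;
    }
}
```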
This has been a public service announcement.
1 comment:
I'd like to see objective and quantitative comparisons of the trade-offs involved in the different kinds of concurrency provided by languages like Java, OCaml, Erlang and F#.