Tom Limoncelli's Time Management for System Administrators

I saw the talk and bought the books, but it's handy to have this link available. More useful links (to the book, etc.) are on the right-hand side of the Google Video page.

Failure recovery

I've been categorizing distributed system designs into four groups, according to how they recover from the loss of a single critical ele...