At LinkedIn, the most common type of failure is a controlled shutdown for
code/config pushes. For that, we have a tool for reducing the
unavailability window (...). This can happen once or twice a month. The
next most common type of failure is disk/RAID failure, which seems to
happen once every month or two. The remaining types of failure include
Linux crashes, JVM bugs, and other hardware failures. They happen a few
times a year.


On Tue, Jun 11, 2013 at 1:22 AM, Pankaj Misra <[EMAIL PROTECTED]> wrote: