At LinkedIn, the most common type of failure is controlled shutdown for
code/config pushes. For that, we have a tool for reducing
the unavailability window (https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools).
Such shutdowns can happen once or twice a month. The next most common type of failure is
disk/RAID failure, which seems to happen once every month or two. The
remaining types of failure include Linux crashes, JVM bugs, and other types
of hardware failures. They happen a few times a year.
On Tue, Jun 11, 2013 at 1:22 AM, Pankaj Misra <[EMAIL PROTECTED]> wrote: