Re: Kafka 0.8 Failover Behavior
The broker that is still in the ISR in ZK has all committed data.
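
If it helps, something like the following rough sketch can read the partition state out of ZooKeeper and show which broker ids are still listed in the ISR. This is just an illustration, not a supported tool: the connect string, topic, and partition are placeholders, and the exact JSON layout of the state znode is an assumption.

import org.apache.zookeeper.ZooKeeper;

public class IsrCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder values -- substitute your own ZK connect string,
        // topic name, and partition number.
        String zkConnect = "localhost:2181";
        String topic = "my-topic";
        int partition = 0;

        ZooKeeper zk = new ZooKeeper(zkConnect, 30000, null);
        String path = "/brokers/topics/" + topic
                + "/partitions/" + partition + "/state";
        // The state znode holds JSON along the lines of
        // {"leader":1,"isr":[1,2],...}; the broker ids listed under "isr"
        // are the ones that still have all committed data.
        byte[] data = zk.getData(path, false, null);
        System.out.println(new String(data, "UTF-8"));
        zk.close();
    }
}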

Thanks,

Jun
On Thu, Jun 27, 2013 at 5:04 PM, Vadim Keylis <[EMAIL PROTECTED]> wrote:

> Jun,
> Does Kafka provide the ability to configure a broker to be in-sync before
> it becomes available?
> Is it possible, in case all brokers crash, to find out which node has the
> most recent data, so that a proper startup procedure can be initiated?
>
> Thanks,
> Vadim
>
>
> On Fri, Jun 21, 2013 at 8:24 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > Hi, Bob,
> >
> > Thanks for reporting this. Yes, this is the current behavior when all
> > brokers fail. Whichever broker comes back first becomes the new leader
> > and is the source of truth. This increases availability. However,
> > previously committed data can be lost. This is what we call unclean
> > leader election. Another option is instead to wait until a broker in the
> > in-sync replica set comes back before electing a new leader. This
> > preserves all committed data at the expense of availability. The
> > application can configure the system with the appropriate option based
> > on its need.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Fri, Jun 21, 2013 at 4:08 PM, Bob Jervis <[EMAIL PROTECTED]> wrote:
> >
> > > I wanted to send this out because we saw this in some testing we were
> > > doing and wanted to advise the community of something to watch for in
> > > 0.8 HA support.
> > >
> > > We have a two-machine cluster with replication factor 2.  We took one
> > > machine offline and re-formatted the disk.  We re-installed the Kafka
> > > software, but did not recreate any of the local disk files.  The
> > > intention was simply to re-start the broker process, but due to an
> > > error in the network config that took some time to diagnose, we ended
> > > up with both machines' brokers down.
> > >
> > > When we fixed the network config and restarted the brokers, we
> > > happened to start the broker on the rebuilt machine first.  The net
> > > result was that when the healthy broker came back online, the rebuilt
> > > machine was already the leader, and because of the ZooKeeper state it
> > > forced the healthy broker to delete all of its topic data, thus wiping
> > > out the entire contents of the cluster.
> > >
> > > We are instituting operations procedures to safeguard against this
> > > scenario in the future (and fortunately we only blew away a test
> > > cluster), but this was a bit of a nasty surprise for a Friday.
> > >
> > > Bob Jervis
> > > Visibletechnologies
> > >
> > >
> >
>

 