Kafka >> mail # user >> high-level consumer design


Re: high-level consumer design
To avoid some consumers not consuming anything, one small trick might be
that if a consumer finds itself without any partitions, it can force a
rebalance by deleting its own registration path in ZK and re-registering.
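
A minimal sketch of that trick, assuming the go-zookeeper client
(github.com/go-zookeeper/zk); the path layout and helper names are
illustrative, not from the thread:

    package consumer

    import (
        "time"

        "github.com/go-zookeeper/zk"
    )

    // forceRebalance drops and recreates this consumer's ephemeral
    // registration znode; the children events that fires on the ids path
    // prompt every consumer in the group to rebalance, giving a starved
    // consumer another chance at winning some partitions.
    func forceRebalance(conn *zk.Conn, idsPath, consumerID string) error {
        path := idsPath + "/" + consumerID
        if err := conn.Delete(path, -1); err != nil && err != zk.ErrNoNode {
            return err
        }
        time.Sleep(100 * time.Millisecond) // let peers observe the deletion
        _, err := conn.Create(path, []byte(consumerID), zk.FlagEphemeral,
            zk.WorldACL(zk.PermAll))
        return err
    }
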
On Mon, Jan 27, 2014 at 4:32 PM, David Birdsong <[EMAIL PROTECTED]> wrote:

> On Mon, Jan 27, 2014 at 4:19 PM, Guozhang Wang <[EMAIL PROTECTED]> wrote:
>
> > Hello David,
> >
> > One thing about using ZK locks to "own" a partition is load balancing. If
> > you are unlucky, some consumer may get all the locks and some may get
> > none, and hence have no partitions to consume.
> >
>
> I've considered this and even encountered it in testing. At our current
> load levels it won't hurt us, but if there's a good solution, I'd rather
> codify smooth consumer balancing.
>
> Got any suggestions?
>
> My thinking thus far is to establish some sort of identity on each consumer
> and derive an evenness/oddness or some modulo value that induces a small
> delay when encountering particular partition numbers. It's a hacky idea,
> but it's pretty simple and might be good enough for smoothing consumer
> balance.
>
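One hypothetical reading of that modulo idea (the hashing scheme and all
names below are assumptions, not David's code): hash the consumer's
identity and delay lock attempts on partitions that don't match the hash,
so different consumers reach different partitions first.

    package main

    import (
        "fmt"
        "hash/fnv"
        "time"
    )

    // claimDelay returns how long this consumer should wait before trying
    // to lock the given partition; the partition matching its hash slot
    // gets no delay, others back off in steps.
    func claimDelay(consumerID string, partition, groupSize int32, step time.Duration) time.Duration {
        h := fnv.New32a()
        h.Write([]byte(consumerID))
        slot := (int32(h.Sum32()%uint32(groupSize)) - partition) % groupSize
        if slot < 0 {
            slot += groupSize
        }
        return time.Duration(slot) * step
    }

    func main() {
        for p := int32(0); p < 4; p++ {
            fmt.Printf("partition %d: wait %v\n", p,
                claimDelay("consumer-a", p, 4, 50*time.Millisecond))
        }
    }
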
>
> > Also, you may need some synchronization between the consumer threads and
> > the offset thread. For example, when an event fires and the consumers need
> > to re-try grabbing the locks, each consumer needs to first stop its
> > current fetchers, commit offsets, and then start owning new partitions.
> >
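A sketch of that ordering, with hypothetical hooks standing in for the real
fetcher, offset, and lock machinery:

    package main

    import "log"

    // coordinator illustrates the rebalance ordering: stop fetchers first,
    // commit offsets, and only then contend for the new assignment.
    type coordinator struct {
        stopFetchers    func()       // signals partition goroutines to exit and waits
        commitOffsets   func() error // flushes in-memory offsets to ZooKeeper
        claimPartitions func() error // grabs locks on the new partition set
    }

    func (c *coordinator) onRebalance() error {
        c.stopFetchers() // 1. no fetcher may advance offsets past this point
        if err := c.commitOffsets(); err != nil { // 2. persist final positions
            return err
        }
        return c.claimPartitions() // 3. safe to own new partitions now
    }

    func main() {
        c := &coordinator{
            stopFetchers:    func() { log.Println("fetchers stopped") },
            commitOffsets:   func() error { log.Println("offsets committed"); return nil },
            claimPartitions: func() error { log.Println("partitions claimed"); return nil },
        }
        if err := c.onRebalance(); err != nil {
            log.Fatal(err)
        }
    }
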
>
> This is the current design and what I have implemented so far. The last
> thread to exit is the offset thread; it has a direct communication channel
> to the consumer threads, so it waits for those channels to be closed before
> its last flush and exit.
>
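A toy version of that shutdown ordering, assuming one shared offset channel
rather than the per-consumer channels David describes: the offset
goroutine's range loop only ends once every consumer goroutine is done, so
the final flush necessarily happens last.

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        offsets := make(chan int64)
        var wg sync.WaitGroup

        // Consumer goroutines send offsets, then signal completion.
        for i := 0; i < 3; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for o := int64(0); o < 2; o++ {
                    offsets <- o
                }
            }()
        }

        // Close the shared channel once all consumers have exited.
        go func() {
            wg.Wait()
            close(offsets)
        }()

        // The offset "thread": range only ends when the channel is closed,
        // so the final flush happens after every consumer is done.
        var last int64
        for o := range offsets {
            last = o
        }
        fmt.Println("final flush at offset", last)
    }
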
>
> > Guozhang
> >
> >
> Thanks for the input!
>
>
> >
> > On Mon, Jan 27, 2014 at 3:03 PM, David Birdsong <[EMAIL PROTECTED]> wrote:
> >
> > > Hey All, I've been cobbling together a high-level consumer for golang,
> > > building on top of Shopify's Sarama package, and wanted to run the basic
> > > design by the list and get some feedback or pointers on things I've
> > > missed or will eventually encounter on my own.
> > >
> > > I'm using ZooKeeper to coordinate topic-partition owners for consumer
> > > members in each consumer group. I followed the znode layout that's
> > > apparent from watching the console consumer.
> > >
> > > <consumer_root>/<consumer_group_name>/{offsets,owners,ids}.
> > >
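For concreteness, with a consumer root of /consumers and a group named
mygroup (both illustrative), that layout yields paths like:

    /consumers/mygroup/ids/<consumer_id>
    /consumers/mygroup/owners/<topic_name>/<partition_number>
    /consumers/mygroup/offsets/<topic_name>/<partition_number>
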
> > > The consumer uses an outer loop to discover the partition list for a
> > > given topic, attempts to grab a ZooKeeper lock on each (topic, partition)
> > > tuple, and then, for each (topic, partition) it successfully locks,
> > > launches a thread (goroutine) to read that partition's stream.
> > >
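A rough sketch of that loop, assuming go-zookeeper for the locks and the
current Sarama consumer API (which postdates this thread); all helper names
are illustrative:

    package consumer

    import (
        "fmt"

        "github.com/Shopify/sarama"
        "github.com/go-zookeeper/zk"
    )

    // consumeTopic locks whichever partitions it can and starts one
    // goroutine per locked partition to read its stream.
    func consumeTopic(zkConn *zk.Conn, c sarama.Consumer, group, topic, id string) error {
        partitions, err := c.Partitions(topic)
        if err != nil {
            return err
        }
        for _, p := range partitions {
            // An ephemeral znode serves as the lock: Create succeeds for
            // exactly one consumer in the group.
            lock := fmt.Sprintf("/consumers/%s/owners/%s/%d", group, topic, p)
            _, err := zkConn.Create(lock, []byte(id), zk.FlagEphemeral,
                zk.WorldACL(zk.PermAll))
            if err == zk.ErrNodeExists {
                continue // another consumer owns this partition
            }
            if err != nil {
                return err
            }
            pc, err := c.ConsumePartition(topic, p, sarama.OffsetNewest)
            if err != nil {
                return err
            }
            go func(pc sarama.PartitionConsumer) { // one reader per owned partition
                defer pc.Close()
                for msg := range pc.Messages() {
                    _ = msg // hand off to the application here
                }
            }(pc)
        }
        return nil
    }
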
> > > The outer loop continues to watch for children events on either of:
> > >
> > > <consumer_root>/<consumer_group>/owners/<topic_name>
> > > <kafka_root>/brokers/topics/<topic_name>/partitions
> > >
> > > ...any watch event that fires causes all offset data and consumer
> > > handles to be flushed and closed, and the goroutines watching
> > > topic-partitions exit. The loop is restarted.
> > >
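That watch could look roughly like this with go-zookeeper's ChildrenW,
which returns a one-shot event channel; re-arming and error handling are
simplified, and the restart plumbing is assumed:

    package consumer

    import "github.com/go-zookeeper/zk"

    // watchForChanges blocks until a children event fires on either path,
    // then signals the outer loop to tear down and restart.
    func watchForChanges(conn *zk.Conn, ownersPath, partitionsPath string, restart chan<- struct{}) error {
        _, _, ownerEvents, err := conn.ChildrenW(ownersPath)
        if err != nil {
            return err
        }
        _, _, partitionEvents, err := conn.ChildrenW(partitionsPath)
        if err != nil {
            return err
        }
        select { // either event invalidates the current assignment
        case <-ownerEvents:
        case <-partitionEvents:
        }
        restart <- struct{}{} // outer loop flushes, closes handles, restarts
        return nil
    }
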
> > > Another thread reads topic-partition-offset data and flushes the
> > > offsets to:
> > >
> > > <consumer_root>/<consumer_group>/offsets/<topic_name>/<partition_number>
> > >
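A sketch of that flush, again assuming go-zookeeper; the version argument
of -1 means "any version", and the node is created on first write:

    package consumer

    import (
        "fmt"
        "strconv"

        "github.com/go-zookeeper/zk"
    )

    // flushOffset writes one partition's offset to its znode, creating the
    // node if this is the first write for the partition.
    func flushOffset(conn *zk.Conn, group, topic string, partition int32, offset int64) error {
        path := fmt.Sprintf("/consumers/%s/offsets/%s/%d", group, topic, partition)
        data := []byte(strconv.FormatInt(offset, 10))
        _, err := conn.Set(path, data, -1)
        if err == zk.ErrNoNode {
            _, err = conn.Create(path, data, 0, zk.WorldACL(zk.PermAll))
        }
        return err
    }
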
> > > Have I oversimplified or missed any critical steps?
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>

--
-- Guozhang

 