Flume >> mail # user >> Proper documentation for setting up sink groups


Re: Proper documentation for setting up sink groups
Some really insightful explanations Hari, thanks for the insight.
Btw, I do feel all this should be in the Flume user guide for the greater good
of mankind :)

On Thu, Aug 23, 2012 at 6:45 PM, Hari Shreedharan <[EMAIL PROTECTED]> wrote:

> Please see inline.
>
> --
> Hari Shreedharan
>
>
> On Thursday, August 23, 2012 at 3:28 PM, Bhaskar V. Karambelkar wrote:
>
> > My replies inline, and thanks for the detailed explanations.
> >
> > On Thu, Aug 23, 2012 at 2:57 PM, Hari Shreedharan <[EMAIL PROTECTED]> wrote:
> > >
> > > Please see inline.
> > >
> > > --
> > > Hari Shreedharan
> > >
> > >
> > > On Thursday, August 23, 2012 at 11:43 AM, Bhaskar V. Karambelkar wrote:
> > >
> > > > Hi Hari,
> > > > Yes I did read the whole guide end to end.
> > > > But I still have doubts
> > > >
> > > > > The fact that multiple sinks can feed from the same channel is news
> > > > > to me. I don't see it explicitly mentioned in the docs, so I guess I
> > > > > assumed, wrongly, that only one sink can feed from a channel.
> > > >
> > > > > a) Can you explain in detail how having multiple sinks taking events
> > > > > from one channel is useful in a "fast source, slow sinks" scenario?
> > > > When multiple sinks read events from the same channel, you essentially
> > > > have as many threads taking events out, since each sink has at least one
> > > > thread. So if your source is dumping n events per second into the channel,
> > > > and each sink can only process 1 event per second, you could have n sinks
> > > > to read n events per second (this is hypothetical - your hardware and your
> > > > OS will restrict performance when the number of threads starts growing a
> > > > lot). A channel returns an event only once, no matter how many sinks are
> > > > taking from the channel. Each event, once taken and committed, will never
> > > > be given to another sink. If there is a rollback, it is just as if the event
> > > > was never taken, and a different sink will be able to take and commit it.
> > >
> >
> >
> > OK this makes sense.
> >
> > > >
> > > > > b) Also, if I read your explanation below correctly, there are 3
> > > > > possible cases:
> > > >
> > > > > 1) Multiple sinks feeding from a single channel with the default
> > > > > sink processor: this will be like a multiplexing channel, with all
> > > > > sinks getting all the events that come into the channel.
> > > > No, every time take() is called on the channel, the channel will
> > > > return that event to only one sink. So each sink will get a unique
> > > > event (unless rollbacks happen, in which case the channel will put the
> > > > events back and a different sink might be able to pick them up).
> > >
> >
> >
> > > So this situation is exactly like a load balancing one, as events are
> > > somewhat equally distributed between all sinks?
> Not necessarily equally distributed. Sinks poll the channel to take
> events. If a sink is slow in polling the channel, it will get fewer events,
> and if a sink is faster, it will get more events, since the sinks are
> running on different threads.
> >
> > > >
> > > > > 2) Multiple sinks feeding from a single channel with the failover
> > > > > sink processor: only one sink will get the events at a given time, with
> > > > > Flume failing over to the next available sink in case the first one fails?
> > > > A sink group essentially treats n sinks like one, and depending on the
> > > > criteria, will select one sink to process the next event from the channel.
> > > > In case of failover, sinks are picked in order of priority - and when one
> > > > sink fails, the next one is picked.
> > >
> >
> >
> > OK this makes sense.
> >
> > > >
> > > > > 3) Multiple sinks feeding from a single channel with the load balancing
> > > > > processor: all sinks get events in a round-robin/random order.
> > > > No, each sink will get a different event. One sink processes one event,
> > > > and the next sink picked will process the next event from the channel.
> >
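For the load balancing case, a sink group might be configured roughly like this. Again a sketch only: the agent a1 and sinks k1 and k2 are hypothetical and assumed to be defined elsewhere on the same channel, and the selector and backoff properties are the ones listed in the Flume user guide, which may differ between releases.

```properties
# Group the two hypothetical sinks under a load balancing processor.
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance

# How the next sink is chosen: round_robin or random.
a1.sinkgroups.g1.processor.selector = round_robin

# Temporarily skip sinks that have failed instead of retrying them at once.
a1.sinkgroups.g1.processor.backoff = true
```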
> >
> > Yes, that's exactly what I meant. I didn't imply that all sinks get all
> > events, but the events are distributed more or less equally among the sinks