Flume >> mail # user >> Proper documentation for setting up sink groups


Bhaskar V. Karambelkar 2012-08-23, 16:28
Hari Shreedharan 2012-08-23, 16:43
Bhaskar V. Karambelkar 2012-08-23, 18:43
Hari Shreedharan 2012-08-23, 18:57
Bhaskar V. Karambelkar 2012-08-23, 22:28
Hari Shreedharan 2012-08-23, 22:45

Re: Proper documentation for setting up sink groups
Some really insightful explanations, Hari. Thanks.
Btw, I do feel all this should be in the Flume User Guide for the greater
good of mankind :)

On Thu, Aug 23, 2012 at 6:45 PM, Hari Shreedharan <[EMAIL PROTECTED]> wrote:

> Please see inline.
>
> --
> Hari Shreedharan
>
>
> On Thursday, August 23, 2012 at 3:28 PM, Bhaskar V. Karambelkar wrote:
>
> > My replies in line. and thanks for the detailed explanations.
> >
> > On Thu, Aug 23, 2012 at 2:57 PM, Hari Shreedharan <[EMAIL PROTECTED]> wrote:
> > >
> > > Please see inline.
> > >
> > > --
> > > Hari Shreedharan
> > >
> > >
> > > On Thursday, August 23, 2012 at 11:43 AM, Bhaskar V. Karambelkar wrote:
> > >
> > > > Hi Hari,
> > > > Yes I did read the whole guide end to end.
> > > > But I still have doubts
> > > >
> > > > The fact that multiple sinks can feed from the same channel is news
> > > > to me. I don't see it explicitly mentioned in the docs, so I guess I
> > > > wrongly assumed that only one sink can feed from a channel.
> > > >
> > > > a) Can you explain in detail how having multiple sinks taking events
> > > > from one channel is useful in a "fast source, slow sinks" scenario?
> > > When multiple sinks read events from the same channel, you essentially
> > > have as many threads taking events out as there are sinks, since each
> > > sink has at least one thread. So if your source is dumping n events per
> > > second into the channel and each sink can only process 1 event per
> > > second, you could have n sinks to read n events per second (this is
> > > hypothetical - your hardware and your OS will restrict performance when
> > > the number of threads starts growing a lot). A channel returns an event
> > > only once, however many sinks are taking from the channel. Once an event
> > > is taken and committed, it will never be given to another sink. If there
> > > is a rollback, it is as if the event was never taken, and a different
> > > sink will be able to take and commit it.
> > >
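
The take/commit/rollback behaviour described above can be sketched in Python. This is a toy model under stated assumptions, not Flume's API: `ToyChannel`, `drain`, and the sink names `k1` and `k2` are hypothetical, and the sinks take turns in a single loop so the result is deterministic, whereas real Flume sinks each run on their own thread.

```python
from collections import deque


class ToyChannel:
    """Hypothetical in-memory stand-in for a channel (not Flume's API)."""

    def __init__(self, events):
        self._queue = deque(events)

    def take(self):
        # Hand the next event to exactly one caller, or None when empty.
        return self._queue.popleft() if self._queue else None

    def rollback(self, event):
        # Put the event back at the head, as if it had never been taken.
        self._queue.appendleft(event)


def drain(channel, sinks):
    """Let sinks take turns until the channel is empty.

    sinks is a list of (name, process) pairs; process(event) returns True
    to commit and False to roll back.
    """
    delivered = {name: [] for name, _ in sinks}
    turn = 0
    while True:
        event = channel.take()
        if event is None:
            return delivered
        name, process = sinks[turn % len(sinks)]
        turn += 1
        if process(event):
            delivered[name].append(event)  # committed: never handed out again
        else:
            channel.rollback(event)  # another sink can now take it


# k1 cannot process "e3" and rolls it back; k2 accepts everything.
sinks = [("k1", lambda e: e != "e3"), ("k2", lambda e: True)]
delivered = drain(ToyChannel(["e1", "e2", "e3", "e4"]), sinks)
print(delivered)  # {'k1': ['e1', 'e4'], 'k2': ['e2', 'e3']}
```

Every event ends up with exactly one sink, and the event that `k1` rolled back is later committed by `k2`.
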
> >
> >
> > OK this makes sense.
> >
> > > >
> > > > b) Also, if I read your explanation below correctly, there are 3
> > > > possible cases:
> > > >
> > > > 1) Multiple sinks feeding from a single channel with the default sink
> > > > processor: this will be like a multiplexing channel, with all sinks
> > > > getting all the events that come into the channel.
> > > No, every time take() is called on the channel, the channel returns
> > > that event to only one sink. So each sink will get a unique event
> > > (unless a rollback happens, in which case the event goes back into the
> > > channel and a different sink might be able to pick it up).
> > >
> >
> >
> > So this situation is exactly like a load balancing one, as events are
> > somewhat equally distributed between all sinks?
> Not necessarily equally distributed. Sinks poll the channel to take
> events. If a sink is slow in polling the channel, it will get fewer
> events, and if a sink is faster, it will get more events, since the sinks
> are running on different threads.
> >
> > > >
> > > > 2) Multiple sinks feeding from a single channel with the failover
> > > > sink processor: only one sink will get the events at a given time,
> > > > with Flume failing over to the next available sink in case the first
> > > > one fails?
> > > A sink group essentially treats n sinks like one and, depending on the
> > > criteria, will select one sink to process the next event from the
> > > channel. In case of failover, sinks are picked in order of priority,
> > > and when one sink fails, the next one is picked.
> > >
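
The failover selection described above can be sketched as follows. This is a minimal illustration, not Flume's implementation: `failover_process`, the sink names, and the priority numbers are hypothetical, the sketch assumes that a larger number means a higher priority, and recovery of a failed sink is not modelled.

```python
def failover_process(event, sinks, failed):
    """Give event to the highest-priority sink that has not failed.

    sinks is a list of (name, priority, process) tuples; process(event)
    returns True on success. failed is a set of sink names that is
    updated in place when a sink fails.
    """
    by_priority = sorted(sinks, key=lambda s: s[1], reverse=True)
    for name, _priority, process in by_priority:
        if name in failed:
            continue  # skip sinks already marked as failed
        if process(event):
            return name
        failed.add(name)  # mark it failed and fall through to the next sink
    raise RuntimeError("all sinks in the group have failed")


calls = []


def broken(event):
    # Stand-in for a sink whose destination is down.
    calls.append(event)
    return False


sinks = [("k2", 5, lambda e: True), ("k1", 10, broken)]
failed = set()
first = failover_process("e1", sinks, failed)   # k1 is tried, fails, k2 takes over
second = failover_process("e2", sinks, failed)  # k1 is skipped this time
print(first, second, failed)
```

Only one sink handles each event, and once `k1` has failed the group keeps using `k2`.
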
> >
> >
> > OK this makes sense.
> >
> > > >
> > > > 3) Multiple sinks feeding from a single channel with the load
> > > > balancing processor: all sinks get events in round-robin/random
> > > > order.
> > > No, each sink will get a different event. One sink processes one event,
> > > and the next sink picked will process the next event from the channel.
> >
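
A round-robin version of the load balancing case above might look like this. It is only a sketch: `load_balance` and the sink names are hypothetical, and it ignores sink failures and the random selection order mentioned in the question.

```python
from itertools import cycle


def load_balance(events, sink_names):
    """Assign each event to exactly one sink, rotating through the sinks."""
    assignment = {name: [] for name in sink_names}
    for event, name in zip(events, cycle(sink_names)):
        assignment[name].append(event)
    return assignment


assignment = load_balance(["e1", "e2", "e3", "e4", "e5"], ["k1", "k2"])
print(assignment)  # {'k1': ['e1', 'e3', 'e5'], 'k2': ['e2', 'e4']}
```

No event appears under more than one sink; the events are split between the sinks rather than copied to all of them.
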
> >
> > Yes that's exactly what I meant, I didn't imply that all sinks get all
> > events, but the events are distributed more or less equally among the sinks