Flume, mail # user - Proper documentation for setting up sink groups


Re: Proper documentation for setting up sink groups
Bhaskar V. Karambelkar 2012-08-23, 22:28
My replies are inline. Thanks for the detailed explanations.

On Thu, Aug 23, 2012 at 2:57 PM, Hari Shreedharan <[EMAIL PROTECTED]> wrote:

>
> Please see inline.
>
> --
> Hari Shreedharan
>
>
> On Thursday, August 23, 2012 at 11:43 AM, Bhaskar V. Karambelkar wrote:
>
> > Hi Hari,
> > Yes, I did read the whole guide end to end,
> > but I still have doubts.
> >
> > The fact that multiple sinks can feed from the same channel is news to
> > me. I don't see it explicitly mentioned in the docs,
> > so I guess I wrongly assumed that only one sink can feed from a channel.
> >
> > a) Can you explain in detail how having multiple sinks take events
> > from one channel is useful in a "fast source, slow sinks" scenario?
> When multiple sinks read events from the same channel, you essentially
> have as many threads taking events out as you have sinks, since each sink
> has at least one thread. So if your source is dumping n events per second
> into the channel, and each sink can only process 1 event per second, you
> could have n sinks to read n events per second (this is hypothetical - your
> hardware and your OS will restrict performance when the number of threads
> starts growing a lot). A channel returns an event only once, no matter how
> many sinks are taking from it. Once an event has been taken and the
> transaction committed, it will never be given to another sink. If there is
> a rollback, it is as if the event was never taken, and a different sink
> will be able to take and commit it.
> >
>

OK this makes sense.
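
For reference, here is a minimal sketch of what such a setup might look like in an agent's properties file. The agent name a1, the component names r1, c1, k1, k2, and the choice of netcat source, memory channel, and logger sinks are all placeholders of mine, not something from this thread:

```properties
# Hypothetical agent "a1": one source, one channel, two sinks.
a1.sources = r1
a1.channels = c1
a1.sinks = k1 k2

# Placeholder source; any source type could be used here.
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

# Both sinks are attached to the same channel. Per the explanation above,
# each event is handed to only one of them.
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
a1.sinks.k2.type = logger
a1.sinks.k2.channel = c1
```

No sink group is defined here, so as I understand it each sink takes from c1 on its own thread.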
> >
> > b) Also if I read your explanation below correctly there are 3 possible
> cases
> >
> > 1) Multiple sinks feeding from a single channel with the default sink
> > processor: this will be like a multiplexing channel, with all sinks getting
> > all the events that come into the channel.
> No, every time take() is called on the channel, the channel will
> return that event to only one sink. So each sink will get a unique
> event (unless rollbacks happen - in which case the events go back
> into the channel and a different sink might be able to pick them
> up).
> >
>

So this situation is exactly like a load-balancing one, as events are
somewhat equally distributed between all sinks?
> >
> > 2) Multiple sinks feeding from a single channel with the failover sink
> > processor: only one sink will get the events at a given time, with Flume
> > failing over to the next available sink in case the first one fails?
> A sink group essentially treats n sinks like one, and depending on the
> criteria, will select one sink to process the next event from the channel.
> In case of failover, sinks are picked in order of priority - and when one
> sink fails, the next one is picked.
> >
>

OK this makes sense.
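
To check my understanding, a failover group would be configured roughly like this. This is only a sketch: the names a1, g1, k1 and k2 and the priority values are placeholders of mine, and it assumes k1 and k2 are already defined as sinks on the same channel. The property names are the ones I see in the sink processors section of the user guide:

```properties
# Assumes sinks k1 and k2 already exist and share one channel.
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover

# A larger value means higher priority, so k2 would be tried first
# and k1 used only if k2 fails.
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
```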
> >
> > 3) Multiple sinks feeding from a single channel with the load balancing
> > processor: all sinks get events in round-robin/random order.
> No, each sink will get a different event. One sink processes one event and
> the next one picked will process the next event from the channel.
>
>
Yes, that's exactly what I meant. I didn't mean to imply that all sinks get
all events, but that the events are distributed more or less equally among
the sinks in round-robin/random order.
As I said above, this looks almost like #1, except that here you have control
over the selection algorithm (round-robin/random).
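
So, if I read the guide correctly, the load-balancing variant would look roughly like the sketch below. Again, a1, g1, k1 and k2 are placeholder names of mine, and the sketch assumes both sinks are already defined on the same channel:

```properties
# Assumes sinks k1 and k2 already exist and share one channel.
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance

# Selection mechanism: round_robin or random, per the user guide.
a1.sinkgroups.g1.processor.selector = round_robin
```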
> > Is this a correct assumption? I am aware of #2 and #3, but not sure
> > about #1.
> >
> > On Thu, Aug 23, 2012 at 12:43 PM, Hari Shreedharan <[EMAIL PROTECTED]> wrote:
> > > Did you read this:
> http://flume.apache.org/FlumeUserGuide.html#flume-sink-processors
> > >
> > > That explains how to use sink groups. Also there is nothing wrong with
> > > multiple sinks taking events from one channel. This is an especially useful
> > > configuration if you have a very fast source and much slower sinks.
> > >
> > >
> > > Hari
> > >
> > > --
> > > Hari Shreedharan
> > >
> > >
> > > On Thursday, August 23, 2012 at 9:28 AM, Bhaskar V. Karambelkar wrote: