Flume, mail # user - Proper documentation for setting up sink groups


Re: Proper documentation for setting up sink groups
Hari Shreedharan 2012-08-23, 18:57

Please see inline.

--
Hari Shreedharan
On Thursday, August 23, 2012 at 11:43 AM, Bhaskar V. Karambelkar wrote:

> Hi Hari,
> Yes I did read the whole guide end to end.
> But I still have doubts
>
> The fact that multiple sinks can feed from the same channel is news to me. I don't see it explicitly mentioned in the docs,
> so I guess I wrongly assumed that only one sink can feed from a channel.
>
> a) Can you explain in detail how having multiple sinks taking events from one channel is useful in a "fast source, slow sinks" scenario?
When multiple sinks read events from the same channel, you essentially have as many threads taking events out as you have sinks, since each sink has at least one thread. So if your source is dumping n events per second into the channel, and each sink can only process 1 event per second, you could have n sinks to read n events per second (this is hypothetical: your hardware and your OS will restrict performance when the number of threads starts growing a lot). A channel hands out each event only once, no matter how many sinks are taking from it. Once an event has been taken and the transaction committed, it will never be given to another sink. If there is a rollback, it is as if the event was never taken, and a different sink will be able to take and commit it.
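
As a minimal sketch of that wiring, the fragment below attaches two sinks to one channel without any sink group. The agent name agent1 and the component names r1, c1, k1 and k2 are placeholders, and the type and other settings of each component are left out.

```properties
# Hypothetical names: agent1, r1, c1, k1, k2.
# Component types and their other settings are omitted for brevity.
agent1.sources = r1
agent1.channels = c1
agent1.sinks = k1 k2

# The source writes into the channel (note: "channels", plural, on sources).
agent1.sources.r1.channels = c1

# Both sinks drain the same channel (note: "channel", singular, on sinks).
# Each event taken and committed by one sink is never handed to the other.
agent1.sinks.k1.channel = c1
agent1.sinks.k2.channel = c1
```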
>
>
> b) Also, if I read your explanation below correctly, there are 3 possible cases:
>
> 1) Multiple sinks feeding from a single channel: with the default sink processor this will be like a multiplexing channel, with all sinks getting all the events that come into the channel.
No, every time take() is called on the channel, the channel returns that event to only one sink. So each sink will get a unique event (unless a rollback happens, in which case the events go back into the channel and a different sink may pick them up).
>
>
> 2) Multiple sinks feeding from a single channel with the failover sink processor: only one sink will get the events at a given time, with Flume failing over to the next available sink in case the first one fails?
A sink group essentially treats n sinks like one and, depending on the processor's criteria, selects one sink to process the next event from the channel. With the failover processor, sinks are picked in order of priority, and when one sink fails, the next one is picked.
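
A sketch of how such a failover group might be hooked up, using the property names from the sink processors section of the user guide. The names agent1, c1, k1, k2 and g1 are placeholders, and the priority values are arbitrary; the sink with the higher priority value is tried first.

```properties
# Hypothetical names: agent1, c1, k1, k2, g1. Sink types are omitted.
agent1.sinks = k1 k2
agent1.sinkgroups = g1

# Both sinks in the group must point at the same channel.
agent1.sinks.k1.channel = c1
agent1.sinks.k2.channel = c1

# Group the sinks and select the failover processor.
agent1.sinkgroups.g1.sinks = k1 k2
agent1.sinkgroups.g1.processor.type = failover

# Illustrative priorities: k1 is used first, k2 takes over if k1 fails.
agent1.sinkgroups.g1.processor.priority.k1 = 10
agent1.sinkgroups.g1.processor.priority.k2 = 5
```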
>
>
> 3) Multiple sinks feeding from a single channel with the load balancing processor: all sinks get events in a round-robin/random order.
No, each sink will get a different event. One sink processes one event, and the next sink picked will process the next event from the channel.
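
For the load balancing case, the wiring could look like the sketch below. Again, agent1, c1, k1, k2 and g1 are placeholder names, and the exact spelling of the selector value should be checked against the user guide for the Flume version in use.

```properties
# Hypothetical names: agent1, c1, k1, k2, g1. Sink types are omitted.
agent1.sinks = k1 k2
agent1.sinkgroups = g1

# Both sinks in the group read from the same channel.
agent1.sinks.k1.channel = c1
agent1.sinks.k2.channel = c1

# Group the sinks and select the load balancing processor.
agent1.sinkgroups.g1.sinks = k1 k2
agent1.sinkgroups.g1.processor.type = load_balance

# Selection mechanism: round robin here; random is the other built-in option.
agent1.sinkgroups.g1.processor.selector = round_robin
```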
>
> Is this a correct assumption? I am aware of #2 and #3, but not sure about #1.
>
> On Thu, Aug 23, 2012 at 12:43 PM, Hari Shreedharan <[EMAIL PROTECTED]> wrote:
> > Did you read this: http://flume.apache.org/FlumeUserGuide.html#flume-sink-processors
> >
> > That explains how to use sink groups. Also there is nothing wrong with multiple sinks taking events from one channel. This is an especially useful configuration if you have a very fast source and much slower sinks.
> >
> >
> > Hari
> >
> > --
> > Hari Shreedharan
> >
> >
> > On Thursday, August 23, 2012 at 9:28 AM, Bhaskar V. Karambelkar wrote:
> >
> > > The sink group document doesn't mention anything about how
> > > to hook up sink groups to the rest of the config in order to work.
> > >
> > > E.g., under normal circumstances one channel is linked with one sink.
> > >
> > > But for a failover sink group, it looks like both sinks should be hooked up to the same channel,
> > > but this is not mentioned anywhere.
> > >
> > > Similarly, what exactly needs to be done for a load balancing sink group?
> > >
> > > thanks