Re: Proper documentation for setting up sink groups
Bhaskar V. Karambelkar 2012-08-23, 22:28
My replies are inline, and thanks for the detailed explanations.
On Thu, Aug 23, 2012 at 2:57 PM, Hari Shreedharan <[EMAIL PROTECTED]> wrote:

> Please see inline.
>
> --
> Hari Shreedharan
>
> On Thursday, August 23, 2012 at 11:43 AM, Bhaskar V. Karambelkar wrote:
>
> > Hi Hari,
> > Yes, I did read the whole guide end to end, but I still have doubts.
> >
> > The fact that multiple sinks can feed from the same channel is news to
> > me. I don't see it explicitly mentioned in the docs, so I guess I
> > wrongly assumed that only one sink can feed from a channel.
> >
> > a) Can you explain in detail how having multiple sinks taking events
> > from one channel is useful in a "fast source, slow sinks" scenario?
>
> When multiple sinks read events from the same channel, you essentially
> have as many threads taking events out, since each sink has at least one
> thread. So if your source is dumping n events per second into the channel,
> and your sink can only process 1 event per second, you could have n sinks
> to read n events per second (this is hypothetical: your hardware and your
> OS will limit performance once the number of threads grows large).
> A channel returns an event only once, however many sinks are taking
> from the channel. Each event, if removed and committed, will never be given
> to another sink. If there is a rollback, it is as if the event was
> never taken, and a different sink will be able to take and commit it.

OK, this makes sense.

> > b) Also, if I read your explanation below correctly, there are 3 possible
> > cases:
> >
> > 1) Multiple sinks feeding from a single channel with the default sink
> > processor: this will be like a multiplexing channel, with all sinks
> > getting all the events that come into the channel.
>
> No, every time take() is called on the channel, the channel will
> return that event to only one sink. So each sink will get a unique
> event (unless rollbacks happen, in which case the channel will put the
> events back into the channel and a different sink might be able to pick
> them up).

So this situation is exactly like a load-balancing one, since events are
distributed more or less equally among all sinks?

> > 2) Multiple sinks feeding from a single channel with the failover sink
> > processor: only one sink will get the events at a given time, with Flume
> > failing over to the next available sink in case the first one fails?
>
> A sink group essentially treats n sinks like one, and depending on the
> criteria, will select one sink to process the next event from the channel.
> In case of failover, sinks are picked in order of priority, and when one
> sink fails, the next one is picked.

OK, this makes sense.

> > 3) Multiple sinks feeding from a single channel with the load-balancing
> > processor, with all sinks getting events in a round-robin/random order.
>
> No, each sink will get a different event. One sink processes one event and
> the next one picked will process the next event from the channel.

Yes, that's exactly what I meant. I didn't imply that all sinks get all
events, but that the events are distributed more or less equally among the
sinks in round-robin/random order. As I said above, this looks almost like
#1, except that here you have control over the selection algorithm
(round-robin/random).

> > Is this a correct assumption? I am aware of #2 and #3, not sure about
> > #1.
> >
> > On Thu, Aug 23, 2012 at 12:43 PM, Hari Shreedharan <[EMAIL PROTECTED]> wrote:
> >
> > > Did you read this:
> > > http://flume.apache.org/FlumeUserGuide.html#flume-sink-processors
> > >
> > > That explains how to use sink groups. Also, there is nothing wrong with
> > > multiple sinks taking events from one channel. This is an especially
> > > useful configuration if you have a very fast source and much slower sinks.
> > >
> > > Hari
> > >
> > > --
> > > Hari Shreedharan
> > >
> > > On Thursday, August 23, 2012 at 9:28 AM, Bhaskar V. Karambelkar wrote:
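Hari's point that sinks sharing one channel compete for events, rather than each receiving a copy, can be illustrated with a small in-memory simulation. This is a hedged sketch, not Flume code: `SimpleChannel`, `SharedChannelDemo`, `take` and `rollback` are hypothetical names built on `java.util.concurrent.LinkedBlockingQueue`, whereas a real Flume channel wraps each take in a transaction. It shows each committed event landing at exactly one sink, and a rolled-back event going back into the channel so another take can pick it up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory channel that only mimics the delivery rules described in the thread.
// It is NOT Flume's Channel/Transaction API.
class SimpleChannel {
    private final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();

    void put(String event) {
        queue.add(event);
    }

    // Atomically removes one event, so no other sink can receive the same event.
    // Returns null if the channel stays empty for timeoutMs.
    String take(long timeoutMs) throws InterruptedException {
        return queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Rollback: the event goes back into the channel, so any sink may take it again.
    void rollback(String event) {
        queue.add(event);
    }
}

class SharedChannelDemo {
    static final class Result {
        final Map<String, String> committedBy; // event -> sink that committed it
        final int commits;
        final int rollbacks;

        Result(Map<String, String> committedBy, int commits, int rollbacks) {
            this.committedBy = committedBy;
            this.commits = commits;
            this.rollbacks = rollbacks;
        }
    }

    // Fills one channel with `events` events and drains it with `sinks` competing threads.
    static Result run(int events, int sinks) throws InterruptedException {
        SimpleChannel channel = new SimpleChannel();
        for (int i = 0; i < events; i++) {
            channel.put("event-" + i);
        }

        Map<String, String> committedBy = new ConcurrentHashMap<>();
        AtomicInteger commits = new AtomicInteger();
        AtomicInteger rollbacks = new AtomicInteger();
        AtomicBoolean failureSimulated = new AtomicBoolean(false);

        List<Thread> threads = new ArrayList<>();
        for (int s = 0; s < sinks; s++) {
            String sinkName = "sink-" + s;
            Thread t = new Thread(() -> {
                try {
                    String event;
                    while ((event = channel.take(100)) != null) {
                        // Simulate a single delivery failure: the first event taken by any sink is rolled back.
                        if (failureSimulated.compareAndSet(false, true)) {
                            channel.rollback(event);
                            rollbacks.incrementAndGet();
                            continue;
                        }
                        // Commit: record which sink kept the event.
                        committedBy.put(event, sinkName);
                        commits.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return new Result(committedBy, commits.get(), rollbacks.get());
    }

    public static void main(String[] args) throws InterruptedException {
        Result r = run(1000, 4);
        // commits == distinct events means no event was committed by two sinks.
        System.out.println("commits=" + r.commits + " distinct=" + r.committedBy.size()
                + " rollbacks=" + r.rollbacks);
    }
}
```

In this sketch, which sink commits a given event depends only on thread scheduling and how fast each sink drains the queue. No selection policy is involved, which is the difference Bhaskar points out when he compares this case with a load-balancing processor.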
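The failover and load-balancing processors discussed above differ mainly in how the sink group picks the sink for the next event. The sketch below models only that choice, under stated assumptions: `FakeSink`, `SinkSelection.failover` and `RoundRobinSelector` are hypothetical, a larger `priority` value is assumed to mean "try first", and a real agent selects these behaviours through configuration (`processor.type` set to `failover` or `load_balance`) rather than code like this. Random selection, back-off for failed sinks and transactions are left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a sink; `healthy` decides whether delivery succeeds.
class FakeSink {
    final String name;
    final int priority; // assumption: a larger value means "try this sink first"
    boolean healthy = true;
    final List<String> received = new ArrayList<>();

    FakeSink(String name, int priority) {
        this.name = name;
        this.priority = priority;
    }

    boolean process(String event) {
        if (!healthy) {
            return false;
        }
        received.add(event);
        return true;
    }
}

class SinkSelection {
    // Failover: try sinks from highest to lowest priority; the first healthy one gets the event.
    // Returns the chosen sink's name, or null if every sink failed.
    static String failover(List<FakeSink> sinks, String event) {
        List<FakeSink> ordered = new ArrayList<>(sinks);
        ordered.sort(Comparator.comparingInt((FakeSink s) -> s.priority).reversed());
        for (FakeSink s : ordered) {
            if (s.process(event)) {
                return s.name;
            }
        }
        return null; // a real agent would roll the take back so the event stays in the channel
    }
}

// Round-robin load balancing: each event goes to the next sink in turn, skipping unhealthy ones.
class RoundRobinSelector {
    private int next = 0;

    String select(List<FakeSink> sinks, String event) {
        for (int attempt = 0; attempt < sinks.size(); attempt++) {
            FakeSink s = sinks.get(next);
            next = (next + 1) % sinks.size();
            if (s.process(event)) {
                return s.name;
            }
        }
        return null; // every sink failed
    }

    public static void main(String[] args) {
        FakeSink k1 = new FakeSink("k1", 10);
        FakeSink k2 = new FakeSink("k2", 5);
        List<FakeSink> group = List.of(k1, k2);

        System.out.println(SinkSelection.failover(group, "e0")); // prints k1
        k1.healthy = false;
        System.out.println(SinkSelection.failover(group, "e1")); // prints k2
        k1.healthy = true;

        RoundRobinSelector rr = new RoundRobinSelector();
        for (int i = 0; i < 4; i++) {
            System.out.println(rr.select(group, "lb-" + i)); // alternates k1, k2, k1, k2
        }
    }
}
```

Either way, each event still goes to exactly one sink; the processor only decides which one, which matches Hari's description of a sink group treating n sinks like one.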