-
Proper documentation for setting up sink groups
Bhaskar V. Karambelkar 2012-08-23, 16:28
The sink group documentation doesn't mention how to hook up sink groups to the rest of the config so that they actually work.
For example, under normal circumstances one channel is linked to one sink. For a failover sink group, it looks like both sinks should be hooked up to the same channel, but this is not mentioned anywhere.
Similarly, what exactly needs to be done for a load-balancing sink group?
Thanks
-
Re: Proper documentation for setting up sink groups
Hari Shreedharan 2012-08-23, 16:43
Did you read this: http://flume.apache.org/FlumeUserGuide.html#flume-sink-processors
That explains how to use sink groups. Also, there is nothing wrong with multiple sinks taking events from one channel. This is an especially useful configuration if you have a very fast source and much slower sinks.
Hari
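For readers following along, the "multiple sinks taking events from one channel" setup mentioned above can be sketched as an agent configuration. This is only an illustrative sketch: the agent name `a1`, the component names `r1`, `c1`, `k1`, `k2`, and the choice of a netcat source, memory channel, and logger sinks are all assumptions made for the example and do not come from the thread.

```properties
# Hypothetical agent "a1": one source, one channel, two sinks.
a1.sources = r1
a1.channels = c1
a1.sinks = k1 k2

# Example source (netcat chosen only for illustration).
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# Example in-memory channel.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Both sinks name the same channel. No sink group is declared,
# so each sink drains c1 independently of the other.
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
a1.sinks.k2.type = logger
a1.sinks.k2.channel = c1
```

Note that a source lists its channels under `channels` (plural), while each sink names exactly one channel under `channel` (singular).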
-
Re: Proper documentation for setting up sink groups
Bhaskar V. Karambelkar 2012-08-23, 18:43
Hi Hari,
Yes, I did read the whole guide end to end, but I still have doubts.
The fact that multiple sinks can feed from the same channel is news to me. I don't see it explicitly mentioned in the docs, so I guess I wrongly assumed that only one sink can feed from a channel.
a) Can you explain in detail how having multiple sinks taking events from one channel is useful in a "fast source, slow sinks" scenario?
b) Also, if I read your explanation correctly, there are 3 possible cases:
1) Multiple sinks feeding from a single channel with the default sink processor. This will be like a multiplexing channel, with all sinks getting all the events that come into the channel.
2) Multiple sinks feeding from a single channel with the failover sink processor. Only one sink will get the events at a given time, with Flume failing over to the next available sink in case the first one fails?
3) Multiple sinks feeding from a single channel with the load-balancing processor, with all sinks getting events in round-robin/random order.
Is this a correct assumption? I am aware of #2 and #3, but not sure about #1.
-
Re: Proper documentation for setting up sink groups
Hari Shreedharan 2012-08-23, 18:57
Please see inline.
> a) Can you explain in detail how having multiple sinks taking events from one channel is useful in a "fast source, slow sinks" scenario?
When multiple sinks read events from the same channel, you essentially have as many threads taking events out, since each sink has at least one thread. So if your source is dumping n events per second into the channel and your sink can only process 1 event per second, you could have n sinks to read n events per second. (This is hypothetical: your hardware and your OS will restrict performance when the number of threads starts growing a lot.)
A channel returns an event only once, however many sinks are taking from the channel. An event that has been removed and committed will never be given to another sink. If there is a rollback, it is as if the event was never taken, and a different sink will be able to take and commit it.
> 1) Multiple sinks feeding from a single channel with the default sink processor. This will be like a multiplexing channel, with all sinks getting all the events that come into the channel.
No. Every time take() is called on the channel, the channel returns that event to only one sink. So each sink gets a unique event (unless rollbacks happen, in which case the events are put back into the channel and a different sink might pick them up).
> 2) Multiple sinks feeding from a single channel with the failover sink processor. Only one sink will get the events at a given time, with Flume failing over to the next available sink in case the first one fails?
A sink group essentially treats n sinks as one and, depending on the criteria, selects one sink to process the next event from the channel. In the case of failover, sinks are picked in order of priority, and when one sink fails, the next one is picked.
> 3) Multiple sinks feeding from a single channel with the load-balancing processor, with all sinks getting events in round-robin/random order.
No, each sink will get a different event. One sink processes one event, and the next sink picked processes the next event from the channel.
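The failover case described above might be wired up roughly as follows. This is a hedged sketch rather than a definitive recipe: the agent name `a1`, the channel `c1` (assumed to be defined elsewhere in the same file), the sink names, the Avro sink type, and the hostnames and ports are all hypothetical. The `sinkgroups` and `processor.*` keys are the ones described in the sink processors section of the Flume User Guide linked earlier in the thread.

```properties
# Hypothetical agent "a1"; channel c1 is assumed to be defined elsewhere.
a1.sinks = k1 k2
a1.sinkgroups = g1

# Both sinks are attached to the same channel.
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = collector1.example.com
a1.sinks.k1.port = 4545
a1.sinks.k1.channel = c1

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = collector2.example.com
a1.sinks.k2.port = 4545
a1.sinks.k2.channel = c1

# The sink group ties the two sinks together under a failover processor.
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
# The sink with the higher priority value is tried first.
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
# Upper limit, in milliseconds, on the backoff applied to a failed sink.
a1.sinkgroups.g1.processor.maxpenalty = 10000
```

The point raised at the start of the thread is visible here: every sink in the group still declares its own `channel`, and for failover both sinks name the same one.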
-
Re: Proper documentation for setting up sink groups
Bhaskar V. Karambelkar 2012-08-23, 22:28
My replies are inline, and thanks for the detailed explanations.
> When multiple sinks read events from the same channel, you essentially have as many threads taking events out [...]
OK, this makes sense.
> No. Every time take() is called on the channel, the channel returns that event to only one sink. So each sink gets a unique event [...]
So this situation is exactly like a load-balancing one, as events are somewhat equally distributed between all sinks?
> In the case of failover, sinks are picked in order of priority, and when one sink fails, the next one is picked.
OK, this makes sense.
> No, each sink will get a different event. One sink processes one event, and the next sink picked processes the next event from the channel.
Yes, that's exactly what I meant. I didn't imply that all sinks get all events, but that the events are distributed more or less equally among the sinks in round-robin/random order. As I said above, this looks almost like #1, except that here you have control over the selection algorithm (round-robin/random).
-
Re: Proper documentation for setting up sink groups
Hari Shreedharan 2012-08-23, 22:45
Please see inline.
> So this situation is exactly like a load-balancing one, as events are somewhat equally distributed between all sinks?
Not necessarily equally distributed. Sinks poll the channel to take events. If a sink is slow in polling the channel, it will get fewer events, and if a sink is faster, it will get more events, since the sinks are running on different threads.
> As I said above, this looks almost like #1, except that here you have control over the selection algorithm (round-robin/random).
It is not just that you have control. The distribution also does not depend on each sink's performance, because all sinks in the group are run from the same thread. So slower sinks can slow down the whole process, since only one sink reads from the channel at any point in time. Think of a load-balancing sink selector as a loop which picks one sink and passes the event to that one. Since there is only one thread per sink group, having one sink group is often slower than having multiple sinks reading from the same channel.
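As a rough illustration of the load-balancing case discussed above, a sink group could be declared along these lines. The agent name `a1`, the group name `g1`, and the sink names `k1` and `k2` are hypothetical, and the sinks themselves (each with its own `channel = c1` line) are assumed to be defined elsewhere in the same file. The exact set of supported processor options can vary between Flume versions, so check the user guide for the release in use.

```properties
# Hypothetical agent "a1"; sinks k1 and k2 are assumed to be defined
# elsewhere, both attached to the same channel.
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2

# A single thread drives the whole group and hands each take to one sink.
a1.sinkgroups.g1.processor.type = load_balance
# Selection algorithm: round_robin or random.
a1.sinkgroups.g1.processor.selector = round_robin
```

Leaving out the `sinkgroups` lines entirely gives the other arrangement described in the thread: each sink polls the channel from its own thread, and the split of events depends on how fast each sink is.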
-
Re: Proper documentation for setting up sink groups
Bhaskar V. Karambelkar 2012-08-23, 22:55
Some really insightful explanations, Hari. Thanks.
By the way, I do feel all this should be in the Flume User Guide for the greater good of mankind :)