|
Wang, Yongkun | Yongkun |...
2012-08-10, 05:07
Denny Ye
2012-08-10, 05:29
Wang, Yongkun | Yongkun |...
2012-08-10, 05:48
Denny Ye
2012-08-10, 06:09
Wang, Yongkun | Yongkun |...
2012-08-10, 06:25
Denny Ye
2012-08-10, 06:58
Wang, Yongkun | Yongkun |...
2012-08-10, 07:53
Juhani Connolly
2012-08-10, 09:07
Wang, Yongkun | Yongkun |...
2012-08-11, 04:30
Mike Percy
2012-08-13, 00:17
Wang, Yongkun | Yongkun |...
2012-08-15, 03:03
Mike Percy
2012-08-15, 03:52
|
-
Re: [jira] [Created] (FLUME-1479) Multiple Sinks can connect to single Channel
Wang, Yongkun | Yongkun |... 2012-08-10, 05:07
Is JIRA down again? I cannot connect to it to comment there.
I have a proposal, "Transactional Multiplex (fan out) Sink" (https://issues.apache.org/jira/browse/FLUME-1435), which contains a design for connecting one channel to multiple sinks. You can search the mailing list for it while JIRA cannot be accessed.

I think this is more than a configuration issue. If we simply allow several sinks on the same channel, they will take events from it either in round-robin fashion or, if the sinks run at different speeds, in an unpredictable way. So it's better to have an even higher-level transaction control instead of the transaction in the process() method of each sink, as I describe in FLUME-1435.

Regards,
Yongkun Wang

On 12/08/10 12:30, "Denny Ye (JIRA)" <[EMAIL PROTECTED]> wrote:
>Denny Ye created FLUME-1479:
>-------------------------------
>
> Summary: Multiple Sinks can connect to single Channel
> Key: FLUME-1479
> URL: https://issues.apache.org/jira/browse/FLUME-1479
> Project: Flume
> Issue Type: Bug
> Components: Configuration
> Affects Versions: v1.2.0
> Reporter: Denny Ye
> Assignee: Denny Ye
> Fix For: v1.3.0
>
>If we have one Channel (mc) and two Sinks (hsa, hsb), both sinks can be connected to that channel with a configuration such as:
>{quote}
>agent.sinks.hsa.channel = mc
>agent.sinks.hsb.channel = mc
>{quote}
>This means multiple Sinks can connect to a single Channel. Normally, only one Sink should connect to a given Channel.
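FLUME-1479 is filed as a configuration bug: nothing stops two sinks from naming the same channel. As a rough illustration of what a configuration-time check could look like, the sketch below loads Flume-style properties with java.util.Properties and reports every channel that more than one `<agent>.sinks.<sink>.channel` entry points to. The class and method names are hypothetical; this is not Flume's own configuration validation, and it assumes only the four-part key form shown in the issue.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical helper, not part of Flume: finds channels bound to more than one sink. */
public class SharedChannelCheck {

    /** Maps each shared "agent.channel" to the sorted names of the sinks bound to it. */
    public static Map<String, List<String>> findSharedChannels(String config) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(config));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // load() declares IOException
        }

        Map<String, List<String>> sinksByChannel = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            // Only keys of the form <agent>.sinks.<sink>.channel are considered.
            String[] parts = key.split("\\.");
            if (parts.length == 4 && parts[1].equals("sinks") && parts[3].equals("channel")) {
                // Qualify by agent name so channels of different agents do not collide.
                String channel = parts[0] + "." + props.getProperty(key).trim();
                sinksByChannel.computeIfAbsent(channel, c -> new ArrayList<>()).add(parts[2]);
            }
        }

        Map<String, List<String>> shared = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : sinksByChannel.entrySet()) {
            if (entry.getValue().size() > 1) {
                List<String> sinks = new ArrayList<>(entry.getValue());
                Collections.sort(sinks); // property iteration order is unspecified
                shared.put(entry.getKey(), sinks);
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        String config = String.join("\n",
                "agent.channels = mc",
                "agent.sinks = hsa hsb",
                "agent.sinks.hsa.channel = mc",
                "agent.sinks.hsb.channel = mc");
        System.out.println(findSharedChannels(config)); // {agent.mc=[hsa, hsb]}
    }
}
```

Whether a shared channel should then be rejected or merely reported is left to the caller.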
-
Re: [jira] [Created] (FLUME-1479) Multiple Sinks can connect to single Channel
Denny Ye 2012-08-10, 05:29
hi Yongkun,
JIRA can be accessed now.

I think it might be difficult to reason about the order of events with your approach. If we don't care about ordering, we can discuss its value and feasibility. In my opinion, data ingest flows are not order-aware, or at least ordering is not that important for us. You can try to verify your proposal and share the results with us. Keeping a transaction across several Sinks may be difficult.

-Regards
Denny Ye
-
Re: [jira] [Created] (FLUME-1479) Multiple Sinks can connect to single Channel
Wang, Yongkun | Yongkun |... 2012-08-10, 05:48
Hi Denny,
I am working on the patch now; it's not difficult. I have listed the changes in that JIRA.

I think you misunderstand my design: I don't maintain the order of the events. Instead, I make sure that each sink gets the same events (or different events, as specified by a selector).

Suppose Channel (mc) contains the following events: 4,3,2,1

If we simply enable this through configuration, it may work like this:
Sink "hsa" may get 1,3;
Sink "hsb" may get 2,4;
So different sinks get different data. Is this what the user wants?

In my design, "hsa" and "hsb" will both get "4,3,2,1". This is a typical case when a user wants to fan out the data to two places (e.g. one for batch and another for real-time analysis).

Regards,
Yongkun Wang
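The difference between the two behaviours above can be sketched with a toy in-memory model. All of it is hypothetical rather than Flume code: the channel is a plain queue, the competing sinks take events in strict alternation (real sinks race, so the split is usually less tidy), and events are listed oldest first, so "4,3,2,1" becomes 1, 2, 3, 4. Competing takes partition the events, while a single processor that takes each event once and copies it to every sink gives each sink the full stream.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model (not Flume code) of one channel feeding several sinks. */
public class FanOutDemo {

    private static List<List<String>> emptySinks(int sinkCount) {
        if (sinkCount < 1) {
            throw new IllegalArgumentException("need at least one sink");
        }
        List<List<String>> sinks = new ArrayList<>();
        for (int i = 0; i < sinkCount; i++) {
            sinks.add(new ArrayList<>());
        }
        return sinks;
    }

    /** Sinks take from one shared queue in strict turn order; each take removes the event. */
    public static List<List<String>> competing(List<String> events, int sinkCount) {
        Deque<String> channel = new ArrayDeque<>(events);
        List<List<String>> received = emptySinks(sinkCount);
        int turn = 0;
        while (!channel.isEmpty()) {
            received.get(turn).add(channel.poll()); // no other sink will see this event
            turn = (turn + 1) % sinkCount;
        }
        return received;
    }

    /** One processor takes each event once and delivers a copy to every sink. */
    public static List<List<String>> replicating(List<String> events, int sinkCount) {
        Deque<String> channel = new ArrayDeque<>(events);
        List<List<String>> received = emptySinks(sinkCount);
        while (!channel.isEmpty()) {
            String event = channel.poll();
            for (List<String> sink : received) {
                sink.add(event);
            }
        }
        return received;
    }

    public static void main(String[] args) {
        List<String> events = List.of("1", "2", "3", "4"); // oldest first
        System.out.println(competing(events, 2));   // [[1, 3], [2, 4]]
        System.out.println(replicating(events, 2)); // [[1, 2, 3, 4], [1, 2, 3, 4]]
    }
}
```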
-
Re: [jira] [Created] (FLUME-1479) Multiple Sinks can connect to single Channel
Denny Ye 2012-08-10, 06:09
Yongkun,
Now I understand your design. Thanks for your explanation.
I have two questions; please help to explain, thanks!
1. Two Sinks have different consumption rates. If the Channel has 1000 events, sinkA has consumed 800 of them and sinkB has consumed 100, when do we remove the fully consumed events from the Channel?
2. An exception happens at one Sink. Each Sink retrieves 100 events from the Channel, and an exception happens at sinkA, so sinkA should roll back. What is the detailed behavior in your design?

-Regards
Denny Ye
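Question 1 can be made concrete with a little arithmetic, under a hypothetical model that is not the FLUME-1435 design: each sink keeps its own read position in a FIFO channel, and an event may only be deleted once every sink has read past it. The removable prefix is then the minimum of the per-sink positions, so with sinkA at 800 and sinkB at 100 only 100 of the 1000 events can go, and the channel must keep holding the remaining 900 for the slower sink.

```java
import java.util.Arrays;

/**
 * Hypothetical model (not the FLUME-1435 design): each sink keeps its own read
 * position in a FIFO channel, and an event can be deleted only after every sink
 * has read past it.
 */
public class RemovableEvents {

    /** Length of the channel prefix that every sink has already consumed. */
    public static int removable(int... consumedPerSink) {
        // With no sinks this returns 0, i.e. nothing is deleted (a conservative choice).
        return Arrays.stream(consumedPerSink).min().orElse(0);
    }

    public static void main(String[] args) {
        int total = 1000;
        int canRemove = removable(800, 100); // sinkA has read 800, sinkB 100
        System.out.println("removable=" + canRemove + ", retained=" + (total - canRemove));
        // removable=100, retained=900
    }
}
```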
-
Re: [jira] [Created] (FLUME-1479) Multiple Sinks can connect to single Channel
Wang, Yongkun | Yongkun |... 2012-08-10, 06:25
Hi Denny,
Thanks for the questions. Answers inline.

On 12/08/10 15:09, "Denny Ye" <[EMAIL PROTECTED]> wrote:
> 1. Two Sinks have different consuming rate. If Channel have 1000
>events, sinkA consumed 800 events and sinkB consumed 100 events. When we
>remove totally consumed events from Channel?

In my design, I try to avoid this case: SinkA and SinkB are synchronized, and both get all 1000 events if the mode is replicating. Events are not removed by each Sink (by calling channel.take() in the sink's process()); instead, they are removed by a higher-level sink processor, which removes an event once the sinks satisfy the transaction requirements.

> 2. Exception happened at one Sink. Each Sink retrieve 100 events from
>Channel, and exception happening at sinkA. sinkA should rollback. What's
>the detailed activity in your thought?

Yes, transaction control across multiple sinks is more complicated. In my design, there are two policies for committing a multi-sink transaction (suppose we have N sinks):
- When M (0 <= M <= N) Sinks succeed, commit; e.g. values for M: ANY, ONE, QUORUM, ALL.
- When M specified Sinks (0 < M <= N, the important sinks) succeed, commit.
- Otherwise, roll back all sinks for the current event.

Regards,
Yongkun
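The two commit policies can be written down as a small decision function. This is a sketch under stated assumptions, not the FLUME-1435 patch: the enum and method names are hypothetical, ANY is taken to mean M = 0 (commit even if no sink succeeded), QUORUM is taken to mean a strict majority of the N sinks, and policy 2 assumes a non-empty set of important sinks. The "otherwise, roll back" branch corresponds to either method returning false.

```java
import java.util.Set;

/** Hypothetical sketch of the two multi-sink commit policies; not Flume's API. */
public class CommitPolicy {

    /** How many of the N sinks must succeed before the coordinator commits. */
    public enum Quota { ANY, ONE, QUORUM, ALL }

    static int required(Quota quota, int sinkCount) {
        switch (quota) {
            case ANY:    return 0;                 // assumed: commit regardless of sink results
            case ONE:    return 1;
            case QUORUM: return sinkCount / 2 + 1; // assumed: strict majority
            case ALL:    return sinkCount;
            default:     throw new IllegalArgumentException(quota.toString());
        }
    }

    /** Policy 1: commit when at least M of the N sinks succeeded. */
    public static boolean commitByCount(Quota quota, int sinkCount, Set<String> succeeded) {
        return succeeded.size() >= required(quota, sinkCount);
    }

    /** Policy 2: commit when every sink marked as important succeeded. */
    public static boolean commitByImportantSinks(Set<String> important, Set<String> succeeded) {
        return succeeded.containsAll(important);
    }

    public static void main(String[] args) {
        Set<String> ok = Set.of("hsa"); // hsb failed for this batch
        System.out.println(commitByCount(Quota.ONE, 2, ok));           // true: commit
        System.out.println(commitByCount(Quota.ALL, 2, ok));           // false: roll back
        System.out.println(commitByImportantSinks(Set.of("hsb"), ok)); // false: roll back
    }
}
```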
-
Re: [jira] [Created] (FLUME-1479) Multiple Sinks can connect to single Channel
Denny Ye 2012-08-10, 06:58
hi Yongkun,
OK, so you have chosen the most important baseline: a similar consumption rate for each Sink. In practice, that is usually impossible, and the slowest Sink will be the limitation, or bottleneck, in your design. If the case in my first question cannot happen, then I think you provide a simplified rollback model. Do you agree?

-Regards
Denny Ye
-
Re: [jira] [Created] (FLUME-1479) Multiple Sinks can connect to single Channel
Wang, Yongkun | Yongkun |... 2012-08-10, 07:53
Hi Denny,
Yes, I agree. I cannot use a restrictive commit policy if the speed differs among sinks. That's why I defined two flexible commit policies. For example, the coordinator could commit the transaction once M (0 <= M <= N) fast sinks acknowledge success to it.

Regards,
Yongkun Wang
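Denny's bottleneck point and the reply above can be checked with a simple timing model. Assume, hypothetically, that a batch is handed to all N sinks at the same moment and sink i acknowledges after latencyMs[i]; a coordinator that commits after M acknowledgements can then commit at the M-th smallest latency. With M = N this is the slowest sink's latency, which is exactly the bottleneck Denny describes. The latencies in main are made-up illustration values, not measurements, and the M = 0 (ANY) case is left out.

```java
import java.util.Arrays;

/** Hypothetical timing model of "commit after M acknowledgements"; not Flume code. */
public class AckQuorumTiming {

    /** Earliest commit time in ms when M acknowledgements are required (1 <= m <= N). */
    public static long commitTimeMs(long[] latencyMs, int m) {
        if (m < 1 || m > latencyMs.length) {
            throw new IllegalArgumentException("m must be in [1, N]");
        }
        long[] sorted = latencyMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        return sorted[m - 1]; // the M-th fastest sink decides the commit time
    }

    public static void main(String[] args) {
        long[] latencies = {400, 20, 50}; // one slow sink, two fast ones
        System.out.println(commitTimeMs(latencies, 3)); // 400: ALL waits for the slowest
        System.out.println(commitTimeMs(latencies, 2)); // 50: two fast acks suffice
    }
}
```

The sketch says nothing about the sinks that have not acknowledged when the commit happens; how those are retried or reconciled is the hard part of the flexible policies and is not specified in this thread.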
-
Re: [jira] [Created] (FLUME-1479) Multiple Sinks can connect to single Channel
Juhani Connolly 2012-08-10, 09:07
Hi Yongkun,
I'm curious why you need to pull the data twice from the channel? Do you need all sinks to have read the same amount of data? Normally, for the case of splitting data into batch and analytics, we send data from the source to two separate channels and have the sinks read from separate channels.
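The two-channel pattern described above can be sketched as a source that writes a copy of each event into one bounded queue per sink. In Flume itself this fan-out is configured on the source through the replicating channel selector; the code below is only a hypothetical in-memory model that uses java.util.concurrent.ArrayBlockingQueue as the channel and simply reports whether every copy was stored. It also shows the buffering property: when one sink is down, its events wait in its own channel while the other sink keeps draining.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Toy model (not Flume code): the source copies each event into one bounded channel per sink. */
public class TwoChannelFanOut {

    /** Offers every event to every channel; returns false if any channel was full. */
    public static boolean replicate(List<String> events, List<BlockingQueue<String>> channels) {
        boolean allStored = true;
        for (String event : events) {
            for (BlockingQueue<String> channel : channels) {
                // offer() returns false instead of blocking when the channel is full.
                allStored &= channel.offer(event);
            }
        }
        return allStored;
    }

    public static void main(String[] args) {
        BlockingQueue<String> batchChannel = new ArrayBlockingQueue<>(100);
        BlockingQueue<String> realtimeChannel = new ArrayBlockingQueue<>(100);
        replicate(List.of("1", "2", "3", "4"), List.of(batchChannel, realtimeChannel));

        // The batch sink drains its own channel; the real-time sink is assumed to be
        // down, so its copies simply wait in realtimeChannel.
        List<String> written = new ArrayList<>();
        batchChannel.drainTo(written);

        System.out.println(written);                // [1, 2, 3, 4]
        System.out.println(realtimeChannel.size()); // 4
    }
}
```

The capacity (100 here, an arbitrary value) bounds how far a stalled sink can fall behind before offer() starts failing; holding a second copy of the data is the cost Yongkun points out in his reply.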
-
Re: [jira] [Created] (FLUME-1479) Multiple Sinks can connect to single Channel
Wang, Yongkun | Yongkun |... 2012-08-11, 04:30
Hi Juhani,
Yes, we can use two (or several) channels to fan out data to different sinks. But then we have two channels holding the same data, which may not be an optimal solution. So I want to use just ONE channel, with a processor that pulls the data from the channel once and then distributes it to the different sinks.

Regards,
Yongkun Wang
-
Re: [jira] [Created] (FLUME-1479) Multiple Sinks can connect to single Channel
Mike Percy 2012-08-13, 00:17
Hi,
Due to design decisions made very early on in Flume NG (specifically, the fact that Sink only has a simple process() method), I don't see a good way to get multiple sinks pulling from the same channel in a way that is backwards-compatible with the current implementation.

Probably the "right" way to support this would be to have an interface where the SinkRunner (or something outside of each Sink) is in control of the transaction; it could then easily send events to each sink, serially or in parallel, within a single transaction. I think that is basically what you are describing. If you look at SourceRunner and SourceProcessor, you will see ideas similar to what you are describing, but they are only implemented at the Source->Channel level. The current SinkProcessor is not an analog of SourceProcessor, but if it were, I think that's where this functionality might fit. However, when you do that, you have to handle a ton of failure cases and threading models in a very general way, which might be tough to get right for all use cases. I'm not 100% sure, but I think that's why this was not pursued at the time.

To me, this seems like a potential design change (it would have to be very carefully thought out) to consider for a future major Flume code line (maybe a Flume 2.x).

By the way, if one is trying to get maximum throughput, then duplicating events onto multiple channels and having different threads run the sinks (the current design) will in general be faster and more resilient than a single thread and a single channel writing to multiple sinks/destinations. The multiple-channel design pattern keeps periodic downtime or delays on a single sink from affecting the others, assuming the channels are large enough to buffer events during the downtime and that each sink is fast enough to recover from temporary delays. Without a dedicated buffer per destination, one is at the mercy of the slowest sink at every stage in the transaction.
One last thing worth noting is that the current channels are all well ordered. This means that Flume currently provides a weak ordering guarantee (across a single hop). That is a helpful property in the context of testing and validation, and it is what many people expect if they are storing logs on a single hop. I hope we don't backpedal on that weak ordering guarantee without a really good reason.

Regards,
Mike
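The multiple-channel pattern described above can be sketched the same way. This is a hypothetical, single-threaded simulation: `BoundedChannel`, `ReplicatingAgent`, `ReplicatingDemo` and the `tick` scheduler are illustrative names, not Flume classes. Each event is copied into one bounded FIFO buffer per destination; in a real agent each sink would run on its own thread, and the copying would be done on the source side by the channel selector (replicating is Flume's default). The sketch shows three properties from the message: a stalled destination does not hold up the other one, its backlog drains in FIFO order once it recovers, and the approach only works while the buffer is large enough to absorb the outage.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical bounded FIFO buffer standing in for one channel per destination.
class BoundedChannel {
    private final Deque<String> queue = new ArrayDeque<>();
    private final int capacity;

    BoundedChannel(int capacity) { this.capacity = capacity; }

    boolean hasRoom() { return queue.size() < capacity; }
    void add(String event) { queue.addLast(event); }
    String poll() { return queue.pollFirst(); } // null when empty
}

// Hypothetical agent: every event is copied into two channels, one per destination.
class ReplicatingAgent {
    final BoundedChannel batchChannel, realtimeChannel;
    final List<String> batchOut = new ArrayList<>(), realtimeOut = new ArrayList<>();

    ReplicatingAgent(int capacity) {
        batchChannel = new BoundedChannel(capacity);
        realtimeChannel = new BoundedChannel(capacity);
    }

    // Source side: accept the event only if every channel has room, then copy it to each.
    boolean put(String event) {
        if (!batchChannel.hasRoom() || !realtimeChannel.hasRoom()) return false;
        batchChannel.add(event);
        realtimeChannel.add(event);
        return true;
    }

    // One scheduling step: each sink drains only its own channel,
    // so one destination being down never blocks the other.
    void tick(boolean batchUp, boolean realtimeUp) {
        drain(batchChannel, batchOut, batchUp);
        drain(realtimeChannel, realtimeOut, realtimeUp);
    }

    private static void drain(BoundedChannel ch, List<String> out, boolean destinationUp) {
        if (!destinationUp) return;        // event stays buffered in this channel
        String event = ch.poll();
        if (event != null) out.add(event); // FIFO: delivered in arrival order
    }
}

class ReplicatingDemo {
    public static void main(String[] args) {
        ReplicatingAgent agent = new ReplicatingAgent(10);
        for (int i = 1; i <= 4; i++) {
            agent.put("e" + i);
            agent.tick(false, true); // batch destination is down
        }
        // prints "during outage: batch=[] realtime=[e1, e2, e3, e4]"
        System.out.println("during outage: batch=" + agent.batchOut + " realtime=" + agent.realtimeOut);
        for (int i = 0; i < 4; i++) agent.tick(true, true); // batch destination recovers
        // prints "after recovery: batch=[e1, e2, e3, e4]"
        System.out.println("after recovery: batch=" + agent.batchOut);
    }
}
```

Shrinking the capacity to 2 makes the third `put` fail while the batch destination is down, which corresponds to the "channel sizes are large enough" assumption in the message.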
-
Re: [jira] [Created] (FLUME-1479) Multiple Sinks can connect to single Channel
Wang, Yongkun | Yongkun |... 2012-08-15, 03:03
Thanks Mike.
This is a really nice reply, based on a thorough understanding of my proposal. I agree that it might be a potential design change, so I will evaluate it carefully before submitting it to you guys for a decision.

Cheers,
Yongkun Wang
-
Re: [jira] [Created] (FLUME-1479) Multiple Sinks can connect to single Channel
Mike Percy 2012-08-15, 03:52
Yongkun Wang,
You're welcome! Very happy to hear your thoughts.

Regards,
Mike