Flume >> mail # dev >> Re: [jira] [Created] (FLUME-1479) Multiple Sinks can connect to single Channel


Re: [jira] [Created] (FLUME-1479) Multiple Sinks can connect to single Channel
Hi Yongkun,
    OK, so your baseline assumes a similar consuming rate for each Sink.
In practice that rarely holds, and the slowest Sink will become the
bottleneck in your design. If the case in my first question cannot occur,
then I think what you provide is a simplified rollback model. Do you agree?

-Regards
Denny Ye

2012/8/10 Wang, Yongkun | Yongkun | BDD <[EMAIL PROTECTED]>

> Hi Denny,
>
> Thanks for the questions. Answers inline.
>
> On 12/08/10 15:09, "Denny Ye" <[EMAIL PROTECTED]> wrote:
>
> >Yongkun,
> >    Now I understand your design. Thanks for your explanation.
> >    I have two questions. Please help to explain, thanks!
> >    1. Two Sinks have different consuming rates. Suppose the Channel has
> >1000 events, sinkA has consumed 800 events and sinkB has consumed 100
> >events. When do we remove the fully consumed events from the Channel?
>
> In my design, I try to avoid this case, which means SinkA and SinkB will
> be synchronized and both get 1000 events if the mode is replicating. In my
> design, the event is not removed by the Sink (by calling channel.take() in
> the sink's process()); instead, events are removed by a higher-level sink
> processor, which removes an event once the sinks satisfy the transaction
> requirements.
>
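To illustrate the idea that a processor, not the sink, removes events from the channel, here is a minimal sketch in Python. It is not Flume code: `ListSink`, `deliver`, `commit`, `rollback` and `process_batch` are hypothetical names, and the sketch assumes the strictest case where every sink must succeed before the batch is removed.

```python
from collections import deque


class ListSink:
    """Hypothetical sink: stages a batch, then commits or rolls it back."""

    def __init__(self, fail=False):
        self.fail = fail
        self.committed = []
        self._pending = []

    def deliver(self, batch):
        if self.fail:
            raise IOError("sink unavailable")
        self._pending = list(batch)

    def commit(self):
        self.committed.extend(self._pending)
        self._pending = []

    def rollback(self):
        self._pending = []


def process_batch(channel, sinks, batch_size):
    """Peek a batch, offer it to every sink, and remove it from the
    channel only if all sinks accepted it (assumed all-or-nothing)."""
    batch = [channel[i] for i in range(min(batch_size, len(channel)))]
    try:
        for sink in sinks:
            sink.deliver(batch)
    except IOError:
        # One sink failed: discard staged events everywhere, keep the channel intact.
        for sink in sinks:
            sink.rollback()
        return False
    for sink in sinks:
        sink.commit()
    for _ in batch:
        channel.popleft()
    return True
```

Because the sinks only see a peeked copy of the batch, a failure in one sink leaves the channel untouched, so the same events can be offered again later.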
> >    2. An exception happens at one Sink. Each Sink retrieves 100 events
> >from the Channel, and an exception occurs at sinkA, so sinkA should roll
> >back. What exactly happens in your design?
>
> Yes, transaction control on multiple sinks is more complicated. In my
> design, I have two policies to commit a multi-sink transaction (suppose we
> have N sinks):
>
> - When M (0 <= M <= N) Sinks succeed, commit; e.g. values for M: ANY, ONE,
> QUORUM, ALL
> - When M (0 < M <= N) specified Sinks (important sinks) succeed, commit;
> - otherwise, roll back all sinks for the current event.
>
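The commit rules listed above can be sketched as a small decision function. This is an illustration in Python, not the patch itself: `CommitPolicy`, `required_successes` and `should_commit` are hypothetical names, and the thresholds for each level (ANY = 0, ONE = 1, QUORUM = N // 2 + 1, ALL = N) are my assumption, since the mail only gives the level names.

```python
from enum import Enum


class CommitPolicy(Enum):
    ANY = "any"
    ONE = "one"
    QUORUM = "quorum"
    ALL = "all"


def required_successes(policy, n):
    """Number of sinks (M) that must succeed out of n. Thresholds are assumed."""
    if policy is CommitPolicy.ANY:
        return 0
    if policy is CommitPolicy.ONE:
        return 1
    if policy is CommitPolicy.QUORUM:
        return n // 2 + 1
    return n  # CommitPolicy.ALL


def should_commit(results, policy=None, important=None):
    """results maps sink name -> True if that sink succeeded.

    If `important` is given, commit only when every named sink succeeded;
    otherwise commit when at least M sinks succeeded under `policy`.
    A False return means: roll back all sinks for the current event.
    """
    if important is not None:
        return all(results.get(name, False) for name in important)
    succeeded = sum(1 for ok in results.values() if ok)
    return succeeded >= required_successes(policy, len(results))


results = {"hsa": True, "hsb": False}
print(should_commit(results, policy=CommitPolicy.ONE))     # True
print(should_commit(results, policy=CommitPolicy.ALL))     # False
print(should_commit(results, important=["hsb"]))           # False
```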
>
> Regards,
> Yongkun
>
> >
> >-Regards
> >Denny Ye
> >
> >2012/8/10 Wang, Yongkun | Yongkun | BDD <[EMAIL PROTECTED]>
> >
> >> Hi Denny,
> >>
> >> I am working on the patch now; it's not difficult. I have listed the
> >> changes in that JIRA.
> >> I think you misunderstood my design: I don't maintain the order of the
> >> events. Instead I make sure that each sink will get the same events (or
> >> different events specified by selector).
> >>
> >> Suppose Channel (mc) contains the following events: 4,3,2,1
> >>
> >> If we simply enable it by configuration, it may work like this:
> >> Sink "hsa" may get 1,3;
> >> Sink "hsb" may get 2,4;
> >> So different sinks will get different data. Is this what the user wants?
> >>
> >>
> >> In my design, "hsa" and "hsb" will both get "4,3,2,1". This is a typical
> >> case when the user wants to fan out the data into two places (e.g. one
> >> for batch and another for real-time analysis).
> >>
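The difference between the two behaviours described above can be shown with a toy model. This is a sketch in Python, not Flume code: `take_round_robin` and `take_replicating` are hypothetical names, and the strict alternation between sinks is an assumption (in a real agent the split would depend on thread timing).

```python
from collections import deque


def take_round_robin(events, sink_names):
    """Sinks share one channel; each take removes the event, so the
    events are split between the sinks (strict alternation assumed)."""
    channel = deque(events)
    received = {name: [] for name in sink_names}
    turn = 0
    while channel:
        name = sink_names[turn % len(sink_names)]
        received[name].append(channel.popleft())
        turn += 1
    return received


def take_replicating(events, sink_names):
    """Every sink sees each event; the event leaves the channel only
    after all sinks have received it."""
    channel = deque(events)
    received = {name: [] for name in sink_names}
    while channel:
        event = channel[0]  # peek, do not remove yet
        for name in sink_names:
            received[name].append(event)
        channel.popleft()
    return received


events = [1, 2, 3, 4]
print(take_round_robin(events, ["hsa", "hsb"]))
# {'hsa': [1, 3], 'hsb': [2, 4]}
print(take_replicating(events, ["hsa", "hsb"]))
# {'hsa': [1, 2, 3, 4], 'hsb': [1, 2, 3, 4]}
```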
> >> Regards,
> >> Yongkun Wang
> >>
> >>
> >> On 12/08/10 14:29, "Denny Ye" <[EMAIL PROTECTED]> wrote:
> >>
> >> >hi Yongkun,
> >> >
> >> >   JIRA can be accessed now.
> >> >
> >> >   I think it might be difficult to understand the order of events in
> >> >your proposal. If we don't care about the order, we can discuss the
> >> >value and feasibility. In my opinion, the data ingest flow is not
> >> >order-aware, or at least order is not that important for us. You can
> >> >try to verify your proposal and give us the result. There may be some
> >> >difficulties in keeping a transaction across several Sinks.
> >> >
> >> >-Regards
> >> >Denny Ye
> >> >
> >> >
> >> >2012/8/10 Wang, Yongkun | Yongkun | BDD <[EMAIL PROTECTED]>
> >> >
> >> >> JIRA is down again? I cannot connect to it and comment there.
> >> >>
> >> >> I have a proposal in "Transactional Multiplex (fan out) Sink":
> >> >> https://issues.apache.org/jira/browse/FLUME-1435
> >> >> which contains the design of one channel to multiple sinks.
> >> >>
> >> >> You can search the email since JIRA cannot be accessed.
> >> >>
> >> >> I think this is more than a configuration issue. If we simply enable
> >> >> several sinks on the same channel, they will take it either in a
> >> >> round-robin