Flume, mail # dev - Re: [jira] [Created] (FLUME-1479) Multiple Sinks can connect to single Channel


Wang, Yongkun | Yongkun |... 2012-08-11, 04:30
Hi Juhani,

Yes, we can use two (or more) channels to fan out data to different
sinks. But then we have two channels holding the same data, which may not
be an efficient solution. So I want to use just ONE channel, with a
processor that pulls the data from the channel once and then distributes
it to the different sinks.

Regards,
Yongkun Wang
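A minimal Python sketch of the single-channel idea above, assuming a toy model rather than Flume's actual Channel and Sink interfaces: the channel is a `deque`, each sink is a list recording what it received, and `FanOutProcessor` is a hypothetical name. The processor takes each batch from the channel once and hands the same events to every sink, so the data is buffered only once.

```python
from collections import deque


class FanOutProcessor:
    """Hypothetical processor (not a Flume class): takes each batch from ONE
    channel once and delivers the same events to every sink."""

    def __init__(self, channel, sinks, batch_size=100):
        self.channel = channel        # deque standing in for a Flume channel
        self.sinks = sinks            # sink name -> list of delivered events
        self.batch_size = batch_size

    def process(self):
        # Pull up to batch_size events from the channel exactly once.
        batch = []
        while self.channel and len(batch) < self.batch_size:
            batch.append(self.channel.popleft())
        # Hand the same batch to every sink.
        for delivered in self.sinks.values():
            delivered.extend(batch)
        return len(batch)


mc = deque([1, 2, 3, 4])           # event 1 is at the head and is taken first
sinks = {"hsa": [], "hsb": []}
FanOutProcessor(mc, sinks).process()
print(sinks)                       # {'hsa': [1, 2, 3, 4], 'hsb': [1, 2, 3, 4]}
print(len(mc))                     # 0: each event was stored in one channel only
```

Transactions are deliberately left out of this sketch; what happens when one sink fails after another has already received the batch is the hard part raised in the quoted discussion.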

On 12/08/10 18:07, "Juhani Connolly" <[EMAIL PROTECTED]>
wrote:

>Hi Yongkun,
>
>I'm curious why you need to pull the data twice from the channel. Do you
>need all sinks to have read the same amount of data? Normally, when
>splitting data between batch processing and analytics, we send data from
>the source to two separate channels and have each sink read from its own
>channel.
>
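For comparison, a toy Python model of the two-channel approach Juhani describes, assuming in-memory deques in place of real channels; `replicate` and `drain` are hypothetical helpers that loosely mirror what Flume's replicating channel selector does on the source side. Every event is written into each channel, so the fan-out works, but the data is buffered twice, which is the cost Yongkun objects to.

```python
from collections import deque


def replicate(events, channels):
    """Write every incoming event into EACH channel (replicating fan-out)."""
    for event in events:
        for channel in channels.values():
            channel.append(event)


def drain(channel):
    """A sink that reads only from its own channel until it is empty."""
    taken = []
    while channel:
        taken.append(channel.popleft())
    return taken


channels = {"mca": deque(), "mcb": deque()}
replicate([1, 2, 3, 4], channels)
stored = sum(len(ch) for ch in channels.values())  # copies buffered in total
hsa = drain(channels["mca"])
hsb = drain(channels["mcb"])
print(stored, hsa, hsb)  # 8 [1, 2, 3, 4] [1, 2, 3, 4]
```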
>On 08/10/2012 02:48 PM, Wang, Yongkun | Yongkun | BDD wrote:
>> Hi Denny,
>>
>> I am working on the patch now; it's not difficult. I have listed the
>> changes in that JIRA.
>> I think you misunderstood my design: I don't maintain the order of the
>> events. Instead, I make sure that each sink gets the same events (or
>> different events, as specified by a selector).
>>
>> Suppose Channel (mc) contains the following events: 4,3,2,1
>>
>> If we simply enable it through configuration, it may work like this:
>> Sink "hsa" may get 1,3;
>> Sink "hsb" may get 2,4;
>> So different sinks will get different data. Is this what the user wants?
>>
>>
>> In my design, "hsa" and "hsb" will both get "4,3,2,1". This is the typical
>> case when a user wants to fan out the data to two places (e.g., one for
>> batch and another for real-time analysis).
>>
>> Regards,
>> Yongkun Wang
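The first outcome in the message above (hsa gets 1,3 and hsb gets 2,4) can be reproduced with a small Python model, assuming a toy channel (a `deque`) and sinks that simply take turns; `shared_channel_round_robin` is a hypothetical helper. Because a take removes the event from the channel, sinks sharing one channel split the stream instead of each receiving all of it.

```python
from collections import deque
from itertools import cycle


def shared_channel_round_robin(events, sink_names):
    """Several sinks on ONE channel, taking turns. Each take removes the
    event, so every event reaches exactly one sink. With sinks running at
    different speeds the split would be uneven and unpredictable."""
    channel = deque(events)
    received = {name: [] for name in sink_names}
    for name in cycle(sink_names):
        if not channel:
            break
        received[name].append(channel.popleft())
    return received


# Events are taken in the order 1, 2, 3, 4 from channel mc.
print(shared_channel_round_robin([1, 2, 3, 4], ["hsa", "hsb"]))
# {'hsa': [1, 3], 'hsb': [2, 4]}
```

Under the proposed fan-out design, both sinks would instead receive all four events.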
>>
>>
>> On 12/08/10 14:29, "Denny Ye" <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Yongkun,
>>>
>>>    JIRA can be accessed now.
>>>
>>>    I think it is hard to see how your idea handles the order of events.
>>> If we don't care about the order, we can discuss its value and
>>> feasibility. In my opinion, the data ingest flow is not order-aware, or
>>> at least ordering is not that important for us. You can try to verify
>>> your proposal and give us the results. There may be some difficulty in
>>> keeping a transaction across several Sinks.
>>>
>>> -Regards
>>> Denny Ye
>>>
>>>
>>> 2012/8/10 Wang, Yongkun | Yongkun | BDD <[EMAIL PROTECTED]>
>>>
>>>> Is JIRA down again? I cannot connect to it to comment there.
>>>>
>>>> I have a proposal, "Transactional Multiplex (fan out) Sink":
>>>> https://issues.apache.org/jira/browse/FLUME-1435
>>>> It contains the design for connecting one channel to multiple sinks.
>>>>
>>>> You can search for the email, since JIRA cannot be accessed.
>>>>
>>>> I think this is more than a configuration issue. If we simply enable
>>>> several sinks on the same channel, they will take events from it either
>>>> in round-robin fashion or, if the sinks run at different speeds, in an
>>>> unpredictable way.
>>>>
>>>> So it's better to have an even higher-level transaction control instead
>>>> of the transaction in the process() method of each sink, as I describe
>>>> in FLUME-1435.
>>>>
>>>> Regards,
>>>> Yongkun Wang
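A minimal Python sketch of the higher-level transaction idea, assuming a toy channel with take/commit/rollback semantics; `ToyChannel`, `SinkError` and `process_all_or_nothing` are hypothetical names, not Flume's Transaction API or the FLUME-1435 patch. One transaction wraps the delivery to all sinks: the batch is committed only if every sink accepts it, otherwise it is rolled back into the channel.

```python
from collections import deque


class SinkError(Exception):
    """Raised by a toy sink that cannot accept a batch."""


class ToyChannel:
    """In-memory stand-in for a channel with take/commit/rollback."""

    def __init__(self, events):
        self.queue = deque(events)
        self.pending = []             # taken but not yet committed

    def take(self, n):
        while self.queue and len(self.pending) < n:
            self.pending.append(self.queue.popleft())
        return list(self.pending)

    def commit(self):
        self.pending = []             # taken events are now gone for good

    def rollback(self):
        # Put the taken events back at the head, in their original order.
        self.queue.extendleft(reversed(self.pending))
        self.pending = []


def process_all_or_nothing(channel, sinks, batch_size=100):
    """One transaction around ALL sinks: commit only if every sink accepts
    the batch; otherwise roll back so the whole batch is retried later."""
    batch = channel.take(batch_size)
    try:
        for sink in sinks:
            sink(batch)
    except SinkError:
        channel.rollback()
        return False
    channel.commit()
    return True


delivered = []


def good_sink(batch):
    delivered.append(list(batch))


def bad_sink(batch):
    raise SinkError("sink unavailable")


ch = ToyChannel([1, 2, 3, 4])
ok = process_all_or_nothing(ch, [good_sink, bad_sink])
print(ok, list(ch.queue), delivered)  # False [1, 2, 3, 4] [[1, 2, 3, 4]]
```

The rollback has a visible side effect: `good_sink` already received the batch, so on retry it sees the same events again. This is one concrete form of the difficulty with keeping a transaction across several sinks; the sketch gives at-least-once delivery per sink, not exactly-once.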
>>>>
>>>>
>>>> On 12/08/10 12:30, "Denny Ye (JIRA)" <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Denny Ye created FLUME-1479:
>>>>> -------------------------------
>>>>>
>>>>>              Summary: Multiple Sinks can connect to single Channel
>>>>>                  Key: FLUME-1479
>>>>>                  URL: https://issues.apache.org/jira/browse/FLUME-1479
>>>>>              Project: Flume
>>>>>           Issue Type: Bug
>>>>>           Components: Configuration
>>>>>     Affects Versions: v1.2.0
>>>>>             Reporter: Denny Ye
>>>>>             Assignee: Denny Ye
>>>>>              Fix For: v1.3.0
>>>>>
>>>>>
>>>>> If we have one Channel (mc) and two Sinks (hsa, hsb), they may be
>>>>> connected to each other with a configuration such as:
>>>>> {quote}
>>>>> agent.sinks.hsa.channel = mc
>>>>> agent.sinks.hsb.channel = mc
>>>>> {quote}
>>>>> This means that multiple Sinks can connect to a single Channel.
>>>>> Normally, only one Sink can connect to a given Channel.
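One way to surface the situation this issue describes is a configuration check. The Python sketch below is an assumption, not Flume's own validation code: `channels_with_multiple_sinks` is a hypothetical helper that scans properties lines of the form `<agent>.sinks.<sink>.channel = <channel>` and reports every channel that more than one sink reads from.

```python
import re
from collections import defaultdict

# Matches lines such as "agent.sinks.hsa.channel = mc".
SINK_CHANNEL = re.compile(r"^\s*([\w-]+)\.sinks\.([\w-]+)\.channel\s*=\s*(\S+)\s*$")


def channels_with_multiple_sinks(properties_text):
    """Map (agent, channel) -> sinks, keeping only channels read by 2+ sinks."""
    readers = defaultdict(list)
    for line in properties_text.splitlines():
        match = SINK_CHANNEL.match(line)
        if match:
            agent, sink, channel = match.groups()
            readers[(agent, channel)].append(sink)
    return {key: names for key, names in readers.items() if len(names) > 1}


config = """
agent.sinks.hsa.channel = mc
agent.sinks.hsb.channel = mc
"""
print(channels_with_multiple_sinks(config))
# {('agent', 'mc'): ['hsa', 'hsb']}
```

Whether such a configuration should be rejected, reported as a warning, or given fan-out semantics is the question the thread is debating.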