Re: [jira] [Created] (FLUME-1479) Multiple Sinks can connect to single Channel
Thanks, Mike.

This is a really helpful reply, based on a thorough understanding of my
proposal.

I agree that this would be a significant design change, so I will evaluate it
carefully before submitting it to you all for a decision.

Cheers,
Yongkun Wang

On 12/08/13 9:17, "Mike Percy" <[EMAIL PROTECTED]> wrote:

>Hi,
>Due to design decisions made very early on in Flume NG - specifically the
>fact that Sink only has a simple process() method - I don't see a good way
>to get multiple sinks pulling from the same channel in a way that is
>backwards-compatible with the current implementation.
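
For reference, the Sink contract Mike is describing is roughly the following.
This is a simplified paraphrase of org.apache.flume.Sink in Flume 1.x, not the
exact source; the real interface also extends the lifecycle and naming
interfaces, which are elided here:

    import org.apache.flume.Channel;
    import org.apache.flume.EventDeliveryException;

    // Simplified paraphrase of the Flume 1.x Sink contract (not the exact source).
    // Each sink is bound to exactly one channel, and the whole take/deliver/commit
    // cycle happens inside a single process() call driven by a SinkRunner.
    public interface Sink {

      /** The one channel this sink drains; assigned at configuration time. */
      void setChannel(Channel channel);

      Channel getChannel();

      /**
       * Called repeatedly by the SinkRunner's driver thread. The sink opens a
       * transaction on its channel, takes a batch of events, delivers them, and
       * commits or rolls back, all within this one call. Nothing outside the
       * sink ever sees the transaction, so there is no hook where several sinks
       * could each be handed the same batch from one channel.
       */
      Status process() throws EventDeliveryException;

      enum Status { READY, BACKOFF }
    }
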
>
>Probably the "right" way to support this would be to have an interface
>where the SinkRunner (or something outside of each Sink) is in control of
>the transaction, and then it can easily send events to each sink serially
>or in parallel within a single transaction. I think that is basically what
>you are describing. If you look at SourceRunner and SourceProcessor you
>will see similar ideas to what you are describing but they are only
>implemented at the Source->Channel level. The current SinkProcessor is not
>an analog of SourceProcessor, but if it was then I think that's where this
>functionality might fit. However what happens when you do that is you have
>to handle a ton of failure cases and threading models in a very general
>way, which might be tough to get right for all use cases. I'm not 100%
>sure, but I think that's why this was not pursued at the time.
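
What such an interface might look like is sketched below. This is purely
hypothetical; none of these types exist in Flume, and the names (PushSink,
MultiSinkRunner, consume) are invented for illustration. The point is only to
show transaction control living outside the sinks, with one batch fanned out
to several sinks inside a single channel transaction:

    import java.util.ArrayList;
    import java.util.List;

    // HYPOTHETICAL sketch only -- not Flume API. Minimal stand-in types are
    // declared here so the example is self-contained.
    interface Event {}

    interface Transaction {
      void begin();
      void commit();
      void rollback();
      void close();
    }

    interface Channel {
      Transaction getTransaction();
      Event take(); // returns null when the channel is currently empty
    }

    /** A sink that only consumes events it is handed; it never touches the channel. */
    interface PushSink {
      void consume(List<Event> batch) throws Exception;
    }

    /** Owns the channel transaction and fans one batch out to every sink. */
    class MultiSinkRunner {
      private final Channel channel;
      private final List<PushSink> sinks;
      private final int batchSize;

      MultiSinkRunner(Channel channel, List<PushSink> sinks, int batchSize) {
        this.channel = channel;
        this.sinks = sinks;
        this.batchSize = batchSize;
      }

      /** One delivery attempt: take a batch once, hand the same batch to each sink. */
      void runOnce() {
        Transaction tx = channel.getTransaction();
        tx.begin();
        try {
          List<Event> batch = new ArrayList<>();
          for (int i = 0; i < batchSize; i++) {
            Event event = channel.take();
            if (event == null) {
              break;
            }
            batch.add(event);
          }
          for (PushSink sink : sinks) {
            sink.consume(batch); // serial delivery; could also be parallelized
          }
          tx.commit();
        } catch (Exception e) {
          tx.rollback(); // any sink failure returns the whole batch to the channel
        } finally {
          tx.close();
        }
      }
    }

Even in this toy form, the complications Mike mentions are visible: one slow or
failing sink stalls or rolls back delivery to all of the others, and doing the
fan-out in parallel would pull per-sink threading and partial-failure handling
into this one component.
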
>
>To me, this seems like a potential design change (it would have to be very
>carefully thought out) to consider for a future major Flume code line
>(maybe a Flume 2.x).
>
>By the way, if one is trying to get maximum throughput, then duplicating
>events onto multiple channels, and having different threads running the
>sinks (the current design) will be faster and more resilient in general
>than a single thread and a single channel writing to multiple
>sinks/destinations. The multiple-channel design pattern will allow
>periodic downtimes or delays on a single sink to not affect the others,
>assuming the channel sizes are large enough for buffering during downtime
>and assuming that each sink is fast enough to recover from temporary
>delays. Without a dedicated buffer per destination, one is at the mercy of
>the slowest sink at every stage in the transaction.
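
The benefit of a dedicated buffer per destination can be seen even in a toy
example with no Flume code at all. Below, two bounded in-memory queues stand in
for channels and two threads stand in for sink runners; everything here
(FanOutDemo, the queue sizes, the artificial delay) is invented for
illustration and is not Flume API:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Toy illustration of the fan-out-to-multiple-channels pattern: the "source"
    // duplicates every event onto two bounded queues, and each "sink" drains its
    // own queue in its own thread. A slow sink only backs up its own buffer; the
    // fast sink is never held to the slow sink's pace.
    public class FanOutDemo {

      public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> channelA = new LinkedBlockingQueue<>(10_000);
        BlockingQueue<String> channelB = new LinkedBlockingQueue<>(10_000);

        Thread source = new Thread(() -> {
          try {
            for (int i = 0; i < 1_000; i++) {
              String event = "event-" + i;
              channelA.put(event); // duplicate the event onto both buffers
              channelB.put(event);
            }
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });

        Thread fastSink = new Thread(() -> drain(channelA, 0));
        Thread slowSink = new Thread(() -> drain(channelB, 5));

        long start = System.currentTimeMillis();
        source.start();
        fastSink.start();
        slowSink.start();
        fastSink.join();
        System.out.println("fast sink finished after "
            + (System.currentTimeMillis() - start) + " ms");
        slowSink.join();
        System.out.println("slow sink finished after "
            + (System.currentTimeMillis() - start) + " ms");
      }

      private static void drain(BlockingQueue<String> channel, long delayMillis) {
        try {
          for (int i = 0; i < 1_000; i++) {
            channel.take();
            if (delayMillis > 0) {
              Thread.sleep(delayMillis); // simulate a slow or temporarily delayed sink
            }
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }

With a single shared buffer and one driver thread writing to both destinations,
both would instead finish at the slow sink's pace.
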
>
>One last thing worth noting is that the current channels are all well
>ordered. This means that Flume currently provides a weak ordering guarantee
>(across a single hop). That is a helpful property in the context of testing
>and validation, as well as being what many people expect if they are
>storing logs on a single hop. I hope we don't backpedal on that weak
>ordering guarantee without a really good reason.
>
>Regards,
>Mike
>
>On Fri, Aug 10, 2012 at 9:30 PM, Wang, Yongkun | Yongkun | BDD <
>[EMAIL PROTECTED]> wrote:
>
>> Hi Juhani,
>>
>> Yes, we can use two (or several) channels to fan out data to different
>> sinks. Then we will have two channels with the same data, which may not
>> be an optimal solution. So I want to use just ONE channel, creating a
>> processor to pull the data once from the channel and then distribute it
>> to different sinks.
>>
>> Regards,
>> Yongkun Wang
>>
>> On 12/08/10 18:07, "Juhani Connolly" <[EMAIL PROTECTED]>
>> wrote:
>>
>> >Hi Yongkun,
>> >
>> >I'm curious why you need to pull the data twice from the channel? Do you
>> >need all sinks to have read the same amount of data? Normally for the
>> >case of splitting data into batch and analytics, we will send data from
>> >the source to two separate channels and have the sinks read from
>> >separate channels.
>> >
>> >On 08/10/2012 02:48 PM, Wang, Yongkun | Yongkun | BDD wrote:
>> >> Hi Denny,
>> >>
>> >> I am working on the patch now; it's not difficult. I have listed the
>> >> changes in that JIRA.
>> >> I think you misunderstand my design: I didn't maintain the order of the
>> >> events. Instead I make sure that each sink will get the same events