Flume >> mail # user >> AvroSink and LoadBalancingRpcClient


Denny Ye 2013-01-10, 06:55
Connor Woodson 2013-01-10, 07:05
Hari Shreedharan 2013-01-10, 07:30
Denny Ye 2013-01-10, 07:55

Re: AvroSink and LoadBalancingRpcClient
Forgot about sink processors; yes, it will work.

The trick with this method is that you use a separate sink for each
endpoint, whereas the RpcClient (once exposed) would handle that internally.
Your configuration will need to look something like this:

-----------------

# <source setup>

a1.channels = c1
# <channel setup>

a1.sinks = k1 k2

a1.sinks.k1.type = avro
a1.sinks.k1.hostname = <centralFlumeE host>
a1.sinks.k1.port = <centralFlumeE port>
a1.sinks.k1.channel = c1

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = <centralFlumeF host>
a1.sinks.k2.port = <centralFlumeF port>
a1.sinks.k2.channel = c1

a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin

-----------------

Here is the relevant link for the load-balancing sink processor:
http://flume.apache.org/FlumeUserGuide.html#load-balancing-sink-processor

Remember that all sinks in a sink group must share the same channel. This
is load balancing, which is what you are looking for in your scenario; the
load balancer is not for failover (a setup with primary and backup servers),
although there is a FailoverSinkProcessor if you need that.
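
If you do need failover instead, a minimal sketch of the sink group (assuming
the same k1 and k2 sinks as above; the priority and maxpenalty values are
only illustrative, not recommendations) would be something like:

-----------------

a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
# the sink with the higher priority is tried first; k2 takes over while k1 is failing
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
# upper limit (in milliseconds) on the backoff applied to a failed sink
a1.sinkgroups.g1.processor.maxpenalty = 10000

-----------------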

- Connor
On Wed, Jan 9, 2013 at 11:55 PM, Denny Ye <[EMAIL PROTECTED]> wrote:

> Hi Hari,
>     I can't tell whether the method you suggested fits my situation, so let
> me explain my case and ask for your comments. Thanks a lot!
>     What I need is load balancing during event transfer. Assume I have a
> single local Flume server (co-located with the application) named
> 'localFlumeA', configured with a single AvroSink and channel, and two
> central Flume servers (collectors) named 'centralFlumeE' and
> 'centralFlumeF'. In this case, I would like to configure load balancing
> between 'centralFlumeE' and 'centralFlumeF' for events coming from
> 'localFlumeA', so that the load is distributed evenly across the two
> central Flume servers.
>     Do you think this can be configured with LoadBalancingSinkProcessor? I
> would appreciate your advice.
>
> -Regards
> Denny Ye
>
>
> 2013/1/10 Hari Shreedharan <[EMAIL PROTECTED]>
>
>>  The LoadBalancing capability similar to the LoadBalancingRpcClient can
>> be configured for multiple Avro Sinks using a LoadBalancingSinkProcessor,
>> if you are looking for that functionality.
>>
>>
>> Hari
>>
>> --
>> Hari Shreedharan
>>
>> On Wednesday, January 9, 2013 at 11:05 PM, Connor Woodson wrote:
>>
>> Short answer: there is no way in the current AvroSink to configure the
>> RpcClient, limiting you to just a single host connection (I'm not sure how
>> well it recovers if that host goes down).
>>
>> The AvroSink is greatly simplified compared with what the RpcClient can do
>> and exposes none of the underlying functionality. Right now, the only way
>> around that is to create a custom sink based on the AvroSink source code
>> that, instead of setting up the RpcClient the way it currently does,
>> passes a set of user-supplied properties to RpcClientFactory.getInstance().
>> Implementing this in an unsafe way (not checking any of the user's values)
>> would only take a couple of lines of code, I believe. It is a workaround,
>> but it will enable the various RpcClient capabilities, such as failover
>> or load-balancing mode, and allow it to connect to multiple hosts.
>>
>> I think there is a JIRA filed for this; if not, it would be very helpful
>> to have it implemented in the actual AvroSink (and a related improvement
>> would be RpcClientFactory.getInstance() accepting a Context object, simply
>> for ease of use).
>>
>> - Connor
>>
>>
>> On Wed, Jan 9, 2013 at 10:55 PM, Denny Ye <[EMAIL PROTECTED]> wrote:
>>
>> Hi all,
>>     I couldn't find how AvroSink relates to the other RpcClient types,
>> including LoadBalancingRpcClient. I expected that a user could set the
>> RpcClient type from AvroSink, along with its strategies and host
>> selectors, but I can't find anything about this in the source code or the
>> user guide. Did I miss something?
>>     I'd appreciate any help, thanks!

Hari Shreedharan 2013-01-10, 08:33
Denny Ye 2013-01-10, 09:58
Connor Woodson 2013-01-10, 19:33