Re: AvroSink and LoadBalancingRpcClient
I forgot about sink processors; yes, that will work.

The catch with this method is that you use a separate sink for each
endpoint, whereas the RpcClient (once exposed) would handle all of them by itself.
Your configuration will need to look something like this:

-----------------

<sources>

a1.channels = c1
<channel setup>

a1.sinks = k1 k2

a1.sinks.k1.type = avro
a1.sinks.k1.hostname = centralFlumeE
a1.sinks.k1.port = <centralFlumeE port>
a1.sinks.k1.channel = c1

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = centralFlumeF
a1.sinks.k2.port = <centralFlumeF port>
a1.sinks.k2.channel = c1

a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin

-----------------

Here is the relevant link for the load balancing sink processor:
http://flume.apache.org/FlumeUserGuide.html#load-balancing-sink-processor
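As a rough illustration of what the `load_balance` processor does with `selector = round_robin` and `backoff = true`, here is a minimal Java sketch. It is not Flume's implementation: the class `RoundRobinBackoff` and its methods are hypothetical, sinks are plain strings, and the backoff window is fixed for simplicity, whereas Flume's processor backs failed sinks off exponentially.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of round-robin selection with backoff (not Flume code).
final class RoundRobinBackoff {
    private final List<String> sinks;
    private final long[] backoffUntil; // time (ms) until which each sink is skipped
    private final long backoffMillis;  // fixed backoff window, a simplification
    private int next = 0;              // index to try first on the next call

    RoundRobinBackoff(List<String> sinks, long backoffMillis) {
        this.sinks = new ArrayList<>(sinks);
        this.backoffUntil = new long[sinks.size()];
        this.backoffMillis = backoffMillis;
    }

    // Returns the next sink in rotation that is not backed off, or null if all are.
    String select(long nowMillis) {
        for (int i = 0; i < sinks.size(); i++) {
            int idx = (next + i) % sinks.size();
            if (backoffUntil[idx] <= nowMillis) {
                next = (idx + 1) % sinks.size();
                return sinks.get(idx);
            }
        }
        return null;
    }

    // Records a delivery failure so the sink is skipped until the window expires.
    void markFailed(String sink, long nowMillis) {
        int idx = sinks.indexOf(sink);
        if (idx >= 0) {
            backoffUntil[idx] = nowMillis + backoffMillis;
        }
    }

    public static void main(String[] args) {
        RoundRobinBackoff rr = new RoundRobinBackoff(List.of("k1", "k2"), 1000);
        System.out.println(rr.select(0));    // k1
        System.out.println(rr.select(0));    // k2
        rr.markFailed("k1", 0);
        System.out.println(rr.select(10));   // k2 (k1 is backed off)
        System.out.println(rr.select(2000)); // k1 (backoff expired)
    }
}
```

With two healthy sinks, successive calls alternate between k1 and k2, which gives the even split between the two collectors; a failed sink simply drops out of the rotation until its backoff expires.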

Remember that all sinks in a sink group must share the same channel. This
is load balancing, which is what your scenario calls for; the load
balancer is not meant for failover (a primary/backup server setup),
although there is a FailoverSinkProcessor if you need that.
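To make the load balancing versus failover distinction concrete, here is a minimal Java sketch of failover-style selection: the highest-priority sink is always used, and a lower-priority one takes over only while the preferred sink is penalized after a failure. The class and method names are hypothetical, this is not the FailoverSinkProcessor code, and the penalty is fixed for simplicity. In a Flume config the corresponding settings are `processor.type = failover`, `processor.priority.<sinkName>` and `processor.maxpenalty`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of priority-based failover selection (not Flume code).
final class FailoverSelector {
    private final Map<String, Integer> priority = new HashMap<>();
    private final Map<String, Long> penalizedUntil = new HashMap<>();
    private final long penaltyMillis; // fixed penalty, a simplification

    FailoverSelector(long penaltyMillis) {
        this.penaltyMillis = penaltyMillis;
    }

    // Priorities are assumed to be unique; a higher value is preferred.
    void addSink(String name, int prio) {
        priority.put(name, prio);
    }

    // Returns the highest-priority sink that is not penalized, or null if none.
    String select(long nowMillis) {
        String best = null;
        for (Map.Entry<String, Integer> e : priority.entrySet()) {
            long until = penalizedUntil.getOrDefault(e.getKey(), 0L);
            if (until > nowMillis) {
                continue; // still penalized after a recent failure
            }
            if (best == null || e.getValue() > priority.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }

    // Penalizes a failed sink so lower-priority sinks are used for a while.
    void markFailed(String name, long nowMillis) {
        penalizedUntil.put(name, nowMillis + penaltyMillis);
    }

    public static void main(String[] args) {
        FailoverSelector f = new FailoverSelector(1000);
        f.addSink("k1", 10);
        f.addSink("k2", 5);
        System.out.println(f.select(0));    // k1
        System.out.println(f.select(0));    // k1 (no alternation)
        f.markFailed("k1", 0);
        System.out.println(f.select(10));   // k2 (backup takes over)
        System.out.println(f.select(2000)); // k1 (primary restored)
    }
}
```

Unlike the round-robin case, all traffic goes to one sink at a time, so the second collector sits idle until the primary fails.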

- Connor
On Wed, Jan 9, 2013 at 11:55 PM, Denny Ye <[EMAIL PROTECTED]> wrote:

> Hi Hari,
>     I can't tell whether the method you suggested fits my situation, so
> let me explain my case and ask for your comments. Thanks a lot!
>     What I need is load balancing while events are being transferred.
> Assume I have a single local Flume server (co-located with the
> application) named 'localFlumeA', configured with a single AvroSink and
> channel, plus two central Flume servers (collectors) named 'centralFlumeE'
> and 'centralFlumeF'. In this case, I would like to configure load
> balancing between 'centralFlumeE' and 'centralFlumeF' for events coming
> from 'localFlumeA', so that the load is distributed evenly across the two
> central Flume servers.
>     Do you think this can be configured with the LoadBalancingSinkProcessor?
> I'd appreciate your advice.
>
> -Regards
> Denny Ye
>
>
> 2013/1/10 Hari Shreedharan <[EMAIL PROTECTED]>
>
>>  Load balancing similar to what the LoadBalancingRpcClient provides can
>> be configured across multiple Avro sinks using a LoadBalancingSinkProcessor,
>> if that is the functionality you are looking for.
>>
>>
>> Hari
>>
>> --
>> Hari Shreedharan
>>
>> On Wednesday, January 9, 2013 at 11:05 PM, Connor Woodson wrote:
>>
>> Short answer: there is no way in the current AvroSink to configure the
>> RpcClient, which limits you to a single host connection (I'm not sure how
>> well it recovers if that host goes down).
>>
>> The AvroSink is greatly simplified compared to what the RpcClient can do,
>> and it exposes none of that underlying functionality. Right now, the only
>> way around that is to write a custom sink based on the AvroSink source
>> code that, instead of setting up the RpcClient the way it currently does,
>> passes a set of user-supplied properties into
>> RpcClientFactory.getInstance(). Implementing this in an unsafe way
>> (without validating any of the user's values) would take only a couple of
>> lines of code, I believe. It is a workaround, but it enables all of the
>> various RpcClient capabilities, such as failover or load-balancing mode,
>> and allows the sink to connect to multiple hosts.
>>
>> I think there is a JIRA filed for this; if not, it would be very helpful
>> to implement it in the actual AvroSink (and, as a related change, to have
>> RpcClientFactory.getInstance accept a Context object, simply for ease of
>> use).
>>
>> - Connor
>>
>>
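Connor's workaround boils down to building a `Properties` object from user configuration and handing it to the client factory, which in the Flume client SDK is `RpcClientFactory.getInstance(Properties)`. The sketch below builds such properties for a load-balancing client. The helper class `LoadBalanceClientProps` is hypothetical; the key names (`client.type`, `hosts`, `hosts.<alias>`, `host-selector`, `backoff`) follow the load-balancing RPC client settings described in the Flume Developer Guide and should be checked against your Flume version; port 41414 is only a placeholder. The actual `getInstance` call is left as a comment so the example compiles without the Flume SDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper for a custom AvroSink-style sink: builds the Properties
// that would be passed to RpcClientFactory.getInstance(props).
final class LoadBalanceClientProps {
    static Properties build(Map<String, String> hosts, boolean backoff) {
        Properties props = new Properties();
        props.setProperty("client.type", "default_loadbalance");
        // Space-separated list of host aliases, e.g. "h1 h2"
        props.setProperty("hosts", String.join(" ", hosts.keySet()));
        for (Map.Entry<String, String> e : hosts.entrySet()) {
            // hosts.<alias> = hostname:port
            props.setProperty("hosts." + e.getKey(), e.getValue());
        }
        props.setProperty("host-selector", "round_robin");
        props.setProperty("backoff", Boolean.toString(backoff));
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> hosts = new LinkedHashMap<>(); // keeps alias order
        hosts.put("h1", "centralFlumeE:41414"); // placeholder port
        hosts.put("h2", "centralFlumeF:41414"); // placeholder port
        Properties props = build(hosts, true);
        // In a real custom sink (requires the Flume SDK on the classpath):
        // RpcClient client = RpcClientFactory.getInstance(props);
        System.out.println(props.getProperty("hosts")); // h1 h2
    }
}
```

Reading the host map from the sink's own configuration, rather than hard-coding it, is what would let a single custom sink reach both collectors.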
>> On Wed, Jan 9, 2013 at 10:55 PM, Denny Ye <[EMAIL PROTECTED]> wrote:
>>
>> Hi all,
>>     I can't find any relationship between AvroSink and the other RpcClient
>> types, including LoadBalancingRpcClient. I would expect users to be able to
>> select a specific RpcClient type, with its various strategies and host
>> selectors, from AvroSink, but I couldn't find anything about this in the
>> source code or the user guide. Did I miss something?
>>      Any help would be appreciated, thanks!