Flume, mail # user - Best way to increase throughput of Exec->Memory->Avro agent.


Re: Best way to increase throughput of Exec->Memory->Avro agent.
Roshan Naik 2013-03-12, 21:12
I meant 640,000, not 64,000.

On Tue, Mar 12, 2013 at 2:10 PM, Roshan Naik <[EMAIL PROTECTED]> wrote:
> Beyond a certain number of sinks, adding more won't help. My suspicion is
> that you may have gone way overboard.
>
> If your sink-side batch size is that large and you have 64 sinks in
> the round-robin, it will take a lot of events (64,000) pumped
> in by the source before the first event can start trickling out
> of any sink. Also, memory consumption will be quite high: each sink
> will open a transaction and hold on to 10,000 events. This is the cause
> of the memory channel filling up. Until the sink-side transaction is
> committed (i.e., 10k events are pulled), the memory reservation on the
> channel is not relinquished. So your memory channel size will have to
> be really high to support so many sinks, each with such a big batch size.
>
> My gut feel is that your source-side batch size is not much of an
> issue and can be smaller. Increasing the number of sinks will only
> help if the sink is indeed the bottleneck.
>
> On Tue, Mar 12, 2013 at 1:43 PM, Chris Neal <[EMAIL PROTECTED]> wrote:
>> Hi all.
>>
>> I've been working on this for quite some time, and need some advice from the
>> experts.  I have a two tiered Flume architecture:
>>
>> App Tier (all on one server):
>>  124 ExecSources -> MemoryChannel -> AvroSinks
>>
>> HDFS Tier (on two servers):
>>   AvroSource -> FileChannel -> HDFSSinks
>>
>> When I run the agents, the HDFS tier is keeping up fine with the App Tier.
>> Queue sizes stay between 0 and 10000 (I have a batch size of 10000).  All is
>> good.
>>
>> On the App Tier, when I view the JMX data through jconsole, I watch the size
>> of the MemoryChannel grow steadily until it reaches the max, then it starts
>> throwing exceptions about not being able to put the batch on the channel as
>> expected.
>>
>> There seem to be two basic ways to increase the throughput of the App Tier:
>> 1.  Increase the MemoryChannel's transactionCapacity and the corresponding
>> AvroSink's batch-size.  Both are set to 10000 for me.
>> 2.  Increase the number of AvroSinks to drain the MemoryChannel.  I'm up to
>> 64 Sinks now which round-robin between the two Flume Agents on the HDFS
>> tier.
>>
>> Both of those values seem quite high to me (batch size and number of sinks).
>>
>> Am I missing something as far as tuning?
>> Which would allow a greater increase in throughput, more Sinks or a larger
>> batch size?
>>
>> I'm stumped here.  I still think I can get this to work. :)
>>
>> Any suggestions are most welcome.
>> Thanks for your time.
>> Chris
>>
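
The sink-side arithmetic behind the corrected 640,000 figure can be sketched in a few lines of Python. This is only an illustration of the reasoning in the reply above (each sink holds up to one full batch in an open transaction until it commits). The function names and the channel capacity value are hypothetical; they do not come from the thread or from Flume itself.

```python
def events_held_by_sinks(num_sinks: int, sink_batch_size: int) -> int:
    """Worst-case number of events reserved by open sink transactions."""
    return num_sinks * sink_batch_size


def remaining_capacity(channel_capacity: int, num_sinks: int, sink_batch_size: int) -> int:
    """Channel slots left once every sink holds a full batch (may be negative)."""
    return channel_capacity - events_held_by_sinks(num_sinks, sink_batch_size)


if __name__ == "__main__":
    NUM_SINKS = 64                # from the thread
    SINK_BATCH_SIZE = 10_000      # from the thread
    CHANNEL_CAPACITY = 1_000_000  # hypothetical value, not given in the thread

    print(events_held_by_sinks(NUM_SINKS, SINK_BATCH_SIZE))
    print(remaining_capacity(CHANNEL_CAPACITY, NUM_SINKS, SINK_BATCH_SIZE))
```

With 64 sinks and a batch size of 10,000, the first call gives 640,000, which matches the corrected figure. Reducing either the sink count or the batch size shrinks the reservation proportionally.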