Chris Neal 2013-03-12, 20:43
Roshan Naik 2013-03-12, 21:10
I meant 640,000, not 64,000.
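To make the arithmetic behind that correction explicit, here is a minimal Python sketch of the in-flight reservation described in the quoted reply below. It assumes the worst case, where every sink in the round-robin holds one full, uncommitted batch at the same time. The function name and parameters are hypothetical and are not part of Flume.

```python
def in_flight_events(num_sinks: int, sink_batch_size: int) -> int:
    """Events reserved on the channel if every sink holds one full,
    uncommitted batch at once (assumed worst case, not a Flume API)."""
    return num_sinks * sink_batch_size


# Figures from this thread: 64 AvroSinks, each with a batch size of 10000.
reserved = in_flight_events(num_sinks=64, sink_batch_size=10_000)
print(reserved)  # 640000
```

Under this assumption the MemoryChannel capacity would need to exceed that figure before any sink can commit, which is why cutting either the sink count or the batch size shrinks the reservation proportionally.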
On Tue, Mar 12, 2013 at 2:10 PM, Roshan Naik <[EMAIL PROTECTED]> wrote:
> Beyond a certain number of sinks, adding more won't help. My suspicion is
> you may have gone way overboard.
> If your sink-side batch size is that large and you have 64 sinks in
> the round-robin, it will take a lot of events (64,000) to be pumped
> in by the source order before the first event can start trickling out
> of any sink. Also, memory consumption will be quite high: each sink
> will open a transaction and hold on to 10000 events. This is the cause
> of the Memory channel filling up. Until the sink-side transaction is
> committed (i.e. 10k events are pulled), the memory reservation on the
> channel is not relinquished. So your memory channel size will have to
> be really high to support so many sinks, each with such a big batch size.
> My gut feel is that your source-side batch size is not much of an
> issue and can be smaller. Increasing the number of sinks will only
> help if the sink is indeed the bott
> On Tue, Mar 12, 2013 at 1:43 PM, Chris Neal <[EMAIL PROTECTED]> wrote:
>> Hi all.
>> I've been working on this for quite some time, and need some advice from the
>> experts. I have a two tiered Flume architecture:
>> App Tier (all on one server):
>> 124 ExecSources -> MemoryChannel -> AvroSinks
>> HDFS Tier (on two servers):
>> AvroSource -> FileChannel -> HDFSSinks
>> When I run the agents, the HDFS tier is keeping up fine with the App Tier.
>> queue sizes stay between 0-10000 (I have a batch size of 10000). All is
>> On the App Tier, when I view the JMX data through jconsole, I watch the size
>> of the MemoryChannel grow steadily until it reaches the max, then it starts
>> throwing exceptions about not being able to put the batch on the channel as
>> There seem to be two basic ways to increase the throughput of the App Tier:
>> 1. Increase the MemoryChannel's transactionCapacity and the corresponding
>> AvroSink's batch-size. Both are set to 10000 for me.
>> 2. Increase the number of AvroSinks to drain the MemoryChannel. I'm up to
>> 64 Sinks now which round-robin between the two Flume Agents on the HDFS
>> Both of those values seem quite high to me (batch size and number of sinks).
>> Am I missing something as far as tuning?
>> Which would allow for a greater increase in throughput: more Sinks or a larger
>> batch size?
>> I'm stumped here. I still think I can get this to work. :)
>> Any suggestions are most welcome.
>> Thanks for your time.
Chris Neal 2013-03-12, 21:24
Roshan Naik 2013-03-12, 21:35
Chris Neal 2013-03-12, 21:40
Chris Neal 2013-03-12, 21:55
Roshan Naik 2013-03-12, 22:37