Flume, mail # user - Best way to increase throughput of Exec->Memory->Avro agent.


Chris Neal 2013-03-12, 20:43
Re: Best way to increase throughput of Exec->Memory->Avro agent.
Roshan Naik 2013-03-12, 21:10
Beyond a certain number of sinks, adding more won't help. My suspicion
is that you may have gone way overboard.

If your sink-side batch size is that large and you have 64 sinks in
the round-robin, it will take a lot of events (64 x 10,000 = 640,000)
pumped in by the source before the first event can start trickling out
of any sink. Memory consumption will also be quite high: each sink
opens a transaction and holds on to 10,000 events. This is the cause
of the memory channel filling up. Until the sink-side transaction is
committed (i.e. 10k events are pulled), the memory reservation on the
channel is not relinquished. So your memory channel size will have to
be really high to support so many sinks, each with such a big batch size.

My gut feeling is that your source-side batch size is not much of an
issue and can be smaller. Increasing the number of sinks will only
help if the sink is indeed the bottleneck.
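
To make the reservation arithmetic above concrete, here is a rough
back-of-the-envelope sketch in Python. It is not Flume code. It assumes
the worst case described in this reply, where every sink has an open
transaction holding one full batch, and it adds room for one source-side
batch. The function names and the source batch size of 100 are
hypothetical, chosen only for illustration.

```python
def events_held_by_sinks(num_sinks: int, sink_batch_size: int) -> int:
    """Worst-case events reserved by open sink transactions.

    Assumes each sink holds one full, uncommitted batch at the same time.
    """
    return num_sinks * sink_batch_size


def min_capacity_estimate(num_sinks: int, sink_batch_size: int,
                          source_batch_size: int) -> int:
    """Rough lower bound on channel capacity (an estimate, not a Flume rule).

    Room for every in-flight sink batch plus one incoming source batch.
    """
    return events_held_by_sinks(num_sinks, sink_batch_size) + source_batch_size


if __name__ == "__main__":
    # 64 sinks x 10,000 events is the setup from the original question;
    # the smaller rows are hypothetical alternatives for comparison.
    for sinks, batch in [(64, 10_000), (8, 1_000), (4, 1_000)]:
        held = events_held_by_sinks(sinks, batch)
        need = min_capacity_estimate(sinks, batch, source_batch_size=100)
        print(f"{sinks:>3} sinks x {batch:>6} batch: "
              f"{held:>7} held, capacity >= {need}")
```

With the numbers from the question, the sinks alone can pin 640,000
events in the channel, which is why fewer sinks or a smaller sink batch
size reduces the capacity the memory channel needs.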

On Tue, Mar 12, 2013 at 1:43 PM, Chris Neal <[EMAIL PROTECTED]> wrote:
> Hi all.
>
> I've been working on this for quite some time, and need some advice from the
> experts.  I have a two tiered Flume architecture:
>
> App Tier (all on one server):
>  124 ExecSources -> MemoryChannel -> AvroSinks
>
> HDFS Tier (on two servers):
>   AvroSource -> FileChannel -> HDFSSinks
>
> When I run the agents, the HDFS tier keeps up fine with the App Tier.
> Queue sizes stay between 0 and 10000 (I have a batch size of 10000).  All is
> good.
>
> On the App Tier, when I view the JMX data through jconsole, I watch the size
> of the MemoryChannel grow steadily until it reaches the max, then it starts
> throwing exceptions about not being able to put the batch on the channel as
> expected.
>
> There seem to be two basic ways to increase the throughput of the App Tier:
> 1.  Increase the MemoryChannel's transactionCapacity and the corresponding
> AvroSink's batch-size.  Both are set to 10000 for me.
> 2.  Increase the number of AvroSinks to drain the MemoryChannel.  I'm up to
> 64 Sinks now which round-robin between the two Flume Agents on the HDFS
> tier.
>
> Both of those values seem quite high to me (batch size and number of sinks).
>
> Am I missing something as far as tuning?
> Which would allow for a greater increase in throughput: more Sinks or a
> larger batch size?
>
> I'm stumped here.  I still think I can get this to work. :)
>
> Any suggestions are most welcome.
> Thanks for your time.
> Chris
>
Roshan Naik 2013-03-12, 21:12
Chris Neal 2013-03-12, 21:24
Roshan Naik 2013-03-12, 21:35
Chris Neal 2013-03-12, 21:40
Chris Neal 2013-03-12, 21:55
Roshan Naik 2013-03-12, 22:37