Re: ExecSource->MemoryChannel->AvroSink->AvroSource->FileChannel->HDFSSink throughput question
Chris Neal 2013-02-01, 16:40
Thanks for the help Juhani :) I'll take a look with Ganglia and see what
things look like.
Any thoughts on keeping the ExecSource.batchSize,
MemoryChannel.transactionCapacity, AvroSink.batch-size, and
HDFSSink.batchSize the same?
I looked at the MemoryChannel code, and noticed that there is a timeout
parameter passed to doCommit(), where the exception is being thrown. Just
for fun, I increased it from the default to 10 seconds, and now things are
running smoothly with the same config as before. It's been running for
about 24 hours now. A step in the right direction anyway! :)
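For reference, a minimal sketch of how that timeout might be set in the agent configuration rather than in code, assuming the value in question corresponds to the MemoryChannel `keep-alive` property (in seconds; the documented default is 3). The agent and component names (`agent1`, `tailSrc1`, `memCh`, `avroSink1`) are hypothetical, the batch sizes simply mirror the values quoted below, and this is only a fragment (the exec command and the Avro hostname/port are omitted):

```properties
# Hypothetical names: agent1, tailSrc1, memCh, avroSink1
agent1.channels.memCh.type = memory
agent1.channels.memCh.capacity = 1000000
agent1.channels.memCh.transactionCapacity = 1000
# Seconds a put/take waits for space before failing
# (assumed to be the timeout mentioned above)
agent1.channels.memCh.keep-alive = 10

agent1.sources.tailSrc1.type = exec
agent1.sources.tailSrc1.batchSize = 1000
agent1.sources.tailSrc1.channels = memCh

agent1.sinks.avroSink1.type = avro
agent1.sinks.avroSink1.batch-size = 1000
agent1.sinks.avroSink1.channel = memCh
```

As far as I understand it, each source or sink batch is handled as one channel transaction, so the batch sizes should not exceed transactionCapacity; keeping them equal, as here, is one way to satisfy that.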
On Thu, Jan 31, 2013 at 8:12 PM, Juhani Connolly <
[EMAIL PROTECTED]> wrote:
> Hi Chris,
> The most likely cause of that error is that the sinks are draining
> requests more slowly than your sources are feeding fresh data. Over time this will
> fill up the capacity of your memory channel, which will then start refusing
> additional put requests.
> You can confirm this by connecting with JMX or Ganglia.
> If the write is extremely bursty, it's possible that it's just temporarily
> going over the sink consumption rate, and increasing the channel capacity
> could work. Otherwise, increasing the Avro batch size, or adding additional
> Avro sinks (more threads) may also help. I think that setting up Ganglia
> monitoring and looking at the incoming and outgoing event counts and
> channel fill states helps a lot in diagnosing these bottlenecks; you should
> look into doing that.
> On 02/01/2013 02:01 AM, Chris Neal wrote:
> Hi all.
> I need some thoughts on sizing/tuning of the above (common) route in
> FlumeNG to maximize throughput. Here is my setup:
> *Source JVM (ExecSource/MemoryChannel/AvroSink):*
> Number of ExecSources in config: 124 (yes, it's a ton. Can't do
> anything about it :) The write rate to the source files is fairly fast and
> ExecSource.batchSize = 1000
> (so, when all 124 tail -F instances get 1000 events, they all dump to the
> memory channel)
> MemoryChannel.capacity = 1000000
> MemoryChannel.transactionCapacity = 1000
> (somewhat unclear on what this is. Docs say "The number of events stored
> in the channel per transaction", but what is a "transaction" to a
> AvroSink.batchSize = 1000
> *Destination JVM (AvroSource/FileChannel/HDFSSink)*
> (Cluster of two JVMs on two servers, each configured the same as per below)
> -XX:MaxDirectMemorySize is not defined, so whatever the default is
> AvroSource.threads = 64
> FileChannel.transactionCapacity = 1000
> FileChannel.capacity = 32000000
> HDFSSink.batchSize = 1000
> HDFSSink.threadPoolSize = 64
> With this configuration, in about 5 minutes, I get the common Exception:
> "Space for commit to queue couldn't be acquired Sinks are likely not
> keeping up with sources, or the buffer size is too tight"
> on the Source JVM. It is nowhere near the 4g max, rather only at about
> I'm wondering about the logic of having all the batch sizes/transaction
> sizes 1000. My thought was that would keep from fragmenting the transfer
> of data, but maybe that's flawed? Should the sizes be different?
> Also curious about increasing the MaxDirectMemorySize to something
> larger than 256MB? I tried removing it altogether in my Source JVM (which
> makes the size unbounded), but that didn't seem to make a difference.
> I'm having some trouble figuring out where the backup is happening, and
> how to open up the gates. :)
> Thanks in advance for any suggestions.