Re: ExecSource->MemoryChannel->AvroSink->AvroSource->FileChannel->HDFSSink throughput question
Chris Neal 2013-02-05, 15:57
Again, thank you so much for your time. :)
The timeout increase bought me some time, but it still ended up with the
Exception. I love the multiple sinks idea...I should have thought of that
On Mon, Feb 4, 2013 at 8:22 PM, Juhani Connolly <
[EMAIL PROTECTED]> wrote:
> On 02/02/2013 01:40 AM, Chris Neal wrote:
> Thanks for the help Juhani :) I'll take a look with Ganglia and see what
> things look like.
> Any thoughts on keeping the ExecSource.batchSize,
> MemoryChannel.transactionCapacity, AvroSink.batch-size, and
> HDFSSink.batchSize the same?
> It's not really important, so long as the avro batch size is less than
> or equal to the channel transaction capacity. The HDFS sinks batch size is
> independent of them both.
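
A minimal sketch of that constraint as a config fragment, using the values from the original post. The agent and component names (agent, mem, avro1) are hypothetical, and the sink's hostname/port lines are left out:

```properties
# Hypothetical names: agent "agent", channel "mem", sink "avro1".
agent.channels.mem.type = memory
agent.channels.mem.capacity = 1000000
agent.channels.mem.transactionCapacity = 1000

agent.sinks.avro1.type = avro
agent.sinks.avro1.channel = mem
# Keep this less than or equal to the channel's transactionCapacity above.
agent.sinks.avro1.batch-size = 1000
```

The HDFS sink's batch size lives on the destination agent and can be chosen separately.
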
> I looked at the MemoryChannel code, and noticed that there is a timeout
> parameter passed to doCommit(), where the exception is being thrown. Just
> for fun, I increased it from the default to 10 seconds, and now things are
> running smoothly with the same config as before. It's been running for
> about 24 hours now. A step in the right direction anyway! :)
> If that fixed it, it sounds like your data is just very bursty and
> sometimes gets fed in faster than it's drained out. The solution to that
> would be either to enlarge your temporary buffer (the mem channel), to
> throttle the incoming data (probably not possible), or to increase drain
> speed (more sinks running in parallel).
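
A rough sketch of two of those options in config form: a larger buffer, and more sinks draining in parallel. All names, hostnames, ports, and the enlarged capacity are placeholders, not values from this thread. The timeout mentioned above should, as far as I know, correspond to the memory channel's keep-alive setting (seconds, default 3), so it can be raised without touching the code:

```properties
# Hypothetical names and values; adjust to your own agent.
agent.channels = mem
agent.sinks = avro1 avro2

agent.channels.mem.type = memory
# Larger buffer to absorb bursts (uses more heap).
agent.channels.mem.capacity = 2000000
agent.channels.mem.transactionCapacity = 1000
# Seconds a put or take waits before giving up.
agent.channels.mem.keep-alive = 10

# Two sinks on the same channel and not in a sink group,
# so each should be driven by its own runner thread.
agent.sinks.avro1.type = avro
agent.sinks.avro1.channel = mem
agent.sinks.avro1.hostname = collector1.example.com
agent.sinks.avro1.port = 4141
agent.sinks.avro1.batch-size = 1000

agent.sinks.avro2.type = avro
agent.sinks.avro2.channel = mem
agent.sinks.avro2.hostname = collector2.example.com
agent.sinks.avro2.port = 4141
agent.sinks.avro2.batch-size = 1000
```

A longer keep-alive only delays the exception if the sinks are slower than the sources on average; the extra sinks are what actually raise the drain rate.
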
> Thanks again.
> On Thu, Jan 31, 2013 at 8:12 PM, Juhani Connolly <
> [EMAIL PROTECTED]> wrote:
>> Hi Chris,
>> The most likely cause of that error is that the sinks are draining
>> requests slower than your sources are feeding fresh data. Over time it will
>> fill up the capacity of your memory channel, which will then start refusing
>> additional put requests.
>> You can confirm this by connecting with JMX or Ganglia.
>> If the write is extremely bursty, it's possible that it's just
>> temporarily going over the sink consumption rate, and increasing the
>> channel capacity could work. Otherwise, increasing the avro batch size, or
>> adding additional avro sinks (more threads) may also help. I think that
>> setting up Ganglia monitoring and looking at the incoming and outgoing
>> event counts and channel fill states helps a lot in diagnosing these
>> bottlenecks; you should look into doing that.
>> On 02/01/2013 02:01 AM, Chris Neal wrote:
>> Hi all.
>> I need some thoughts on sizing/tuning of the above (common) route in
>> FlumeNG to maximize throughput. Here is my setup:
>> *Source JVM (ExecSource/MemoryChannel/AvroSink):*
>> Number of ExecSources in config: 124 (yes, it's a ton. Can't do
>> anything about it :) The write rate to the source files is fairly fast and
>> ExecSource.batchSize = 1000
>> (so, when all 124 tail -F instances get 1000 events, they all dump to the
>> memory channel)
>> MemoryChannel.capacity = 1000000
>> MemoryChannel.transactionCapacity = 1000
>> (somewhat unclear on what this is. Docs say "The number of events stored
>> in the channel per transaction", but what is a "transaction" to a
>> AvroSink.batchSize = 1000
>> *Destination JVM (AvroSource/FileChannel/HDFSSink)*
>> (Cluster of two JVMs on two servers, each configured the same as per
>> -XX:MaxDirectMemorySize is not defined, so whatever the default is
>> AvroSource.threads = 64
>> FileChannel.transactionCapacity = 1000
>> FileChannel.capacity = 32000000
>> HDFSSink.batchSize = 1000
>> HDFSSink.threadPoolSize = 64
>> With this configuration, in about 5 minutes, I get the common Exception:
>> "Space for commit to queue couldn't be acquired Sinks are likely not
>> keeping up with sources, or the buffer size is too tight"
>> on the Source JVM. It is nowhere near the 4g max, rather only at