Flume, mail # dev - Questions about Batching in Flume


Re: Questions about Batching in Flume
Juhani Connolly 2012-07-11, 09:12
I think some of my earlier speculation may have led to this
misunderstanding. After changing the exec source, I can confirm that the
puts/takes themselves are not the bottleneck, and that
performance is fine so long as the number of transactions is not too
large (each transaction commit causes an fsync).

An option for the channel to store x events on the heap before flushing
could be interesting, though it would void any delivery guarantees
made. I do not think this is necessarily a bad thing so long as it is
documented (and people who want everything committed can request that
the buffer be flushed on every commit).
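
To make the idea concrete, here is a rough sketch of what such an option might look like. It is not Flume's file channel code: HeapBufferedLog, flushThreshold and the other names are hypothetical, and it only uses plain java.nio. Committed events sit on the heap until the threshold is reached, and only then are they written and fsynced, so anything still in the buffer is lost on a crash.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, NOT Flume's FileChannel: committed events are held on
// the heap and only written + fsynced once flushThreshold events accumulate.
public class HeapBufferedLog implements AutoCloseable {
    private final FileChannel file;
    private final int flushThreshold;            // 1 means fsync on every commit
    private final List<byte[]> pending = new ArrayList<>();
    private int fsyncCount = 0;

    public HeapBufferedLog(Path path, int flushThreshold) throws IOException {
        this.file = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        this.flushThreshold = flushThreshold;
    }

    // Accept one transaction's events. They are durable only after flush().
    public void commit(List<byte[]> events) throws IOException {
        pending.addAll(events);
        if (pending.size() >= flushThreshold) {
            flush();
        }
    }

    // Write all buffered events, then force them to disk with a single fsync.
    public void flush() throws IOException {
        if (pending.isEmpty()) {
            return;
        }
        for (byte[] event : pending) {
            ByteBuffer buf = ByteBuffer.wrap(event);
            while (buf.hasRemaining()) {
                file.write(buf);
            }
        }
        file.force(false);
        fsyncCount++;
        pending.clear();
    }

    public int fsyncCount() { return fsyncCount; }

    public int unflushed() { return pending.size(); }

    @Override
    public void close() throws IOException {
        flush();
        file.close();
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("heap-buffered", ".log");
        try (HeapBufferedLog log = new HeapBufferedLog(path, 100)) {
            // 250 single-event transactions, flushed in groups of 100.
            for (int i = 0; i < 250; i++) {
                byte[] event = ("event-" + i + "\n").getBytes(StandardCharsets.UTF_8);
                log.commit(Collections.singletonList(event));
            }
            // prints "fsyncs=2 unflushed=50"
            System.out.println("fsyncs=" + log.fsyncCount()
                    + " unflushed=" + log.unflushed());
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

With a threshold of 1 this degenerates to one fsync per commit, which is the "flush the buffer every commit" setting for people who want everything committed. With a larger threshold, the 50 unflushed events in the example are exactly the ones that would be lost if the process died at that point.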

On 07/11/2012 04:05 PM, Brock Noland wrote:
> What leads you to that conclusion about FC? (I am curious in case there is
> something I am unaware of.) This is where a Put ends up being written and
> there is no flush until a commit.
>
> https://github.com/apache/flume/blob/trunk/flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/LogFile.java#L165
>
> Brock
>
> On Wed, Jul 11, 2012 at 7:12 AM, Patrick Wendell <[EMAIL PROTECTED]> wrote:
>
>> Hi All,
>>
>> Most streaming systems have built-in support for batching since it
>> often offers major performance benefits in terms of throughput.
>>
>> I'm a little confused about the state of batching in Flume today. It
>> looks like a ChannelProcessor can process a batch of events within one
>> transaction, but internally this just calls Channel.put() several
>> times.
>>
>> As far as I can tell, both of the durable channels (JDBC and File)
>> actually flush to disk in some fashion whenever there is a doPut(). It
>> seems to me like it makes sense to buffer all of those puts in memory
>> and only flush them once per transaction. Otherwise, isn't the benefit
>> of batching put()'s within a transaction lost?
>>
>> I think I might be missing something here, any pointers are appreciated.
>>
>> - Patrick
>>
>
>