Patrick Wendell 2012-07-11, 06:12
Brock Noland 2012-07-11, 07:05
Juhani Connolly 2012-07-11, 09:12
Re: Questions about Batching in Flume
Patrick Wendell 2012-07-11, 21:24
Hey Folks,
So the hole in my thinking was, as Brock pointed out, that the FileChannel doesn't actually sync() until a commit. I misread the code while looking at it quickly, so it does allow batching within a transaction as desired.

The JDBC channel, however, looks like it persists the events on every put() rather than on transaction boundaries:

    @Override
    public void put(Event event) throws ChannelException {
      getProvider().persistEvent(getName(), event);
    }

Am I wrong on this one as well?

- Patrick

On Wed, Jul 11, 2012 at 2:12 AM, Juhani Connolly <[EMAIL PROTECTED]> wrote:
> I think some of my earlier speculation may have led to this
> misunderstanding? I can confirm after changing the exec source that the
> puts/takes themselves are not generating the bottleneck, and that
> performance is fine so long as the number of transactions is not too
> large (as each transaction commit will cause an fsync).
>
> An option for the channel to store x events on the heap before flushing
> could be interesting, though it would void any delivery guarantees made. I
> do not think this is necessarily a bad thing so long as it is documented
> (and people who want everything committed can request flushing the buffer
> every commit).
>
> On 07/11/2012 04:05 PM, Brock Noland wrote:
>>
>> What leads you to that conclusion about FC? (I am curious in case there is
>> something I am unaware of.) This is where a Put ends up being written, and
>> there is no flush until a commit.
>>
>> https://github.com/apache/flume/blob/trunk/flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/LogFile.java#L165
>>
>> Brock
>>
>> On Wed, Jul 11, 2012 at 7:12 AM, Patrick Wendell <[EMAIL PROTECTED]>
>> wrote:
>>
>>> Hi All,
>>>
>>> Most streaming systems have built-in support for batching since it
>>> often offers major performance benefits in terms of throughput.
>>>
>>> I'm a little confused about the state of batching in Flume today. It
>>> looks like a ChannelProcessor can process a batch of events within one
>>> transaction, but internally this just calls Channel.put() several
>>> times.
>>>
>>> As far as I can tell, both of the durable channels (JDBC and File)
>>> actually flush to disk in some fashion whenever there is a doPut(). It
>>> seems to me like it makes sense to buffer all of those puts in memory
>>> and only flush them once per transaction. Otherwise, isn't the benefit
>>> of batching put()s within a transaction lost?
>>>
>>> I think I might be missing something here; any pointers are appreciated.
>>>
>>> - Patrick
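The difference the thread turns on (persisting on every put() versus buffering puts in memory and persisting once at commit) can be sketched in plain Java with no Flume dependency. This is an illustration under stated assumptions, not Flume code: `DurableStore`, `PerPutTransaction`, and `BufferedTransaction` are hypothetical names, and the "sync" is only a counter standing in for an fsync or a database write.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchingSketch {

    /** Hypothetical stand-in for durable storage; counts simulated syncs to disk. */
    static class DurableStore {
        private final List<String> persisted = new ArrayList<>();
        private int syncCount = 0;

        /** Persists a group of events; each call counts as one simulated fsync/write. */
        void append(List<String> events) {
            persisted.addAll(events);
            syncCount++;
        }

        int syncCount() { return syncCount; }
        int size() { return persisted.size(); }
    }

    /** Hypothetical per-put behavior, like the JDBC put() quoted in the thread. */
    static class PerPutTransaction {
        private final DurableStore store;
        PerPutTransaction(DurableStore store) { this.store = store; }

        void put(String event) { store.append(List.of(event)); } // one sync per event
        void commit() { /* nothing left to do: every put was already persisted */ }
    }

    /** Hypothetical batched behavior: buffer puts on the heap, persist once at commit. */
    static class BufferedTransaction {
        private final DurableStore store;
        private final List<String> pending = new ArrayList<>();
        BufferedTransaction(DurableStore store) { this.store = store; }

        void put(String event) { pending.add(event); } // no storage access here
        void commit() {
            if (!pending.isEmpty()) {
                store.append(pending); // addAll copies, so clearing afterwards is safe
                pending.clear();
            }
        }
        void rollback() { pending.clear(); } // uncommitted events never reach storage
    }

    public static void main(String[] args) {
        int batchSize = 100;

        DurableStore perPutStore = new DurableStore();
        PerPutTransaction perPut = new PerPutTransaction(perPutStore);
        for (int i = 0; i < batchSize; i++) perPut.put("event-" + i);
        perPut.commit();

        DurableStore bufferedStore = new DurableStore();
        BufferedTransaction buffered = new BufferedTransaction(bufferedStore);
        for (int i = 0; i < batchSize; i++) buffered.put("event-" + i);
        buffered.commit();

        System.out.println("per-put syncs:  " + perPutStore.syncCount());
        System.out.println("buffered syncs: " + bufferedStore.syncCount());
    }
}
```

For a 100-event batch, main reports 100 simulated syncs on the per-put path and 1 on the buffered path. Buffering only until commit keeps the per-transaction guarantee, because nothing is promised before commit. Holding events on the heap across several commits, as Juhani suggests, would cut syncs further but, as he notes, gives up that guarantee.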