Questions about Batching in Flume
Patrick Wendell 2012-07-11, 06:12
Most streaming systems have built-in support for batching, since it
often improves throughput significantly.
I'm a little confused about the current state of batching in Flume. It
looks like a ChannelProcessor can process a batch of events within one
transaction, but internally it just calls Channel.put() several times.
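For illustration, here is a minimal Java sketch of that batch path as I understand it. The `Transaction`, `SimpleChannel`, `CountingChannel`, and `BatchProcessor` types are hypothetical stand-ins loosely modeled on Flume's Channel/Transaction pattern (begin, commit, rollback, close); this is not the actual ChannelProcessor source, just an assumption about its shape.

```java
import java.util.List;

// Hypothetical, simplified stand-ins for Flume's Channel and Transaction
// interfaces; NOT the real org.apache.flume API.
interface Transaction {
    void begin();
    void commit();
    void rollback();
    void close();
}

interface SimpleChannel {
    Transaction getTransaction();
    void put(String event);
}

// Toy channel that counts put() calls and commits.
class CountingChannel implements SimpleChannel {
    int putCalls = 0;
    int commits = 0;

    public Transaction getTransaction() {
        return new Transaction() {
            public void begin() { }
            public void commit() { commits++; }
            public void rollback() { }
            public void close() { }
        };
    }

    public void put(String event) { putCalls++; }
}

class BatchProcessor {
    // One transaction wraps the whole batch, but each event is still
    // handed to the channel through its own put() call.
    static void processEventBatch(SimpleChannel channel, List<String> events) {
        Transaction tx = channel.getTransaction();
        tx.begin();
        try {
            for (String event : events) {
                channel.put(event);
            }
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            tx.close();
        }
    }
}
```

With a batch of three events, the channel sees three put() calls but only one commit, so whether batching helps depends entirely on what the channel does inside put().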
As far as I can tell, both durable channels (JDBC and File) flush to
disk in some fashion on every doPut(). It seems to me that it would make
more sense to buffer all of those puts in memory and flush them only
once per transaction. Otherwise, isn't the benefit of batching put()
calls within a transaction lost?
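To make the comparison concrete, here is a minimal Java sketch of the two strategies. `FakeLog`, `FlushPerPutTransaction`, and `BufferedTransaction` are hypothetical names, not Flume classes; `FakeLog.sync()` simply counts calls and stands in for an expensive flush to disk (for example an fsync).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical append-only log standing in for a durable channel's
// storage; sync() models an expensive flush to disk.
class FakeLog {
    final List<String> records = new ArrayList<>();
    int syncs = 0;

    void append(String record) { records.add(record); }
    void sync() { syncs++; }
}

// What the durable channels appear to do today: flush on every put.
class FlushPerPutTransaction {
    private final FakeLog log;
    FlushPerPutTransaction(FakeLog log) { this.log = log; }

    void put(String event) {
        log.append(event);
        log.sync();              // one flush per event
    }

    void commit() { }            // everything is already on disk
}

// Proposed alternative: buffer puts in memory, flush once at commit.
class BufferedTransaction {
    private final FakeLog log;
    private final List<String> pending = new ArrayList<>();
    BufferedTransaction(FakeLog log) { this.log = log; }

    void put(String event) {
        pending.add(event);      // no I/O yet
    }

    void commit() {
        for (String event : pending) {
            log.append(event);
        }
        log.sync();              // a single flush covers the whole batch
        pending.clear();
    }

    void rollback() {
        pending.clear();         // nothing reached the log, so just discard
    }
}
```

For a batch of four events, the first strategy performs four flushes and the second performs one. Buffered events exist only in memory until commit, so a crash before commit loses them; that seems acceptable, since an uncommitted transaction was never acknowledged to the source.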
I think I might be missing something here; any pointers would be appreciated.