Flume >> mail # dev >> Questions about Batching in Flume


Questions about Batching in Flume
Hi All,

Most streaming systems have built-in support for batching since it
often offers major performance benefits in terms of throughput.

I'm a little confused about the state of batching in Flume today. It
looks like a ChannelProcessor can process a batch of events within one
transaction, but internally this just calls Channel.put() once per
event.
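
To make the pattern concrete, here is a toy model (these are not Flume's real classes, just a sketch of the transaction-wrapping behavior described above): one transaction is opened, put() is called once per event, and the whole batch is committed or rolled back together.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a Flume channel: put() only stages an event; commit()
// makes the whole batch visible, rollback() discards it.
public class BatchTxDemo {
    static class ToyChannel {
        final List<String> committed = new ArrayList<>();
        final List<String> pending = new ArrayList<>();
        void put(String event) { pending.add(event); }               // one put per event
        void commit() { committed.addAll(pending); pending.clear(); }
        void rollback() { pending.clear(); }
    }

    // Analogous in shape to ChannelProcessor batching: one transaction,
    // many put() calls, a single commit at the end.
    static void processEventBatch(ToyChannel ch, List<String> events) {
        try {
            for (String e : events) ch.put(e);
            ch.commit();
        } catch (RuntimeException ex) {
            ch.rollback();
            throw ex;
        }
    }

    public static void main(String[] args) {
        ToyChannel ch = new ToyChannel();
        processEventBatch(ch, List.of("e1", "e2", "e3"));
        System.out.println("committed=" + ch.committed.size());
    }
}
```

The point of the sketch is that batching at the transaction level is about atomicity; whether it also helps throughput depends on what each put() does underneath, which is the question below.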

As far as I can tell, both of the durable channels (JDBC and File)
actually flush to disk in some fashion whenever there is a doPut(). It
seems to me like it makes sense to buffer all of those puts in memory
and only flush them once per transaction. Otherwise, isn't the benefit
of batching put()s within a transaction lost?
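
A toy model of the trade-off being asked about (again, not Flume's actual FileChannel or JDBC channel, just a sketch where a flush counter stands in for disk syncs): flushing on every doPut() costs one sync per event, while buffering puts and flushing at commit costs one sync per transaction.

```java
import java.util.ArrayList;
import java.util.List;

// Toy durable channel contrasting two flush strategies:
//   flushPerPut = true  -> sync on every doPut() (what the mail suggests happens today)
//   flushPerPut = false -> buffer in memory, one sync at commit (the proposal)
public class FlushDemo {
    static class DurableChannel {
        final boolean flushPerPut;
        final List<String> buffer = new ArrayList<>();
        int flushes = 0;
        DurableChannel(boolean flushPerPut) { this.flushPerPut = flushPerPut; }
        void doPut(String event) {
            buffer.add(event);
            if (flushPerPut) flush();
        }
        void commit() {
            if (!flushPerPut) flush();  // single flush covers the whole batch
            buffer.clear();
        }
        void flush() { flushes++; }     // stand-in for an fsync/disk write
    }

    static int flushesFor(boolean perPut, int batchSize) {
        DurableChannel ch = new DurableChannel(perPut);
        for (int i = 0; i < batchSize; i++) ch.doPut("event-" + i);
        ch.commit();
        return ch.flushes;
    }

    public static void main(String[] args) {
        System.out.println("per-put flushes: " + flushesFor(true, 100));   // 100
        System.out.println("per-txn flushes: " + flushesFor(false, 100));  // 1
    }
}
```

With per-put flushing, a batch of 100 events pays for 100 syncs and the transaction-level batching buys no throughput, which is exactly the concern raised above.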

I think I might be missing something here; any pointers are appreciated.

- Patrick
Replies:
- Brock Noland (2012-07-11, 07:05)
- Juhani Connolly (2012-07-11, 09:12)
- Patrick Wendell (2012-07-11, 21:24)