Flume >> mail # dev >> Questions about Batching in Flume


Questions about Batching in Flume
Hi All,

Most streaming systems have built-in support for batching, since it
often yields major throughput benefits.

I'm a little confused about the state of batching in Flume today. It
looks like a ChannelProcessor can process a batch of events within one
transaction, but internally this just calls Channel.put() several
times.
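Roughly, what I mean by "several puts within one transaction" looks like this. This is a minimal, self-contained toy model for illustration only: the class names echo Flume's Channel/Transaction API, but none of this is Flume's actual code.

```java
// Toy model of putting a batch of events inside one transaction.
// Not Flume's real classes; Channel and Transaction here are invented
// stand-ins to illustrate the pattern.
import java.util.ArrayList;
import java.util.List;

public class BatchPutSketch {
    static class Channel {
        final List<String> committed = new ArrayList<>(); // visible after commit
        final List<String> pending = new ArrayList<>();   // staged within the tx

        Transaction getTransaction() { return new Transaction(this); }

        void put(String event) { pending.add(event); }    // one put per event
    }

    static class Transaction {
        final Channel ch;
        Transaction(Channel ch) { this.ch = ch; }

        void begin() {}

        void commit() {                                   // all staged puts land together
            ch.committed.addAll(ch.pending);
            ch.pending.clear();
        }

        void rollback() { ch.pending.clear(); }
    }

    public static void main(String[] args) {
        Channel channel = new Channel();
        Transaction tx = channel.getTransaction();
        tx.begin();
        for (String event : List.of("e1", "e2", "e3")) {
            channel.put(event);   // the processor calls put() once per event
        }
        tx.commit();              // all three events become visible atomically
        System.out.println(channel.committed.size()); // prints 3
    }
}
```

The transaction gives atomicity for the batch, but nothing in this shape says anything about how often the channel touches disk, which is the part I'm asking about below.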

As far as I can tell, both of the durable channels (JDBC and File)
actually flush to disk in some fashion on every doPut(). It seems to
me that it would make sense to buffer all of those puts in memory and
flush them only once per transaction. Otherwise, isn't the benefit of
batching put()s within a transaction lost?
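To make the concern concrete, here is a second toy model comparing a flush on every put() against buffering puts in memory and flushing once at commit. Again, this is not Flume's real code: DurableChannel and flushCount are invented names, and flushCount just stands in for disk-sync calls.

```java
// Toy comparison: flush-per-put vs. buffer-and-flush-once-per-commit.
// Hypothetical sketch; flushCount simulates the number of disk syncs.
import java.util.ArrayList;
import java.util.List;

public class BufferedChannelSketch {
    static class DurableChannel {
        final boolean bufferPuts;
        final List<String> store = new ArrayList<>();    // simulated durable log
        final List<String> txBuffer = new ArrayList<>(); // in-memory tx buffer
        int flushCount = 0;                              // simulated sync count

        DurableChannel(boolean bufferPuts) { this.bufferPuts = bufferPuts; }

        void put(String event) {
            if (bufferPuts) {
                txBuffer.add(event);   // defer durability to commit
            } else {
                store.add(event);
                flushCount++;          // sync on every doPut()
            }
        }

        void commit() {
            if (bufferPuts) {
                store.addAll(txBuffer);
                txBuffer.clear();
                flushCount++;          // single sync per transaction
            }
        }
    }

    public static void main(String[] args) {
        DurableChannel perPut = new DurableChannel(false);
        DurableChannel buffered = new DurableChannel(true);
        for (int i = 0; i < 100; i++) {   // one transaction of 100 events
            perPut.put("e" + i);
            buffered.put("e" + i);
        }
        perPut.commit();
        buffered.commit();
        System.out.println(perPut.flushCount + " vs " + buffered.flushCount);
        // prints 100 vs 1
    }
}
```

For a 100-event transaction, the per-put version does 100 simulated syncs versus 1 for the buffered version, which is the gap I'd expect batching to close.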

I think I might be missing something here, any pointers are appreciated.

- Patrick
Replies:
Brock Noland 2012-07-11, 07:05
Juhani Connolly 2012-07-11, 09:12
Patrick Wendell 2012-07-11, 21:24