After I changed my config to the one below, it worked. It looks like Flume
rolls to a new file as soon as any one of the roll conditions is met,
whichever comes first. Since rollCount defaults to 10, it was creating a new
file every 10 events. I think this causes a lot of problems, because I now
need to keep track of and estimate all of these variables. I think it should
roll based only on what's specified in the config, so if I only specify
rollSize then it shouldn't consider any of the other options in its logic for
creating a new file.

foo.sinks.hdfsSink.type = hdfs
foo.sinks.hdfsSink.hdfs.path = hdfs://dsdb1:54310/flume/%{host}
foo.sinks.hdfsSink.hdfs.filePrefix = web
foo.sinks.hdfsSink.hdfs.rollInterval = 600
foo.sinks.hdfsSink.hdfs.rollCount = 200000000
foo.sinks.hdfsSink.hdfs.rollSize = 5000000000
foo.sinks.hdfsSink.hdfs.fileType = SequenceFile
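
For rolling on size alone, the HDFS sink documentation describes a value of 0
as disabling a given roll trigger. A minimal sketch along those lines (not
tested on this setup; the agent and sink names are the same ones used above)
would be:

# 0 = never roll based on elapsed time
foo.sinks.hdfsSink.hdfs.rollInterval = 0
# 0 = never roll based on number of events
foo.sinks.hdfsSink.hdfs.rollCount = 0
# roll only once the file reaches this many bytes
foo.sinks.hdfsSink.hdfs.rollSize = 5000000000

With the other two triggers set to 0, the defaults no longer need to be
estimated or overridden with large values.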

On Fri, Jun 15, 2012 at 5:38 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote: