Flume >> mail # user >> HDFS sink Bucketwriter working

Re: HDFS sink Bucketwriter working
Refer to the user guide here:

Note the defaults for rollInterval, rollSize, and rollCount. If you want to
use rollSize only, then you should set the others to 0.

It is also worth setting batchSize to something larger if you want to
maximize performance. I often go with 1000; depending on the application
you may want to go lower or higher.
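The advice above can be sketched as a Flume agent properties fragment. This is a minimal illustration, not a complete agent config; the agent and sink names (a1, k1) and the HDFS path are hypothetical placeholders for your own topology.

```properties
# Hypothetical agent "a1" with HDFS sink "k1" -- adjust names/path to your setup.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events

# Roll on size only: disable time- and count-based rolling by setting them to 0,
# otherwise their non-zero defaults will trigger rolls before rollSize is reached.
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollSize = 67108864

# Larger batches amortize per-flush overhead; tune per application.
a1.sinks.k1.hdfs.batchSize = 1000
```

Note that rollSize is a threshold, not an exact cut: a file rolls once its size exceeds the value, so actual files land near, not exactly at, 64 MB.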

On Wed, Sep 26, 2012 at 8:23 PM, Jagadish Bihani <

> Hi
> I had a few doubts about the HDFS sink BucketWriter:
> -- How does the HDFS sink's BucketWriter work? What criteria does it use
> to create another bucket?
> -- Is the creation of a file in HDFS a function of how many parameters?
> Initially I thought it was a function of only the rolling parameters
> (interval/size), but apparently it is also a function of 'batchSize' and
> 'txnEventMax'.
> -- My requirement is this: I get data from 10 Avro sinks into a single
> Avro source, and I want to dump the data to HDFS in fixed-size (say 64 MB)
> files. What should I do?
> Presently, if I set a 64 MB roll size, BucketWriter creates many files
> (I suspect the count equals txnEventMax), and after a while it throws
> exceptions like 'too many open files'. (I have a limit of 75000 open file
> descriptors.)
> Information about the above will be of great help in tuning Flume
> properly for these requirements.
> Regards,
> Jagadish