Flume >> mail # user >> HDFS sink Bucketwriter working


I have a few questions about the HDFS sink's BucketWriter:

-- How does the HDFS sink's BucketWriter work? What criteria does it use to
create another bucket?

-- Creation of a file in HDFS is a function of how many parameters? Initially
I thought it was a function of only the rolling parameters (interval/size),
but it seems to also be a function of 'batchSize' and 'txnEventMax'.

-- My requirement is this: I receive data from 10 Avro sinks into a single
Avro source and want to write it to HDFS as fixed-size (say 64 MB) files.
What should I do?
Presently, if I set the rolling size to 64 MB, the BucketWriter creates
many files (I suspect the count equals txnEventMax), and after a while it
throws exceptions like 'too many open files'. (I have a limit of
75000 open file descriptors.)
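For reference, a sink configuration along these lines illustrates what I am
attempting: rolling purely by size, with time- and count-based rolling
disabled. The agent, sink, and channel names here are placeholders, and the
path is made up; the hdfs.* property names are the ones from the Flume HDFS
sink documentation.

```properties
# Hypothetical agent "a1" with sink "k1" draining channel "c1".
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events

# Roll only when the file reaches 64 MB (67108864 bytes);
# 0 disables time-based and event-count-based rolling.
a1.sinks.k1.hdfs.rollSize = 67108864
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollCount = 0

# Events written to HDFS per flush/sync.
a1.sinks.k1.hdfs.batchSize = 1000
```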

Any information about the above would be a great help in tuning Flume
properly for these requirements.