Flume >> mail # user >> best way to make all hdfs records in one file under a folder?


Re: best way to make all hdfs records in one file under a folder?
Chris,
It rolls every 6 minutes (that's why I set the roll interval to 60*6 = 360). The data size is around 15 MB, so I want it all in one file.
Chen
On Mon, Jan 20, 2014 at 10:57 AM, Christopher Shannon <[EMAIL PROTECTED]> wrote:

> How is your data partitioned, by date?
>
>
> On Monday, January 20, 2014, Chen Wang <[EMAIL PROTECTED]> wrote:
>
>> Guys,
>> I have flume setup to flow partitioned data to hdfs, each partition has
>> its own file folder. Is there a way to specify all the data under one
>> partition to be in one file?
>> I am currently using
>> MyAgent.sinks.HDFS.hdfs.batchSize = 10000
>> MyAgent.sinks.HDFS.hdfs.rollSize = 15000000
>> MyAgent.sinks.HDFS.hdfs.rollCount = 10000
>> MyAgent.sinks.HDFS.hdfs.rollInterval = 360
>>
>> to make the file roll at 15 MB of data or after 6 minutes.
>>
>> Is this the best way to achieve my goal?
>> Thanks,
>> Chen
>>
>>
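If the goal is exactly one file per partition per interval, one option is to make time the only roll trigger: the stock Flume HDFS sink treats a value of 0 for `hdfs.rollSize` and `hdfs.rollCount` as "disabled", so only `hdfs.rollInterval` closes the file. A minimal sketch, reusing the agent/sink names from the thread (`MyAgent`, `HDFS`):

```properties
# Roll on time only: 0 disables the size- and count-based triggers,
# so each file stays open for the full 6-minute interval.
MyAgent.sinks.HDFS.type = hdfs
MyAgent.sinks.HDFS.hdfs.rollInterval = 360   # roll every 6 minutes
MyAgent.sinks.HDFS.hdfs.rollSize = 0         # disable size-based rolling
MyAgent.sinks.HDFS.hdfs.rollCount = 0        # disable event-count-based rolling
MyAgent.sinks.HDFS.hdfs.batchSize = 10000
```

With the original settings, whichever of the three thresholds fires first (15 MB, 10000 events, or 360 s) closes the file, so a partition can still end up with several files per interval; disabling the other two triggers avoids that.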