Flume, mail # user - best way to make all hdfs records in one file under a folder?


Re: best way to make all hdfs records in one file under a folder?
Chen Wang 2014-01-20, 19:21
Chris,
It's partitioned every 6 minutes (that's why I set the roll interval to 60*6 = 360). The
data size is around 15 MB, so I want it all in one file.
Chen
On Mon, Jan 20, 2014 at 10:57 AM, Christopher Shannon <[EMAIL PROTECTED]> wrote:

> How is your data partitioned, by date?
>
>
> On Monday, January 20, 2014, Chen Wang <[EMAIL PROTECTED]> wrote:
>
>> Guys,
>> I have Flume set up to flow partitioned data to HDFS; each partition has
>> its own folder. Is there a way to keep all the data under one
>> partition in a single file?
>> I am currently using
>> MyAgent.sinks.HDFS.hdfs.batchSize = 10000
>> MyAgent.sinks.HDFS.hdfs.rollSize = 15000000
>> MyAgent.sinks.HDFS.hdfs.rollCount = 10000
>> MyAgent.sinks.HDFS.hdfs.rollInterval = 360
>>
>> to make the file roll at 15 MB of data or after 6 minutes.
>>
>> Is this the best way to achieve my goal?
>> Thanks,
>> Chen
>>
>>
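A possible refinement, not from the thread itself: the Flume HDFS sink treats a value of 0 as disabling that roll trigger, so if the goal is exactly one file per partition per roll period, only hdfs.rollInterval should stay active. A sketch, reusing the agent and sink names from the config quoted above:

```
# Sketch: roll only on time, never on size or event count.
# In the Flume HDFS sink, setting a roll property to 0 disables it.
MyAgent.sinks.HDFS.type = hdfs
MyAgent.sinks.HDFS.hdfs.rollInterval = 360
MyAgent.sinks.HDFS.hdfs.rollSize = 0
MyAgent.sinks.HDFS.hdfs.rollCount = 0
MyAgent.sinks.HDFS.hdfs.batchSize = 10000
```

With rollSize and rollCount disabled, a burst larger than the expected 15 MB would still land in a single file; whether that is desirable depends on how tightly the 6-minute partitions bound the data volume.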