Re: best way to make all hdfs records in one file under a folder?
Jeff Lord 2014-01-20, 21:46
If you don't intend to roll based on the number of events, then you will want to set
rollCount to 0.
MyAgent.sinks.HDFS.hdfs.rollCount = 0
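Putting that together with the settings quoted below, a sketch of a purely time-based rolling configuration might look like this (the agent and sink names `MyAgent` / `HDFS` are taken from the original config; setting rollSize to 0 as well disables size-based rolling, assuming the goal is strictly one file per time window):

```properties
# Roll only on time: disable count- and size-based rolling
MyAgent.sinks.HDFS.hdfs.rollCount = 0
MyAgent.sinks.HDFS.hdfs.rollSize = 0
# Roll every 6 minutes (seconds)
MyAgent.sinks.HDFS.hdfs.rollInterval = 360
MyAgent.sinks.HDFS.hdfs.batchSize = 10000
```

Note that Flume may still produce more than one file per directory if the channel delivers events across a roll boundary or the sink restarts, so this reduces, but does not strictly guarantee, a single file per partition.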
On Mon, Jan 20, 2014 at 12:35 PM, Jimmy <[EMAIL PROTECTED]> wrote:
> Seems like the only reason is the "too many files" issue, correct?
> Running File Crusher regularly might be a better option than trying to tune
> it in Flume.
> ---------- Forwarded message ----------
> From: Chen Wang <[EMAIL PROTECTED]>
> Date: Mon, Jan 20, 2014 at 11:21 AM
> Subject: Re: best way to make all hdfs records in one file under a folder?
> To: [EMAIL PROTECTED]
> It's partitioned every 6 minutes (that's why I set the roll interval to 60*6=360). The
> data size is around 15 MB, so I want it all in one file.
> On Mon, Jan 20, 2014 at 10:57 AM, Christopher Shannon <
> [EMAIL PROTECTED]> wrote:
>> How is your data partitioned, by date?
>> On Monday, January 20, 2014, Chen Wang <[EMAIL PROTECTED]> wrote:
>>> I have Flume set up to flow partitioned data to HDFS; each partition has
>>> its own folder. Is there a way to specify that all the data under one
>>> partition goes into one file?
>>> I am currently using
>>> MyAgent.sinks.HDFS.hdfs.batchSize = 10000
>>> MyAgent.sinks.HDFS.hdfs.rollSize = 15000000
>>> MyAgent.sinks.HDFS.hdfs.rollCount = 10000
>>> MyAgent.sinks.HDFS.hdfs.rollInterval = 360
>>> to make the file roll at 15 MB of data or after 6 minutes.
>>> Is this the best way to achieve my goal?