-
Flume hdfs sink rollover
Mohit Anchlia 2012-08-25, 00:43
I have rollover configured to roll either every 5 GB or after a little over an hour, but it doesn't seem to be working. Could you please tell me whether I have the conf set up incorrectly?
foo.sinks.hdfsSink.hdfs.filePrefix = web
foo.sinks.hdfsSink.hdfs.rollInterval = 4000
foo.sinks.hdfsSink.hdfs.rollCount = 0
foo.sinks.hdfsSink.hdfs.rollSize = 5000000000
foo.sinks.hdfsSink.hdfs.fileType = SequenceFile
foo.sinks.hdfsSink.hdfs.codeC = snappy
drwxr-xr-x - root root         5 2012-08-24 14:58 /flume_vol/flume/2012/08/24/13/dslg1
-rwxr-xr-x 3 root root  28118740 2012-08-24 13:35 /flume_vol/flume/2012/08/24/13/dslg1/web.1345840475805.snappy.tmp
-rwxr-xr-x 3 root root 407700267 2012-08-24 13:57 /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674872.snappy
-rwxr-xr-x 3 root root 407742601 2012-08-24 13:44 /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674870.snappy
-rwxr-xr-x 3 root root 170657363 2012-08-24 14:58 /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674873.snappy
-rwxr-xr-x 3 root root 407678663 2012-08-24 13:50 /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674871.snappy
drwxr-xr-x - root root         5 2012-08-24 15:04 /flume_vol/flume/2012/08/24/13/dslg2
-rwxr-xr-x 3 root root 407786389 2012-08-24 13:50 /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668738.snappy
-rwxr-xr-x 3 root root 407757832 2012-08-24 13:44 /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668737.snappy
-rwxr-xr-x 3 root root 159909773 2012-08-24 15:04 /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668740.snappy
-rwxr-xr-x 3 root root  51085873 2012-08-24 13:36 /flume_vol/flume/2012/08/24/13/dslg2/web.1345840465501.snappy
-rwxr-xr-x 3 root root 407739053 2012-08-24 13:57 /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668739.snappy
-
Re: Flume hdfs sink rollover
Denny Ye 2012-08-26, 13:47
Hi Mohit,
Why do you conclude that it isn't working? I think the files are reaching the size limit you set with 'hdfs.rollSize'. Each snappy file is about 400 megabytes, and one is completed every 6 or 7 minutes. That fits the compression ratio of the snappy format.
I rearranged your file listing in time order. It looks fine from my point of view.
-rwxr-xr-x 3 root root 170657363 2012-08-24 14:58 /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674873.snappy
-rwxr-xr-x 3 root root 407700267 2012-08-24 13:57 /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674872.snappy
-rwxr-xr-x 3 root root 407678663 2012-08-24 13:50 /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674871.snappy
-rwxr-xr-x 3 root root 407742601 2012-08-24 13:44 /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674870.snappy
-rwxr-xr-x 3 root root  28118740 2012-08-24 13:35 /flume_vol/flume/2012/08/24/13/dslg1/web.1345840475805.snappy.tmp

-rwxr-xr-x 3 root root 159909773 2012-08-24 15:04 /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668740.snappy
-rwxr-xr-x 3 root root 407739053 2012-08-24 13:57 /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668739.snappy
-rwxr-xr-x 3 root root 407786389 2012-08-24 13:50 /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668738.snappy
-rwxr-xr-x 3 root root 407757832 2012-08-24 13:44 /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668737.snappy
-rwxr-xr-x 3 root root  51085873 2012-08-24 13:36 /flume_vol/flume/2012/08/24/13/dslg2/web.1345840465501.snappy
-Regards Denny Ye
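
For readers who want to check the "every 6 or 7 minutes" observation, here is a minimal Python sketch. It only does arithmetic on the modification times copied from the dslg1 listing above and on the posted hdfs.rollInterval value; the helper name roll_gaps_minutes is made up for illustration and is not part of Flume.

```python
from datetime import datetime

# hdfs.rollInterval from the posted config, in seconds (about 66 minutes).
ROLL_INTERVAL_S = 4000

# Modification times of the three full-size dslg1 files, copied from the listing.
mtimes = ["2012-08-24 13:44", "2012-08-24 13:50", "2012-08-24 13:57"]


def roll_gaps_minutes(stamps):
    """Return the whole-minute gaps between consecutive timestamps."""
    parsed = [datetime.strptime(s, "%Y-%m-%d %H:%M") for s in stamps]
    return [int((b - a).total_seconds() // 60) for a, b in zip(parsed, parsed[1:])]


gaps = roll_gaps_minutes(mtimes)
# True when every gap is shorter than the configured time-based roll interval.
rolled_before_interval = all(g * 60 < ROLL_INTERVAL_S for g in gaps)
print(gaps, rolled_before_interval)
```

Because the gaps are far shorter than the 4000-second interval, these files were most likely closed by the size trigger rather than the time trigger, which is the point of the reply above.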
-
Re: Flume hdfs sink rollover
Mohit Anchlia 2012-08-26, 15:04
The size I gave in my conf is 5G, but it rolls over at about 400M. Does that mean Flume uses the uncompressed size to determine when to roll over? Is it all calculated in memory as it writes to the sink, before it compresses?
-
Re: Flume hdfs sink rollover
Denny Ye 2012-08-27, 01:29
Yes, you are right. Flume uses the uncompressed size to decide when to roll, and that size is tracked in memory as events are written. Normally the compression ratio of snappy might be 5x-10x, and better still if the data contains a lot of duplication. So the behavior matches your setting. Do you agree?
-Regards Denny Ye
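
As a rough cross-check of this explanation, the sketch below computes the compression ratio implied by the listing. It assumes each full-size file was rolled exactly when the uncompressed byte count reached hdfs.rollSize, which is how the thread describes the behavior, not something verified against the Flume source here. The function name implied_ratio is hypothetical.

```python
# hdfs.rollSize from the posted config, in (uncompressed) bytes.
ROLL_SIZE_BYTES = 5_000_000_000

# On-disk sizes of the full-size .snappy files, copied from the listing.
compressed_sizes = [
    407742601, 407678663, 407700267,  # dslg1
    407757832, 407786389, 407739053,  # dslg2
]


def implied_ratio(roll_size, compressed_size):
    """Ratio of the configured uncompressed roll size to the size on disk."""
    return roll_size / compressed_size


ratios = [implied_ratio(ROLL_SIZE_BYTES, size) for size in compressed_sizes]
for size, ratio in zip(compressed_sizes, ratios):
    print(f"{size} bytes on disk -> about {ratio:.1f}x")
```

Under that assumption the files work out to roughly 12x, somewhat above the 5x-10x range quoted for typical data, which would be consistent with log data that repeats a lot.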