Flume, mail # user - Flume hdfs sink rollover


Re: Flume hdfs sink rollover
Denny Ye 2012-08-27, 01:29
Yes, you are right. Flume uses the uncompressed size to decide when to
roll. The size is calculated in memory, before the data is compressed.
Normally the compression ratio of snappy might be 5x-10x, and even better if
the data contains a lot of duplication. So the result fits your setting, do
you agree?
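
As a rough check of that reasoning, the sketch below divides the configured hdfs.rollSize by the on-disk sizes of the fully rolled dslg1 files quoted further down. It assumes each of those files was rolled exactly when its uncompressed size reached hdfs.rollSize and ignores SequenceFile overhead; the function and variable names are only illustrative and are not part of Flume.

```python
# Figures from the thread: the configured hdfs.rollSize and the on-disk
# sizes of the three fully rolled dslg1 files.
ROLL_SIZE_BYTES = 5_000_000_000
rolled_file_sizes = [407_700_267, 407_678_663, 407_742_601]


def implied_ratio(uncompressed_bytes, compressed_bytes):
    """Compression ratio implied by one rolled file (assumes a roll by size)."""
    return uncompressed_bytes / compressed_bytes


ratios = [implied_ratio(ROLL_SIZE_BYTES, size) for size in rolled_file_sizes]
for size, ratio in zip(rolled_file_sizes, ratios):
    print(f"{size} bytes on disk -> about {ratio:.1f}x")

# Hypothetical follow-up: the rollSize that would give roughly 5G files on
# disk if this data kept compressing at the same average ratio.
average_ratio = sum(ratios) / len(ratios)
target_on_disk_bytes = 5_000_000_000
suggested_roll_size = int(target_on_disk_bytes * average_ratio)
print(f"hdfs.rollSize for ~5G on disk: about {suggested_roll_size}")
```

Under those assumptions the implied ratio is a little over 12x, somewhat above the 5x-10x range mentioned above. The ratio depends on the data, so the suggested value is only a starting point.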

-Regards
Denny Ye

2012/8/26 Mohit Anchlia <[EMAIL PROTECTED]>

>
>
> On Sun, Aug 26, 2012 at 6:47 AM, Denny Ye <[EMAIL PROTECTED]> wrote:
>
>> Hi Mohit,
>>      Why do you conclude that it isn't working? I think it is hitting the
>> size limit of your 'hdfs.rollSize' setting. Each snappy file reaches roughly
>> 400 megabytes every 6 or 7 minutes, which fits the compression ratio of the
>> snappy format.
>>      I rearranged your file listing in order. It looks fine to me.
>>
>>
> The size I gave in my conf is 5G, but it rolls over at 400M. Does that mean
> Flume uses the uncompressed size to determine when to roll over? Is it all
> calculated in memory as it writes to the sink, before it compresses?
>
>
>
>> -rwxr-xr-x   3 root root  170657363 2012-08-24 14:58
>> /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674873.snappy
>> -rwxr-xr-x   3 root root  407700267 2012-08-24 13:57
>> /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674872.snappy
>> -rwxr-xr-x   3 root root  407678663 2012-08-24 13:50
>> /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674871.snappy
>> -rwxr-xr-x   3 root root  407742601 2012-08-24 13:44
>> /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674870.snappy
>> -rwxr-xr-x   3 root root   28118740 2012-08-24 13:35
>> /flume_vol/flume/2012/08/24/13/dslg1/web.1345840475805.snappy.tmp
>>
>> -rwxr-xr-x   3 root root  159909773 2012-08-24 15:04
>> /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668740.snappy
>> -rwxr-xr-x   3 root root  407739053 2012-08-24 13:57
>> /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668739.snappy
>> -rwxr-xr-x   3 root root  407786389 2012-08-24 13:50
>> /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668738.snappy
>> -rwxr-xr-x   3 root root  407757832 2012-08-24 13:44
>> /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668737.snappy
>> -rwxr-xr-x   3 root root   51085873 2012-08-24 13:36
>> /flume_vol/flume/2012/08/24/13/dslg2/web.1345840465501.snappy
>>
>> -Regards
>> Denny Ye
>>
>> 2012/8/25 Mohit Anchlia <[EMAIL PROTECTED]>
>>
>>> I have rollover defined to roll either every 5G or every 1+ hr, but it
>>> doesn't seem to be working. Could you please tell me whether I have
>>> configured it incorrectly?
>>>
>>> foo.sinks.hdfsSink.hdfs.filePrefix = web
>>> foo.sinks.hdfsSink.hdfs.rollInterval  = 4000
>>> foo.sinks.hdfsSink.hdfs.rollCount  = 0
>>> foo.sinks.hdfsSink.hdfs.rollSize  = 5000000000
>>> foo.sinks.hdfsSink.hdfs.fileType  = SequenceFile
>>> foo.sinks.hdfsSink.hdfs.codeC  = snappy
>>>
>>>
>>>
>>> drwxr-xr-x   - root root          5 2012-08-24 14:58
>>> /flume_vol/flume/2012/08/24/13/dslg1
>>> -rwxr-xr-x   3 root root   28118740 2012-08-24 13:35
>>> /flume_vol/flume/2012/08/24/13/dslg1/web.1345840475805.snappy.tmp
>>> -rwxr-xr-x   3 root root  407700267 2012-08-24 13:57
>>> /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674872.snappy
>>> -rwxr-xr-x   3 root root  407742601 2012-08-24 13:44
>>> /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674870.snappy
>>> -rwxr-xr-x   3 root root  170657363 2012-08-24 14:58
>>> /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674873.snappy
>>> -rwxr-xr-x   3 root root  407678663 2012-08-24 13:50
>>> /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674871.snappy
>>> drwxr-xr-x   - root root          5 2012-08-24 15:04
>>> /flume_vol/flume/2012/08/24/13/dslg2
>>> -rwxr-xr-x   3 root root  407786389 2012-08-24 13:50
>>> /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668738.snappy
>>> -rwxr-xr-x   3 root root  407757832 2012-08-24 13:44
>>> /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668737.snappy
>>> -rwxr-xr-x   3 root root  159909773 2012-08-24 15:04
>>> /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668740.snappy
>>> -rwxr-xr-x   3 root root   51085873 2012-08-24 13:36
>