Flume >> mail # user >> hdfs.idleTimeout, what's it used for?


Bhaskar V. Karambelkar 2013-01-17, 20:07
Connor Woodson 2013-01-17, 20:29
Re: hdfs.idleTimeout, what's it used for?
Ah, I see. Again, something that would be useful to have in the Flume user guide.

On Thu, Jan 17, 2013 at 3:29 PM, Connor Woodson <[EMAIL PROTECTED]> wrote:
> the rollInterval will still cause the last 01-17 file to be closed
> eventually. The way the HDFS sink handles the different files is that
> each unique path is managed by a separate BucketWriter object. The sink
> can hold as many of these objects as specified by hdfs.maxOpenFiles
> (default: 5000), and BucketWriters are only evicted when you create the
> 5001st writer (the 5001st unique path). However, once a writer is closed
> it is generally never used again (all of your 01-17 writers will never be
> used again). To avoid keeping them in the sink's internal list of
> writers, idleTimeout specifies a number of seconds during which no data
> is received by a BucketWriter. After this time, the writer will try to
> close itself and will then tell the sink to remove it, thus freeing up
> everything used by that BucketWriter.
>
> So idleTimeout is just a setting to help limit memory usage by the HDFS
> sink. The ideal value is longer than the maximum time between events
> (capped at the rollInterval) - if you know you'll receive a constant
> stream of events you might just set it to a minute or so. Or, if you are
> fine with having multiple files open per hour, you can set it to a lower
> number, maybe just over the average time between events. For my testing,
> I set it >= rollInterval to cover the case where no events are received
> in a given hour (I'd rather keep the object alive for an extra hour than
> create files every 30 minutes or so).
>
> Hope that was helpful,
>
> - Connor
>
>
> On Thu, Jan 17, 2013 at 12:07 PM, Bhaskar V. Karambelkar
> <[EMAIL PROTECTED]> wrote:
>>
>> Say I have
>>
>> a1.sinks.k1.hdfs.path = /flume/events/%Y-%m-%d/
>> a1.sinks.k1.hdfs.rollInterval = 60
>>
>> Now, suppose there is a file
>> /flume/events/2013-01-17/flume_XXXXXXXXX.tmp
>> that is not ready to be rolled over yet, i.e. its 60 seconds are not
>> up, and it is now past midnight, i.e. a new day,
>> and events start to be written to
>> /flume/events/2013-01-18/flume_XXXXXXXX.tmp
>>
>> Will the 2013-01-17 file never be rolled over, unless I have something
>> like hdfs.idleTimeout=60 ?
>> If so, how do Flume sinks keep track of the files they need to roll
>> over after the idleTimeout ?
>>
>> In short, what is the exact use of the idleTimeout parameter ?
>
>
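Putting the thread's advice together, here is a minimal sketch of an HDFS sink configuration (the agent/sink names a1/k1 are taken from the question; the concrete values are illustrative, not recommendations):

```properties
# Hypothetical agent config illustrating the settings discussed above.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%Y-%m-%d/
# Roll (close and rename) each file after 60 seconds.
a1.sinks.k1.hdfs.rollInterval = 60
# Close a BucketWriter that has received no events for 90 seconds, so the
# last file of a finished day is closed and evicted from the sink's list.
a1.sinks.k1.hdfs.idleTimeout = 90
# Upper bound on BucketWriters the sink tracks at once (default 5000).
a1.sinks.k1.hdfs.maxOpenFiles = 5000
```

With idleTimeout set slightly above rollInterval, as suggested above, the previous day's writer is closed by whichever trigger fires first instead of lingering until the 5001st unique path evicts it.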
Juhani Connolly 2013-01-18, 02:08
Mohit Anchlia 2013-01-18, 02:17
Connor Woodson 2013-01-18, 02:19
Connor Woodson 2013-01-18, 02:20
Connor Woodson 2013-01-18, 02:23
Juhani Connolly 2013-01-18, 02:46
Connor Woodson 2013-01-18, 03:24
Juhani Connolly 2013-01-18, 03:39
Connor Woodson 2013-01-18, 04:18
Mohit Anchlia 2013-01-18, 05:12
Juhani Connolly 2013-01-18, 06:37
Juhani Connolly 2013-01-18, 02:39
Connor Woodson 2013-01-18, 02:42