Flume, mail # user - HDFSEventSink Memory Leak Workarounds


Re: HDFSEventSink Memory Leak Workarounds
Tim Driscoll 2013-05-22, 16:20
Sounds like the expected behavior to me based on the message, though it's a
little confusing because it's caught in an IOException.

Somewhat related, we probably had our idleTimeout set too low, so the files
would close pretty often.  This was causing a memory leak for us; from what
I can tell, this is due to FLUME-1864.  So I think it may be a good idea to
bump up the idleTimeout if you're constantly closing idle files.  I could
be wrong though; I would defer to the developers. :)
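
As a rough illustration of that workaround, a sink configuration along these
lines raises the idle timeout so files are closed (and bucket writers torn
down) far less often. The agent/sink names, path, and timeout value below are
hypothetical examples, not taken from this thread:

    # Hypothetical agent/sink names; only the HDFS sink properties matter here.
    agent1.sinks.hdfs-sink1.type = hdfs
    agent1.sinks.hdfs-sink1.hdfs.path = /flume/events/%Y%m%d
    # idleTimeout is in seconds; a larger value means idle files are closed
    # less often, reducing the writer churn tied to FLUME-1864 as described above.
    agent1.sinks.hdfs-sink1.hdfs.idleTimeout = 3600
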
On Wed, May 22, 2013 at 8:58 AM, Paul Chavez <[EMAIL PROTECTED]> wrote:

> This thread reminded me to check my configs since I use a low idleTimeout
> and bucket events by hour. Turned out I still had the default rollInterval
> set so I disabled that and updated my configs.
>
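
For reference, the combination Paul describes (hourly bucketing, a low
idleTimeout, and time-based rolling disabled) would look roughly like the
sketch below. The sink name and idleTimeout value are hypothetical; the path
is inferred from the file names in the log that follows:

    # Hypothetical sink name; hour-bucketed path inferred from the log below.
    agent1.sinks.weblog-hdfs.type = hdfs
    agent1.sinks.weblog-hdfs.hdfs.path = /flume/WebLogs/datekey=%Y%m%d/hour=%H
    # rollInterval = 0 disables time-based rolling; files are then closed by
    # the (low) idle timeout rather than on a timer.
    agent1.sinks.weblog-hdfs.hdfs.rollInterval = 0
    agent1.sinks.weblog-hdfs.hdfs.idleTimeout = 60
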
> Now I see a lot of exceptions logged as warnings immediately following
> an idleTimeout:
>
> 8:55:40.663 AM INFO org.apache.flume.sink.hdfs.BucketWriter
>   Closing idle bucketWriter /flume/WebLogs/datekey=20130522/hour=08/FlumeData.1369238128886.tmp
> 8:55:40.675 AM INFO org.apache.flume.sink.hdfs.BucketWriter
>   Renaming /flume/WebLogs/datekey=20130522/hour=08/FlumeData.1369238128886.tmp to /flume/WebLogs/datekey=20130522/hour=08/FlumeData.1369238128886
> 8:55:40.677 AM WARN org.apache.flume.sink.hdfs.HDFSEventSink
>   HDFS IO error
> java.io.IOException: This bucket writer was closed due to idling and this handle is thus no longer valid
>   at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:391)
>   at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
>   at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>   at java.lang.Thread.run(Thread.java:662)
>
> Given that these are logged as WARN, I have been assuming they are benign
> errors. Is that assumption correct?
>
> thanks,
> Paul Chavez
>
>  ------------------------------
> From: Connor Woodson [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, May 21, 2013 2:13 PM
> To: [EMAIL PROTECTED]
> Subject: Re: HDFSEventSink Memory Leak Workarounds
>
>  The other property you will want to look at is maxOpenFiles, which is
> the number of file/paths held in memory at one time.
>
> If you search for the email thread with subject "hdfs.idleTimeout ,what's
> it used for ?" from back in January you will find a discussion along these
> lines. As a quick summary, if rollInterval is not set to 0, you should
> avoid using idleTimeout and should set maxOpenFiles to a reasonable number
> (the default is 500 which is too large; I think that default is changed for
> 1.4).
>
> - Connor
>
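
A minimal sketch of the setup Connor recommends, relying on rollInterval
rather than idleTimeout and keeping the bucket-writer cache small via
maxOpenFiles. The names and values are illustrative assumptions, not taken
from the thread:

    # Hypothetical agent/sink names; values are examples only.
    agent1.sinks.hdfs-sink1.type = hdfs
    agent1.sinks.hdfs-sink1.hdfs.path = /flume/events/%Y%m%d
    # Roll files on a timer instead of relying on idle detection.
    agent1.sinks.hdfs-sink1.hdfs.rollInterval = 3600
    # Leave hdfs.idleTimeout at its default (0 = disabled) and keep the
    # writer cache small so stale bucket writers get evicted and closed.
    agent1.sinks.hdfs-sink1.hdfs.maxOpenFiles = 50
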
>
> On Tue, May 21, 2013 at 9:59 AM, Tim Driscoll <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>>
>> We have a Flume Agent (version 1.3.1) set up using the HDFSEventSink.  We
>> noticed that we were running out of memory after a few days of running,
>> and believe we have pinpointed it to an issue with the hdfs.idleTimeout
>> setting.  I believe this is fixed in 1.4 per FLUME-1864.
>>
>> Our planned workaround was to just remove the idleTimeout setting, which
>> worked, but brought up another issue.  Since we are partitioning our data
>> by timestamp, at midnight we roll over to a new bucket/partition, open new
>> bucket writers, and leave the previous day's bucket writers open.  Ideally
>> the idleTimeout would clean these up.  So instead of a slow, steady leak,
>> we're encountering a 100MB leak every day.
>>
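
To make the midnight rollover concrete: with a day-partitioned path like the
hypothetical one below, a brand-new BucketWriter is created for each new date
directory, and with idleTimeout removed the previous day's writer stays in the
sink's cache until the maxOpenFiles cap (discussed in Connor's reply above)
pushes it out:

    # Hypothetical day-partitioned path; each new %Y%m%d value yields a new
    # bucket directory and a new BucketWriter at midnight, while yesterday's
    # writer lingers in memory unless something evicts or closes it.
    agent1.sinks.hdfs-sink1.hdfs.path = /flume/events/datekey=%Y%m%d
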
>> Short of upgrading Flume, does anyone know of a configuration workaround
>> for this?  Currently we just bumped up the heap memory and I'm having to
>> restart our agents every few days, which obviously isn't ideal.
>>
>> Is anyone else seeing issues like this?  Or how do others use the HDFS