Re: HDFSEventSink Memory Leak Workarounds
Tim Driscoll 2013-05-22, 16:20
Sounds like the expected behavior to me based on the message, though it's a
little confusing because it's caught in an IOException.
Somewhat related, we probably had our idleTimeout set too low, so the files
would close pretty often. This was causing a memory leak for us; from what
I can tell, this is due to FLUME-1864. So I think it may be a good idea to
bump up the idleTimeout if you're constantly closing idle files. I could
be wrong, though, so I would defer to the developers. :)
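For reference, the settings discussed in this thread could be combined roughly
as in the sketch below. This is a minimal illustration, not a tested
configuration: the agent, sink, and channel names (agent1, hdfsSink,
memChannel) are hypothetical, the numeric values are placeholders rather than
recommendations, and the path only mirrors the layout visible in the quoted
log lines. The property names themselves (hdfs.rollInterval, hdfs.idleTimeout,
hdfs.maxOpenFiles) are the ones named in the thread.

```properties
# Hypothetical names: agent1, hdfsSink, memChannel. Values are illustrative only.
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = memChannel

# Bucket by day and hour, mirroring the path in the quoted log output.
# Assumes events carry a timestamp header so %Y%m%d and %H can be resolved.
agent1.sinks.hdfsSink.hdfs.path = /flume/WebLogs/datekey=%Y%m%d/hour=%H

# 0 disables time-based rolling, so idleTimeout is what closes finished buckets.
agent1.sinks.hdfsSink.hdfs.rollInterval = 0

# Seconds of inactivity before an idle bucket writer is closed (0 = never).
# A larger value means fewer close/reopen cycles on 1.3.x.
agent1.sinks.hdfsSink.hdfs.idleTimeout = 3600

# Upper bound on bucket writers kept open at once; the oldest is closed
# when the limit is exceeded.
agent1.sinks.hdfsSink.hdfs.maxOpenFiles = 50
```

Note that hdfs.rollSize and hdfs.rollCount are not shown and keep their own
defaults, so files may still roll on size or event count unless those are
changed as well.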
On Wed, May 22, 2013 at 8:58 AM, Paul Chavez <
[EMAIL PROTECTED]> wrote:
> This thread reminded me to check my configs since I use a low idleTimeout
> and bucket events by hour. Turned out I still had the default rollInterval
> set so I disabled that and updated my configs.
> Now I see a lot of exceptions logged as warnings in the log immediately
> following an idleTimeout:
> 8:55:40.663 AM INFO org.apache.flume.sink.hdfs.BucketWriter
> Closing idle bucketWriter
> 8:55:40.675 AM INFO org.apache.flume.sink.hdfs.BucketWriter
> /flume/WebLogs/datekey=20130522/hour=08/FlumeData.1369238128886.tmp to
> 8:55:40.677 AM WARN org.apache.flume.sink.hdfs.HDFSEventSink
> HDFS IO error
> java.io.IOException: This bucket writer was closed due to idling and this
> handle is thus no longer valid
> at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:391)
> at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
> at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.lang.Thread.run(Thread.java:662)
> Given these are logged at WARN, I have been assuming they are benign errors.
> Is that assumption correct?
> Paul Chavez
> *From:* Connor Woodson [mailto:[EMAIL PROTECTED]]
> *Sent:* Tuesday, May 21, 2013 2:13 PM
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: HDFSEventSink Memory Leak Workarounds
> The other property you will want to look at is maxOpenFiles, which is
> the number of files/paths held in memory at one time.
> If you search for the email thread with subject "hdfs.idleTimeout ,what's
> it used for ?" from back in January you will find a discussion along these
> lines. As a quick summary, if rollInterval is not set to 0, you should
> avoid using idleTimeout and should set maxOpenFiles to a reasonable number
> (the default is 500 which is too large; I think that default is changed for
> - Connor
> On Tue, May 21, 2013 at 9:59 AM, Tim Driscoll <[EMAIL PROTECTED]> wrote:
>> We have a Flume Agent (version 1.3.1) set up using the HDFSEventSink. We
>> were noticing that we were running out of memory after a few days of
>> running, and believe we had pinpointed it to an issue with using the
>> hdfs.idleTimeout setting. I believe this is fixed in 1.4 per FLUME-1864.
>> Our planned workaround was to just remove the idleTimeout setting, which
>> worked, but brought up another issue. Since we are partitioning our data
>> by timestamp, at midnight, we rolled over to a new bucket/partition, opened
>> new bucket writers, and left the current bucket writers open. Ideally the
>> idleTimeout would clean this up. So instead of a slow steady leak, we're
>> encountering a 100MB leak every day.
>> Short of upgrading Flume, does anyone know of a configuration workaround
>> for this? Currently we just bumped up the heap memory and I'm having to
>> restart our agents every few days, which obviously isn't ideal.
>> Is anyone else seeing issues like this? Or how do others use the HDFS