Re: process failed - java.lang.OutOfMemoryError
Try turning on -XX:+HeapDumpOnOutOfMemoryError so we can peek at the heap dump.
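
For example, with something like this in conf/flume-env.sh (the heap size and dump path are illustrative, not taken from the thread):

    # conf/flume-env.sh -- sketch only; adjust -Xmx and the dump path for your box
    JAVA_OPTS="-Xmx256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/flume-ng"

Keep in mind a heap dump is only conclusive when the OOM really is heap exhaustion; an OutOfMemoryError thrown from native code can leave the Java heap looking healthy.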

--
Brock Noland
On Friday, March 1, 2013 at 5:57 PM, Denis Lowe wrote:

> process failed - java.lang.OutOfMemoryError
>
> We observed the following error:
> 01 Mar 2013 21:37:24,807 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:460)  - process failed
> java.lang.OutOfMemoryError
>         at org.apache.hadoop.io.compress.zlib.ZlibCompressor.init(Native Method)
>         at org.apache.hadoop.io.compress.zlib.ZlibCompressor.<init>(ZlibCompressor.java:222)
>         at org.apache.hadoop.io.compress.GzipCodec$GzipZlibCompressor.<init>(GzipCodec.java:159)
>         at org.apache.hadoop.io.compress.GzipCodec.createCompressor(GzipCodec.java:109)
>         at org.apache.hadoop.io.compress.GzipCodec.createOutputStream(GzipCodec.java:92)
>         at org.apache.flume.sink.hdfs.HDFSCompressedDataStream.open(HDFSCompressedDataStream.java:70)
>         at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:216)
>         at org.apache.flume.sink.hdfs.BucketWriter.access$000(BucketWriter.java:53)
>         at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:172)
>         at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:170)
>         at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:143)
>         at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:170)
>         at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:364)
>         at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
>         at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:722)
>
> Unfortunately the error does not state whether it is due to a lack of heap, PermGen, or direct memory.
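
Worth noting: the allocation fails inside ZlibCompressor.init, a native method, so one possibility is native (off-heap) memory exhaustion, which none of the JVM-visible pools below would reveal. A crude way to watch the whole-process footprint, assuming Linux and that pgrep matches Flume's main class:

    pmap -x $(pgrep -f flume.node.Application) | tail -1    # the final line totals mapped size and RSS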
>
> Looking at the system memory we could see that we were using 3GB of 7GB (i.e. less than half of the physical memory was in use).
>
> Using the VisualVM profiler we could see that we had not maxed out the heap: 75MB used of 131MB allocated.
> PermGen was also fine: 16MB used of 27MB allocated.
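
Those utilization figures can be cross-checked from the shell with jstat (the pgrep-based PID lookup is illustrative):

    jstat -gcutil $(pgrep -f flume.node.Application) 5000    # eden/old/perm utilization, sampled every 5s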
>
> Buffer Usage is as follows:
> Direct Memory:
> < 50MB (this gets freed after each GC)
>
> Mapped Memory:
> count 9
> 144MB (always stays constant)
>
> I'm assuming -XX:MaxDirectMemorySize applies to direct buffer memory usage, NOT to mapped buffer memory?
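
That matches the flag's behaviour: -XX:MaxDirectMemorySize bounds only buffers obtained via ByteBuffer.allocateDirect(); regions created with FileChannel.map() are not counted against it. For example:

    JAVA_OPTS="$JAVA_OPTS -XX:MaxDirectMemorySize=128m"    # caps direct buffers only; mmap'd files unaffected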
>
> The other thing we noticed was that after a restart the Flume process's "RES" size starts at around 200MB and then, over a period of a week, grows to 3GB, at which point we observed the above error.
> Unfortunately we cannot see where this 3GB of memory is being used when profiling with VisualVM and JConsole (max heap size is set to 256MB) - there definitely appears to be a slow memory leak.
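
A cheap way to quantify that growth is to log the resident set size periodically (a sketch, assuming Linux and that pgrep matches Flume's main class):

    while true; do
      echo "$(date +%s) $(ps -o rss= -p $(pgrep -f flume.node.Application))" >> /tmp/flume-rss.log
      sleep 300
    done

A steady slope in that log over a few days would back up the slow-leak theory.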
>
> Flume is the only process running on this server:
> 64-bit CentOS
> java version "1.6.0_27" (64-bit)
>
> The Flume collector is configured with 8 file channels writing to S3 using the HDFS sink. (8 upstream servers are pushing events to 2 downstream collectors.)
>
> Each of the 8 channels/sinks is configured as follows:
> ## impression source
> agent.sources.impressions.type = avro
> agent.sources.impressions.bind = 0.0.0.0
> agent.sources.impressions.port = 5001
> agent.sources.impressions.channels = impressions-s3-channel
> ## impression channel
> agent.channels.impressions-s3-channel.type = file
> agent.channels.impressions-s3-channel.checkpointDir = /mnt/flume-ng/checkpoint/impressions-s3-channel
> agent.channels.impressions-s3-channel.dataDirs = /mnt/flume-ng/data1/impressions-s3-channel,/mnt/flume-ng/data2/impressions-s3-channel
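
The sink definitions fall outside the excerpt above, but given the HDFSCompressedDataStream and GzipCodec frames in the stack trace, each presumably looks roughly like this (the sink name and S3 path are hypothetical, not from the original post):

    ## impression sink (illustrative sketch; sink name and bucket path are hypothetical)
    agent.sinks.impressions-s3-sink.type = hdfs
    agent.sinks.impressions-s3-sink.channel = impressions-s3-channel
    agent.sinks.impressions-s3-sink.hdfs.path = s3n://example-bucket/impressions
    agent.sinks.impressions-s3-sink.hdfs.fileType = CompressedStream
    agent.sinks.impressions-s3-sink.hdfs.codeC = gzip

Every time a BucketWriter opens a new file with a configuration like this it constructs a fresh gzip compressor, which is exactly the ZlibCompressor.init call that blew up above; if those native zlib allocations were never released, that alone could explain RES growing while the Java heap stays flat.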