Denis Lowe 2013-03-01, 23:57
Re: process failed - java.lang.OutOfMemoryError
Brock Noland 2013-03-02, 17:30
Try turning on HeapDumpOnOutOfMemoryError so we can peek at the heap dump.
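
For reference, a minimal sketch of how that flag could be switched on, assuming the agent is launched through the stock flume-ng script and that it picks up JAVA_OPTS from conf/flume-env.sh. The dump directory below is a hypothetical path chosen for illustration, not something taken from the original setup; point it at any location with enough free disk for a dump of the configured heap.

```shell
# conf/flume-env.sh (sketch; assumes this file is sourced when the agent starts)

# Hypothetical dump location, adjust to a directory that exists and has free space.
HEAP_DUMP_DIR="/mnt/flume-ng/heapdumps"

# Append to any options already set instead of overwriting them.
# -XX:+HeapDumpOnOutOfMemoryError writes an .hprof file when the JVM throws OutOfMemoryError.
# -XX:HeapDumpPath selects the file or directory the dump is written to.
export JAVA_OPTS="${JAVA_OPTS:-} -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${HEAP_DUMP_DIR}"
```

The agent has to be restarted for the options to take effect. The resulting .hprof file can then be opened in a heap analysis tool such as VisualVM.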

--
Brock Noland
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Friday, March 1, 2013 at 5:57 PM, Denis Lowe wrote:

> process failed - java.lang.OutOfMemoryError
>
> We observed the following error:
> 01 Mar 2013 21:37:24,807 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:460) - process failed
> java.lang.OutOfMemoryError
>     at org.apache.hadoop.io.compress.zlib.ZlibCompressor.init(Native Method)
>     at org.apache.hadoop.io.compress.zlib.ZlibCompressor.<init>(ZlibCompressor.java:222)
>     at org.apache.hadoop.io.compress.GzipCodec$GzipZlibCompressor.<init>(GzipCodec.java:159)
>     at org.apache.hadoop.io.compress.GzipCodec.createCompressor(GzipCodec.java:109)
>     at org.apache.hadoop.io.compress.GzipCodec.createOutputStream(GzipCodec.java:92)
>     at org.apache.flume.sink.hdfs.HDFSCompressedDataStream.open(HDFSCompressedDataStream.java:70)
>     at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:216)
>     at org.apache.flume.sink.hdfs.BucketWriter.access$000(BucketWriter.java:53)
>     at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:172)
>     at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:170)
>     at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:143)
>     at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:170)
>     at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:364)
>     at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
>     at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:722)
>
> Unfortunately the error does not state whether it is due to a lack of Heap, Perm or Direct Memory.
>
> Looking at the system memory we could see that we were using 3GB of 7GB (i.e. less than half of the physical memory was used)
>
> Using VisualVM profiler we could see that we had not maxed out the Heap Memory: 75MB of 131MB (allocated)
> PermGen was fine: 16MB of 27MB (allocated)
>
> Buffer Usage is as follows:
> Direct Memory:
> < 50MB (this gets freed after each GC)
>
> Mapped Memory:
> count 9
> 144MB (always stays constant)
>
> I'm assuming the -XX:MaxDirectMemorySize is for Direct Buffer Memory usage NOT Mapped buffer Memory?
>
> The other thing we noticed was that after restart the flume process "RES" size starts at around 200MB and then over a period of a week will grow up to 3GB, after which we observed the above error.
> Unfortunately we cannot see where this 3GB of memory is being used when profiled with VisualVM and JConsole (max heap size is set to 256MB) - there definitely appears to be a slow memory leak?
>
> Flume is the only process running on this server:
> 64bit CentOS
> java version "1.6.0_27" (64bit)
>
> The flume collector is configured with 8 file channels writing to S3 using the HDFS sink. (8 upstream servers are pushing events to 2 downstream collectors)
>
> Each of the 8 channels/sinks is configured as follows:
> ## impression source
> agent.sources.impressions.type = avro
> agent.sources.impressions.bind = 0.0.0.0
> agent.sources.impressions.port = 5001
> agent.sources.impressions.channels = impressions-s3-channel
> ## impression channel
> agent.channels.impressions-s3-channel.type = file
> agent.channels.impressions-s3-channel.checkpointDir = /mnt/flume-ng/checkpoint/impressions-s3-channel
> agent.channels.impressions-s3-channel.dataDirs = /mnt/flume-ng/data1/impressions-s3-channel,/mnt/flume-ng/data2/impressions-s3-channel