process failed - java.lang.OutOfMemoryError

We observed the following error:
01 Mar 2013 21:37:24,807 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:460) - process failed
java.lang.OutOfMemoryError
        at org.apache.hadoop.io.compress.zlib.ZlibCompressor.init(Native Method)
        at org.apache.hadoop.io.compress.zlib.ZlibCompressor.<init>(ZlibCompressor.java:222)
        at org.apache.hadoop.io.compress.GzipCodec$GzipZlibCompressor.<init>(GzipCodec.java:159)
        at org.apache.hadoop.io.compress.GzipCodec.createCompressor(GzipCodec.java:109)
        at org.apache.hadoop.io.compress.GzipCodec.createOutputStream(GzipCodec.java:92)
        at org.apache.flume.sink.hdfs.HDFSCompressedDataStream.open(HDFSCompressedDataStream.java:70)
        at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:216)
        at org.apache.flume.sink.hdfs.BucketWriter.access$000(BucketWriter.java:53)
        at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:172)
        at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:170)
        at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:143)
        at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:170)
        at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:364)
        at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
        at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)

Unfortunately the error does not state whether it is caused by a lack of Heap,
Perm or Direct memory.

Looking at the system memory, we could see that we were using 3GB of 7GB (i.e.
less than half of the physical memory was in use).

Using the VisualVM profiler we could see that we had not maxed out the Heap
memory: 75MB used of 131MB allocated.
PermGen was fine as well: 16MB used of 27MB allocated.

Buffer usage is as follows:
Direct Memory: < 50MB (this gets freed after each GC)
Mapped Memory: count 9, 144MB (always stays constant)

I'm assuming that -XX:MaxDirectMemorySize applies to Direct buffer memory usage
and NOT to Mapped buffer memory?
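
For reference, below is a minimal sketch of my own (it assumes Java 7+ for the
BufferPoolMXBean API, so it would not run on the 1.6.0_27 JVM we are on) showing
how the "direct" and "mapped" buffer pools can be read separately, which is
essentially what VisualVM is reporting:

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class BufferPoolUsage {
    public static void main(String[] args) {
        // The platform exposes one MXBean per NIO buffer pool: "direct" and "mapped".
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            // As far as I understand, -XX:MaxDirectMemorySize caps only the "direct"
            // pool; mapped buffers are limited by address space and the OS, not by
            // that flag.
            System.out.printf("%s: count=%d used=%dMB capacity=%dMB%n",
                    pool.getName(), pool.getCount(),
                    pool.getMemoryUsed() / (1024 * 1024),
                    pool.getTotalCapacity() / (1024 * 1024));
        }
    }
}

On our numbers this should print something like "direct" at under 50MB and
"mapped" with count=9 around 144MB.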

The other thing we noticed is that after a restart the Flume process "RES"
size starts at around 200MB and then, over a period of about a week, grows to
3GB, after which we observed the above error.
Unfortunately we cannot see where this 3GB of memory is being used when
profiling with VisualVM and JConsole (the max heap size is set to 256MB), so
there definitely appears to be a slow memory leak somewhere.
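
To correlate that RES growth with what the profilers report, one option is to
log the process RSS from inside the agent and compare it against the heap and
buffer-pool readings over the week. A minimal, Linux-only sketch (reading
/proc/self/status and matching it to top's RES column are my assumptions):

import java.io.BufferedReader;
import java.io.FileReader;

public class RssLogger {
    public static void main(String[] args) throws Exception {
        // VmRSS in /proc/self/status is the resident set size of this JVM
        // process, i.e. roughly the "RES" column reported by top.
        BufferedReader in = new BufferedReader(new FileReader("/proc/self/status"));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("VmRSS:")) {
                    System.out.println(line.trim());
                }
            }
        } finally {
            in.close();
        }
    }
}

If RES keeps climbing while heap, PermGen and the buffer pools stay flat, the
growth would have to be in native allocations that the Java-side tools cannot
see.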

Flume is the only process running on this server:
64-bit CentOS
java version "1.6.0_27" (64-bit)

The Flume collector is configured with 8 file channels writing to S3 using
the HDFS sink (8 upstream servers are pushing events to 2 downstream
collectors).

Each of the 8 channels/sinks is configured as follows:
## impression source
agent.sources.impressions.type = avro
agent.sources.impressions.bind = 0.0.0.0
agent.sources.impressions.port = 5001
agent.sources.impressions.channels = impressions-s3-channel
## impression  channel
agent.channels.impressions-s3-channel.type = file
agent.channels.impressions-s3-channel.checkpointDir = /mnt/flume-ng/checkpoint/impressions-s3-channel
agent.channels.impressions-s3-channel.dataDirs = /mnt/flume-ng/data1/impressions-s3-channel,/mnt/flume-ng/data2/impressions-s3-channel
agent.channels.impressions-s3-channel.maxFileSize = 210000000
agent.channels.impressions-s3-channel.capacity = 2000000
agent.channels.impressions-s3-channel.checkpointInterval = 300000
agent.channels.impressions-s3-channel.transactionCapacity = 10000
# impression s3 sink
agent.sinks.impressions-s3-sink.type = hdfs
agent.sinks.impressions-s3-sink.channel = impressions-s3-channel
agent.sinks.impressions-s3-sink.hdfs.path = s3n://KEY:SECRET_KEY@S3-PATH
agent.sinks.impressions-s3-sink.hdfs.filePrefix = impressions-%{collector-host}
agent.sinks.impressions-s3-sink.hdfs.callTimeout = 0
agent.sinks.impressions-s3-sink.hdfs.rollInterval = 3600
agent.sinks.impressions-s3-sink.hdfs.rollSize = 450000000
agent.sinks.impressions-s3-sink.hdfs.rollCount = 0
agent.sinks.impressions-s3-sink.hdfs.codeC = gzip
agent.sinks.impressions-s3-sink.hdfs.fileType = CompressedStream
agent.sinks.impressions-s3-sink.hdfs.batchSize = 100

I am using flume-ng 1.3.1 with the following parameters:
JAVA_OPTS="-Xms64m -Xmx256m -Xss128k -XX:MaxDirectMemorySize=256m
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/mnt/logs/flume-ng/gc.log"

We have 2 collectors running and they both fail at pretty much the same
time.

So from what I can see there appears to be a slow memory leak with the HDFS
sink, but I have no idea how to track this down or what alternative
configuration I could use to prevent this from happening again.
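
From the stack trace, the allocation that fails is in ZlibCompressor.init, a
native method, so my current guess is that the exhausted memory is native
(C heap used by zlib) rather than Java heap, PermGen or NIO direct buffers,
which would also explain why VisualVM and JConsole show nothing. As an
illustration only (this is my reading of the classes named in the stack trace,
not something I have verified inside the Flume HDFS sink), here is a minimal
sketch of the standard Hadoop CodecPool pattern that reuses native gzip
compressors instead of allocating a fresh one for every file that is opened:

import java.io.FileOutputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.Compressor;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class CompressorPoolSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        CompressionCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);

        // getCompressor() hands back a pooled native zlib compressor when one is
        // available; only when the pool is empty does it allocate a new one
        // (ZlibCompressor.<init> -> init(), the native call in the stack trace).
        Compressor compressor = CodecPool.getCompressor(codec);
        OutputStream raw = new FileOutputStream("/tmp/example.gz"); // hypothetical path
        CompressionOutputStream out = (compressor != null)
                ? codec.createOutputStream(raw, compressor)
                : codec.createOutputStream(raw);
        try {
            out.write("hello".getBytes("UTF-8"));
            out.finish();
        } finally {
            out.close();
            if (compressor != null) {
                // Returning the compressor keeps the native zlib buffers bounded;
                // a compressor that is never returned or end()-ed only releases its
                // native memory when the finalizer eventually runs.
                CodecPool.returnCompressor(compressor);
            }
        }
    }
}

If that reading is right, compressors being created faster than their native
memory is released (8 channels, each rolling a compressed file every hour)
would grow native memory slowly, which matches the RES behaviour above.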

Any ideas would be greatly appreciated!