larryzhang 2013-03-11, 10:49
Brock Noland 2013-03-11, 16:07
Re: JVM error while collecting from hour dividing log with flume-ng
larryzhang 2013-03-13, 01:36
Great. I updated the JVM to version 1.6.0_31 yesterday, and it has worked
well so far. Thanks a lot.

On 03/12/2013 12:07 AM, Brock Noland wrote:
> You are using a known bad jvm version. I would upgrade:
> http://wiki.apache.org/hadoop/HadoopJavaVersions
>
>
> On Mon, Mar 11, 2013 at 5:49 AM, larryzhang <[EMAIL PROTECTED]> wrote:
>
>
> Hi,
> I want to collect and analyse user logs every 5 minutes. We
> have an original log file which is generated by nginx and divided by
> hour, about 30,000,000 log lines per hour. The log format is like this:
> 60.222.199.118 - - [11/Mar/2013:16:00:00 +0800] "GET ....
> Because I want to first collect the logs into files, I
> wrote a FileEventSink, with just some modifications based on
> org.apache.flume.sink.hdfs.BucketWriter.java and
> org.apache.flume.sink.hdfs.HDFSEventSink.java. Following is my
> flume config file:
> =================
> a1.sources = r1
> a1.channels = c1
> a1.sinks = k1
>
> a1.sources.r1.type = cn.larry.flume.source.MyExecSource //I
> need to fetch time and other info into headers, so I added this
> logic based on ExecSource
> a1.sources.r1.command = tail -n +0 -F /data2/log/log_2013031117.log
> a1.sources.r1.channels = c1
> a1.sources.r1.batchSize = 1 //I set this to 1 because
> otherwise it will lose data at the end of the log file. I applied
> this patch https://issues.apache.org/jira/browse/FLUME-1819 but it
> seems no help...
>
> a1.channels.c1.type = memory
> a1.channels.c1.capacity = 1000000
> a1.channels.c1.transactionCapacity = 10000
>
> a1.sinks.k1.type = cn.larry.flume.sink.FileEventSink
> a1.sinks.k1.channel = c1
> a1.sinks.k1.file.path = /opt/livedata/%Y%m%d/%H
> a1.sinks.k1.file.filePrefix = log-%Y%m%d%H%M
> a1.sinks.k1.file.round = true
> a1.sinks.k1.file.roundValue = 5
> a1.sinks.k1.file.roundUnit = minute
> a1.sinks.k1.file.rollInterval=300
> a1.sinks.k1.file.rollSize=0
> a1.sinks.k1.file.rollCount=0
> a1.sinks.k1.file.batchSize=100
>
> And because I need to change the source log file name each
> hour, I wrote a script, which does 3 things:
> 1. At the 1st minute of each hour:
> -> copy a new config file, which just changes the source
> log file name (a1.sources.r1.command = tail -n +0 -F
> /data2/log/log_<new time>.log)
> -> start a new flume process which uses the new config. (I
> did this so that if a flume process dies, it won't affect the next hour)
> 2. At the 30th minute of each hour:
> -> kill the flume process of the last hour.
> This project has been running for more than 10 days. Most of the time it
> works well, but sometimes the flume process crashes due to a JVM error,
> about once every 2 days! I used jvm version 1.6.0_27-ea.
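
As a rough illustration of the rounding settings quoted above (file.round = true, file.roundValue = 5, file.roundUnit = minute), here is a minimal Python sketch. It assumes the custom FileEventSink rounds the event timestamp down to the 5-minute boundary the way the stock HDFS sink does, and that month names are parsed in an English locale. The function name bucket_for and the sample timestamps are hypothetical, not taken from the poster's code.

```python
from datetime import datetime


def bucket_for(nginx_time, round_minutes=5):
    """Map an nginx access-log timestamp to (directory, file prefix).

    Hypothetical helper: mirrors file.path and file.filePrefix from the
    quoted config, assuming timestamps are rounded down.
    """
    # nginx time_local format, e.g. "11/Mar/2013:16:00:00 +0800"
    ts = datetime.strptime(nginx_time, "%d/%b/%Y:%H:%M:%S %z")
    # Round down to the start of the enclosing 5-minute window.
    ts = ts.replace(minute=ts.minute - ts.minute % round_minutes, second=0)
    directory = ts.strftime("/opt/livedata/%Y%m%d/%H")
    prefix = ts.strftime("log-%Y%m%d%H%M")
    return directory, prefix


print(bucket_for("11/Mar/2013:05:43:27 +0800"))
```

Under these assumptions, an event stamped 05:43:27 would land in the 05:40 bucket.
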
> Here's the log of error which happened on 2013-03-11:
>
> 2013-03-11 05:45:11,302 (file-k1-roll-timer-0) [INFO -
> cn.larry.flume.sink.FileBucketWriter.renameBucket(FileBucketWriter.java:408)]
> Renaming /opt/livedata/20130311/05/log-201303110540.1362951611255.tmp to
> /opt/livedata/20130311/05/log-201303110540.1362951611255
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x00002b4247be034e, pid=1463, tid=1098979648
> #
> # JRE version: 6.0_18-b07
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode linux-amd64 )
> # Problematic frame:
> # V [libjvm.so+0x2de34e]
> #
> # An error report file with more information is saved as:
> # /opt/scripts/tvhadoop/flume/flume-1.3.0/bin/hs_err_pid1463.log
> #
> # If you would like to submit a bug report, please visit:
> # http://java.sun.com/webapps/bugreport/crash.jsp
> #
> + exec /usr/local/jdk/bin/java -Xmx2048m
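
One detail worth checking in a crash like this is which JVM actually ran the process: the header quoted above reports JRE version 6.0_18-b07, while the message mentions 1.6.0_27-ea. Below is a minimal Python sketch for pulling that field out of an hs_err header. The helper name jre_version is hypothetical, and the sample text is copied from the quoted log.

```python
import re


def jre_version(hs_err_text):
    """Return the value of the '# JRE version:' header line, or None."""
    match = re.search(r"^#\s*JRE version:\s*(\S+)", hs_err_text, re.MULTILINE)
    return match.group(1) if match else None


# Header lines copied from the crash output quoted in the thread.
sample = """#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00002b4247be034e, pid=1463, tid=1098979648
#
# JRE version: 6.0_18-b07
# Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode linux-amd64 )
"""

print(jre_version(sample))
```

Comparing this value with the output of the java binary named in the exec line is one way to confirm which installation the flume process is really using.
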