NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Flume user mailing list


Simple HDFS Sink file rolling question please.
Hi :)

I have an ExecSource running tail -F on a set of log files that Log4j
rotates nightly. I want my HDFS Sink to roll its files when Log4j rolls
them. I tried setting all the "roll" parameters to 0, thinking that a new
file handle from the ExecSource would cause the current file in HDFS to be
closed and a new one to be created. Instead, I see only the new file being
created; the previous day's file is still there as an unclosed .tmp file.

What configuration would achieve the behavior I'm after? I was thinking of
a rollInterval of 24 hours, but wouldn't that cause the HDFS Sink to roll
the file at a different time than Log4j rolled it?
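For comparison, a purely time-based roll would look roughly like the fragment below. This is a sketch, not a tested config, and it reuses the agent and sink names from the setup in this message. hdfs.rollInterval is given in seconds (86400 s = 24 h), and the interval appears to be counted from when the sink opens a file, not from midnight. So each file would be closed 24 hours after the sink opened it, which need not coincide with the moment Log4j rotates the source log. That is the mismatch the question is worried about.

```properties
# Time-based rolling only: close each HDFS file 86400 s (24 h) after it is opened
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollInterval = 86400
# 0 disables size-based and event-count-based rolling
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollSize = 0
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollCount = 0
```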

Thanks for your time :)

Here is my current HDFS Sink setup:

# hdfs-hadoopjt01_1-sink properties
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.type = hdfs
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.path = hdfs://nameservice1/%{path}
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.filePrefix = %{filename}.%Y-%m-%d_1
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollInterval = 0
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollSize = 0
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollCount = 0
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.batchSize = 10000
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.threadsPoolSize = 8
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollTimerPoolSize = 5
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.codeC = GzipCodec
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.fileType = CompressedStream
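One option worth checking, assuming your Flume release supports the HDFS sink's hdfs.idleTimeout property (confirm it in the user guide for your version). Because filePrefix contains %Y-%m-%d, the sink likely opens a new file when the event date changes, while the previous day's file simply stops receiving events. With all roll settings at 0, nothing closes that file. idleTimeout closes a file, finalizing its .tmp name, once no events have been written to it for the given number of seconds. The value 600 below is an arbitrary example, not a recommendation.

```properties
# Assumption: hdfs.idleTimeout is available in this Flume version.
# Close any HDFS file that has received no events for 600 s (10 min);
# the rollInterval/rollSize/rollCount = 0 settings above stay unchanged.
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.idleTimeout = 600
```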
Mike Percy 2013-03-26, 00:13