Flume >> mail # user >> Loading weblogs/applogs in CDH4.2 by using Flume-NG 1.3


Loading weblogs/applogs in CDH4.2 by using Flume-NG 1.3
Hello all, I'm running into an issue when trying to load app server request
logs into Hadoop.

I have a Flume agent running with the following config. The consolidated log
file lands in one directory, but it gets rotated daily, i.e. one file per day.
The config below does not work for that because the file name is hard-coded
in it.

agent.sources = apache
agent.sources.apache.type = exec
agent.sources.apache.command = cat /archive/request.log.2013_06_07

Q: How can I configure the source so that it picks up each rotated file?
Currently, to load the next file I have to kill both the agent and the
collector, change the hard-coded file name in the config file, and then start
the collector and the agent again to load the file.
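A possible approach, sketched under assumptions rather than tested on CDH4.2: Flume NG 1.3 added a Spooling Directory Source (`type = spooldir`) that ingests every file placed in a directory and renames each fully read file with a suffix (`.COMPLETED` by default), so no file name has to be hard-coded and the agent does not need a restart for each new day. The directory `/archive/spool` and the channel name `mc1` are hypothetical placeholders. The source expects each file to be complete, never modified after it lands in the directory, and uniquely named, so only move a rotated file such as `request.log.2013_06_07` in once the application has finished writing it.

```properties
agent.sources = apache
# Watch a directory instead of running "cat" on one hard-coded file
agent.sources.apache.type = spooldir
# Hypothetical channel name; use the channel the agent already defines
agent.sources.apache.channels = mc1
# Hypothetical spool directory that receives each finished daily log
agent.sources.apache.spoolDir = /archive/spool
# Suffix appended to a file once it has been fully ingested (this is the default)
agent.sources.apache.fileSuffix = .COMPLETED
```

A daily job could then `mv` the previous day's log from `/archive` into the spool directory. A rename within the same filesystem makes the file appear in one step, whereas `cp` could expose a partially written file to the source.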

Q: The collector writes the file into HDFS with a .tmp extension. Until I kill
the collector, it doesn't roll the file over to its normal name. Here is my
config:

## Write to HDFS
# http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
collector.sinks.HadoopOut.type = hdfs
collector.sinks.HadoopOut.channel = mc2
collector.sinks.HadoopOut.hdfs.path = /user/flume/events/%{log_type}/%{host}/%y-%m-%d
collector.sinks.HadoopOut.hdfs.fileType = DataStream
collector.sinks.HadoopOut.hdfs.writeFormat = Text

collector.sinks.HadoopOut.hdfs.rollSize = 0
collector.sinks.HadoopOut.hdfs.rollCount = 0
collector.sinks.HadoopOut.hdfs.rollInterval = 0
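The `.tmp` name is expected with these three settings: the HDFS sink writes to a file carrying the in-use `.tmp` suffix and renames it only when it closes the file, and a value of 0 disables each of the size, count, and time roll triggers, so the file is normally closed only when the collector shuts down. A sketch of one alternative, assuming the rest of the sink config stays as above: keep size- and count-based rolling off and roll on time instead. Because `hdfs.path` is bucketed by `%y-%m-%d`, a day's file is opened at that day's first event and, with a 24-hour interval, is not closed before the following midnight, by which point new events already go to the next day's directory; so each day's data should normally end up in a single file. The `hdfs.idleTimeout` line is commented out because it may not exist in every 1.3.x build; check the HDFS sink table in the user guide for your version before enabling it.

```properties
# Never roll on file size or event count
collector.sinks.HadoopOut.hdfs.rollSize = 0
collector.sinks.HadoopOut.hdfs.rollCount = 0
# Close the file and drop the .tmp suffix 24 hours (86400 s) after it was opened
collector.sinks.HadoopOut.hdfs.rollInterval = 86400
# Optional, if supported: close a file after 1 hour without events, so the previous
# day's file is renamed shortly after midnight instead of up to 24 hours later
#collector.sinks.HadoopOut.hdfs.idleTimeout = 3600
```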

If I change the last three parameters above, it creates a lot of small files,
and that makes it a challenge to query the data in Hive.

I want one file per day of request logs.
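For the daily directories themselves, the HDFS sink fills in time escape sequences such as `%y-%m-%d` from a `timestamp` header on each event, and `%{host}` and `%{log_type}` from headers with those names. A minimal sketch, assuming the interceptors go on the agent's `apache` source and that events reach the collector with their headers intact (an Avro sink to Avro source hop preserves them); the labels `ts` and `hostname` are arbitrary. Nothing here sets `log_type`, which has to come from elsewhere in the setup. The timestamp interceptor records ingestion time, not the time inside the log line, so a file loaded the day after it was written lands in the next day's directory.

```properties
# Interceptor labels are arbitrary; they are applied in the order listed
agent.sources.apache.interceptors = ts hostname
# Adds a "timestamp" header (current time in milliseconds) for %y-%m-%d
agent.sources.apache.interceptors.ts.type = timestamp
# Adds a "host" header for %{host}; useIP = false stores the host name, not the IP address
agent.sources.apache.interceptors.hostname.type = host
agent.sources.apache.interceptors.hostname.useIP = false
```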

I really appreciate your assistance and time.

Thanks,

--
Sanjeev Sagar

"Separate yourself from everything that separates you from others!" - Nirankari Baba Hardev Singh ji