Flume >> mail # user >> Writing to HDFS from multiple HDFS agents (separate machines)

RE: Writing to HDFS from multiple HDFS agents (separate machines)
You can use a Host Interceptor on the agents running an HDFS sink, and then use %{host} in the hdfs.filePrefix property. This isn't really documented, but it works: the docs only mention using those tokens in the path property, but they seem to work fine for filePrefix as well.

Here are some excerpts from a test config I have that does just that:

#define the interceptor on the source
staging2.sources.httpSource_stg.interceptors = iHost
staging2.sources.httpSource_stg.interceptors.iHost.type = host
staging2.sources.httpSource_stg.interceptors.iHost.useIP = false

#use the header the interceptor added in the filePrefix
staging2.sinks.hdfs_FilterLogs.type = hdfs
staging2.sinks.hdfs_FilterLogs.channel = mc_FilterLogs
staging2.sinks.hdfs_FilterLogs.hdfs.path = /flume_stg/FilterLogsJSON/%Y%m%d
staging2.sinks.hdfs_FilterLogs.hdfs.filePrefix = %{host}
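
[One caveat worth noting, not covered above: the %Y%m%d escapes in hdfs.path require each event to carry a timestamp header. If the source doesn't set one, the sink will fail to resolve the path. A minimal sketch of two common fixes, assuming the same agent, source, and sink names as in the config above:

#option 1: add a timestamp interceptor alongside the host interceptor
staging2.sources.httpSource_stg.interceptors = iTimestamp iHost
staging2.sources.httpSource_stg.interceptors.iTimestamp.type = timestamp

#option 2: have the sink fall back to the agent's local clock
staging2.sinks.hdfs_FilterLogs.hdfs.useLocalTimeStamp = true

Option 1 stamps events where they enter the flow, so the date in the path reflects ingest time even if events are delayed before reaching the sink; option 2 uses the time at which the sink writes.]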

Hope that helps,
Paul Chavez

From: Gary Malouf [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 14, 2013 2:55 PM
To: user
Subject: Writing to HDFS from multiple HDFS agents (separate machines)

Hi guys,

I'm new to Flume (and HDFS, for that matter), using the version packaged with CDH4 (1.3.0), and was wondering how others are maintaining distinct file names per HDFS sink when multiple sinks write to the same location.

My initial thought is to create a separate sub-directory in HDFS for each sink, though I feel the better way is to somehow prefix each file with a unique sink id. Are there any patterns others are following for this?
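
[For reference, the sub-directory-per-sink approach mentioned in the question can be expressed directly in config. A minimal sketch; the agent, sink, and channel names here are made up for illustration:

#each sink writes into its own HDFS directory, so files never collide
agent1.sinks.sinkA.type = hdfs
agent1.sinks.sinkA.channel = ch1
agent1.sinks.sinkA.hdfs.path = /flume/events/sinkA

agent1.sinks.sinkB.type = hdfs
agent1.sinks.sinkB.channel = ch2
agent1.sinks.sinkB.hdfs.path = /flume/events/sinkB

This avoids filename collisions at the cost of fragmenting the data across directories, which is why a per-host or per-sink filePrefix, as in the reply above, is often preferred.]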
