Flume, mail # user - Writing to HDFS from multiple HDFS agents (separate machines)

RE: Writing to HDFS from multiple HDFS agents (separate machines)
Paul Chavez 2013-03-14, 22:31
You can use a Host Interceptor on the agents running an HDFS sink, and then use %{host} in the hdfs.filePrefix property. This isn't really documented, but it works; the docs only mention using those tokens in the path property, but they seem to be fine for filePrefix as well.

Here are some excerpts from a test config I have that does just that:

#define the interceptor on the source
staging2.sources.httpSource_stg.interceptors = iHost
staging2.sources.httpSource_stg.interceptors.iHost.type = host
staging2.sources.httpSource_stg.interceptors.iHost.useIP = false

#use the header the interceptor added in the filePrefix
staging2.sinks.hdfs_FilterLogs.type = hdfs
staging2.sinks.hdfs_FilterLogs.channel = mc_FilterLogs
staging2.sinks.hdfs_FilterLogs.hdfs.path = /flume_stg/FilterLogsJSON/%Y%m%d
staging2.sinks.hdfs_FilterLogs.hdfs.filePrefix = %{host}
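
For completeness, the excerpt above assumes the usual agent wiring is in place. A minimal sketch of those omitted lines might look like the following (the channel type and the HTTP source type are my assumptions based on the names in the excerpt; adjust to your setup):

#assumed top-level declarations and wiring (not part of the excerpt above)
staging2.sources = httpSource_stg
staging2.channels = mc_FilterLogs
staging2.sinks = hdfs_FilterLogs

#assumption: a memory channel and an HTTP source
staging2.channels.mc_FilterLogs.type = memory
staging2.sources.httpSource_stg.type = http
staging2.sources.httpSource_stg.channels = mc_FilterLogs

With this in place, each agent stamps events with its own hostname, so files written by different machines into the same HDFS directory get distinct prefixes and cannot collide.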

Hope that helps,
Paul Chavez

From: Gary Malouf [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 14, 2013 2:55 PM
To: user
Subject: Writing to HDFS from multiple HDFS agents (separate machines)

Hi guys,

I'm new to Flume (and HDFS, for that matter). I'm using the version packaged with CDH4 (1.3.0) and was wondering how others maintain distinct file names per HDFS sink when writing to the same location.

My initial thought is to create a separate sub-directory in HDFS for each sink, though I feel the better way is to somehow prefix each file with a unique sink id. Are there any patterns that others are following for this?