Flume >> mail # user >> Writing to HDFS from multiple HDFS agents (separate machines)

Gary Malouf 2013-03-14, 21:54
Mohammad Tariq 2013-03-14, 22:00
Seshu V 2013-03-15, 21:20
Paul Chavez 2013-03-14, 22:31
Gary Malouf 2013-03-14, 22:34
Mike Percy 2013-03-15, 01:46
Gary Malouf 2013-03-15, 02:30
Gary Malouf 2013-03-15, 02:42
Mike Percy 2013-03-15, 20:43
Re: Writing to HDFS from multiple HDFS agents (separate machines)
It just depends on what you want to do with the header. In the case I presented, the header is set by the agent running the HDFS sink, which seemed to align with your use case. If you need to know the originating host, just have the interceptor on the originating host set a different header; the %{} notation allows you to specify an arbitrary header to swap in for the token, as long as it exists, of course.
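As a concrete sketch of the approach described above, a Flume agent configuration could stamp events with a host interceptor and reference that header from the HDFS sink's path. The agent and component names (a1, r1, c1, k1), the header name agentHost, and the paths are hypothetical; adjust them to your topology.

```properties
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Host interceptor on the source stamps each event with the agent's hostname.
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = host
a1.sources.r1.interceptors.i1.useIP = false
# Write into a custom header instead of the default "host".
a1.sources.r1.interceptors.i1.hostHeader = agentHost

# The HDFS sink substitutes %{agentHost} with that header's value.
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/%{agentHost}/%Y-%m-%d
# Date escapes need a timestamp; use the local clock if no
# timestamp interceptor is in the flow.
a1.sinks.k1.hdfs.useLocalTimeStamp = true
```

If the interceptor runs on the originating agent instead, the header travels with the event and %{agentHost} resolves to the original host rather than the sink's host.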

On Mar 14, 2013, at 7:31 PM, "Gary Malouf" <[EMAIL PROTECTED]> wrote:

Paul, I interpreted the host property as identifying the host an event originates from, rather than the host of the sink that writes the event to HDFS. Is my understanding correct?
What happens if I am using the NettyAvroRpcClient to feed events from a different server, round-robin style, to two HDFS-writing agents? Should I then NOT set the host property on the client side and instead rely on the interceptor?
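For reference, the round-robin fan-out mentioned above can be done with Flume's load-balancing RPC client, configured via a Properties object passed to RpcClientFactory.getInstance(). The hostnames and port below are hypothetical; the property names are a sketch based on the load-balancing client's documented configuration.

```properties
# Load-balancing RPC client distributing events across two collector agents.
client.type = default_loadbalance
hosts = h1 h2
hosts.h1 = collector1.example.com:4141
hosts.h2 = collector2.example.com:4141
# Selector can be round_robin or random.
host-selector = round_robin
```

With this setup the client does not know which sink receives each event, so a sink-side host interceptor (as Mike described) is what identifies the writing agent.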
On Thu, Mar 14, 2013 at 6:34 PM, Gary Malouf <[EMAIL PROTECTED]> wrote:
To be clear, I am referring to segregating the data from different Flume sinks, as opposed to segregating by the original source of the event. That said, it sounds like your approach is the easiest.

On Thu, Mar 14, 2013 at 5:54 PM, Gary Malouf <[EMAIL PROTECTED]> wrote:
Hi guys,

I'm new to Flume (and HDFS, for that matter), using the version packaged with CDH4 (1.3.0), and was wondering how others maintain distinct file names per HDFS sink.

My initial thought is to create a separate sub-directory in HDFS for each sink, though I suspect the better way is to somehow prefix each file with a unique sink ID. Are there any patterns that others follow for this?
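The unique-prefix idea above maps directly onto the HDFS sink's hdfs.filePrefix property (which defaults to FlumeData). A minimal sketch, with hypothetical agent, sink, and path names:

```properties
# Two agents on separate machines writing into the same HDFS directory,
# each tagging its files with a distinct prefix to avoid name collisions.

# On machine A:
agentA.sinks.k1.type = hdfs
agentA.sinks.k1.hdfs.path = /flume/events
agentA.sinks.k1.hdfs.filePrefix = sink-a

# On machine B:
agentB.sinks.k1.type = hdfs
agentB.sinks.k1.hdfs.path = /flume/events
agentB.sinks.k1.hdfs.filePrefix = sink-b
```

The sink also appends a timestamp-based counter to each file name, so the prefix plus counter keeps files from the two agents distinguishable even in a shared directory.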