Re: Writing to HDFS from multiple HDFS agents (separate machines)
Gary Malouf 2013-03-15, 02:42
Thanks for the pointer Mike. Any thoughts on how to choose the number of
consumers per channel? I will eventually find the optimal number via perf
testing, but it would be good to start with a sensible default.
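[For reference: in Flume, the "consumers" of a channel are its sinks, and each sink runs single-threaded, so attaching more sinks to one channel adds drain parallelism. A minimal sketch of that fan-out; the agent and component names here are hypothetical:]

```properties
# One channel drained by two HDFS sinks running in parallel.
agent.channels = ch1
agent.sinks = hdfsSink1 hdfsSink2

agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 10000

agent.sinks.hdfsSink1.type = hdfs
agent.sinks.hdfsSink1.channel = ch1

agent.sinks.hdfsSink2.type = hdfs
agent.sinks.hdfsSink2.channel = ch1
```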
On Thu, Mar 14, 2013 at 10:30 PM, Gary Malouf <[EMAIL PROTECTED]> wrote:
> Paul, I interpreted the host property as identifying the host an event
> originates from, rather than the host of the sink that writes the event
> to HDFS. Is my understanding correct?
> What happens if I am using the NettyAvroRpcClient to feed events from a
> different server, round-robin style, to two HDFS-writing agents; should I
> then NOT set the host property on the client side and rely on the interceptor?
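[For context: the host interceptor stamps events with the hostname of the agent it runs on, so configuring it on the source of each HDFS-writing agent tags events with the writer's host rather than the upstream client's. A hedged sketch; the agent and component names are hypothetical:]

```properties
# Host interceptor on the HDFS-writing agent's source: stamps each
# event with THIS agent's hostname, not the upstream client's.
agent.sources.avroSrc.type = avro
agent.sources.avroSrc.interceptors = hostInt
agent.sources.avroSrc.interceptors.hostInt.type = host
agent.sources.avroSrc.interceptors.hostInt.useIP = false

# The HDFS sink can then key its output path on that header,
# so each writing agent lands in its own directory:
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/%{host}
```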
> On Thu, Mar 14, 2013 at 6:34 PM, Gary Malouf <[EMAIL PROTECTED]> wrote:
>> To be clear, I am referring to segregating data from different Flume
>> sinks, as opposed to the original source of the event. Having said
>> that, it sounds like your approach is the easiest.
>> On Thu, Mar 14, 2013 at 5:54 PM, Gary Malouf <[EMAIL PROTECTED]> wrote:
>>> Hi guys,
>>> I'm new to Flume (and HDFS, for that matter), using the version packaged
>>> with CDH4 (1.3.0), and was wondering how others keep the file names
>>> written by each HDFS sink distinct.
>>> My initial thought is to create a separate sub-directory in HDFS for
>>> each sink, though I suspect the better way is to somehow prefix each
>>> file with a unique sink id. Are there any patterns that others are
>>> following for this?
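[Both options from the question can be expressed directly in the sink configuration; a hedged sketch using the HDFS sink's `hdfs.path` and `hdfs.filePrefix` properties (agent and sink names are hypothetical):]

```properties
# Option A: a separate HDFS sub-directory per sink.
agent.sinks.s1.type = hdfs
agent.sinks.s1.hdfs.path = hdfs://namenode/flume/events/sink1

# Option B: a shared directory, with a unique file prefix per sink
# (hdfs.filePrefix defaults to "FlumeData" if unset).
agent.sinks.s2.type = hdfs
agent.sinks.s2.hdfs.path = hdfs://namenode/flume/events
agent.sinks.s2.hdfs.filePrefix = sink2
```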