
Flume, mail # user - Writing to HDFS from multiple HDFS agents (separate machines)


Re: Writing to HDFS from multiple HDFS agents (separate machines)
Gary Malouf 2013-03-15, 02:42
Thanks for the pointer, Mike.  Any thoughts on how to choose the number of
consumers per channel?  I will eventually find the optimal number through perf
testing, but it would be good to start with a sensible default.

Thanks,

Gary
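For background on "consumers per channel": multiple sinks can drain the same channel, and each sink pulls events independently, so adding consumers usually means adding sinks. A minimal sketch (agent name a1, component names, and the HDFS path are placeholders, not from this thread):

```properties
# One source feeds one channel; two HDFS sinks drain it in parallel.
a1.sources = r1
a1.channels = c1
a1.sinks = k1 k2

a1.sources.r1.channels = c1

# Both sinks consume from the same channel; Flume runs each sink
# in its own thread, so they compete for events off the channel.
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events

a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = c1
a1.sinks.k2.hdfs.path = hdfs://namenode/flume/events
```

A small number of sinks per channel, scaled up if the channel fills faster than the sinks drain it, is a reasonable place to begin perf testing.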
On Thu, Mar 14, 2013 at 10:30 PM, Gary Malouf <[EMAIL PROTECTED]> wrote:

> Paul, I interpreted the host property as identifying the host an event
> originates from, rather than the host of the sink that writes the event to
> HDFS.  Is my understanding correct?
>
>
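On that question: the host interceptor stamps events on the agent whose source it is attached to, so placing it on the source of each HDFS-writing agent tags events with the writer's hostname rather than the original sender's. A hedged sketch (agent/component names and the path are placeholders):

```properties
# Host interceptor on the writing agent's source adds a "host" header
# with that agent's own hostname.
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = host
a1.sources.r1.interceptors.i1.useIP = false
a1.sources.r1.interceptors.i1.hostHeader = host

# The HDFS sink can then segregate output by that header.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events/%{host}
```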
> What happens if I am using the NettyAvroRpcClient to feed events from a
> different server, round-robin style, to two HDFS-writing agents; should I
> then NOT set the host property on the client side and rely on the interceptor?
>
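If the client side should do the round-robin itself, the Flume SDK also offers a load-balancing RPC client (check that your Flume release includes it) configured through the properties passed to RpcClientFactory.getInstance. A sketch with hypothetical hostnames and ports:

```properties
# Client-side properties for a load-balancing RPC client that
# alternates between the two HDFS-writing agents.
client.type = default_loadbalance
hosts = h1 h2
hosts.h1 = agent1.example.com:41414
hosts.h2 = agent2.example.com:41414
host-selector = round_robin
```

With this setup, per-writer identification would indeed come from a host interceptor on each writing agent rather than from a client-set header.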
>
> On Thu, Mar 14, 2013 at 6:34 PM, Gary Malouf <[EMAIL PROTECTED]> wrote:
>
>> To be clear, I am referring to segregating data from different
>> Flume sinks, as opposed to the original source of the event.  Having said
>> that, it sounds like your approach is the easiest.
>>
>> -Gary
>>
>>
>> On Thu, Mar 14, 2013 at 5:54 PM, Gary Malouf <[EMAIL PROTECTED]> wrote:
>>
>>> Hi guys,
>>>
>>> I'm new to Flume (and HDFS, for that matter), using the version packaged
>>> with CDH4 (1.3.0), and was wondering how others ensure that each HDFS
>>> sink writes to distinct file names.
>>>
>>> My initial thought is to create a separate sub-directory in HDFS for
>>> each sink - though I feel like the better way is to somehow prefix each
>>> file with a unique sink id.  Are there any patterns that others are
>>> following for this?
>>>
>>> -Gary
>>>
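The prefix-per-sink idea maps directly onto the HDFS sink's hdfs.filePrefix property, so the sinks can share one directory while keeping their files distinguishable. A sketch with placeholder names and paths:

```properties
# Two HDFS sinks writing to the same directory, each with its own
# file prefix so their output files never collide.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events
a1.sinks.k1.hdfs.filePrefix = sink-k1

a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.path = hdfs://namenode/flume/events
a1.sinks.k2.hdfs.filePrefix = sink-k2
```

The prefix is combined with a timestamp/counter suffix that the sink appends itself, so uniqueness within a sink is already handled; the prefix only needs to distinguish sinks from each other.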
>>
>>
>