Flume >> mail # user >> Writing to HDFS from multiple HDFS agents (separate machines)

Re: Writing to HDFS from multiple HDFS agents (separate machines)
I could differentiate different sources using this config by creating
separate directories by hostname:

agent.sources.syslogsrc.interceptors = ts
agent.sources.syslogsrc.interceptors.ts.type = timestamp
agent.sinks.hdfsSink.hdfs.path = hdfs://<ip_addr>:<port>/flumetest/%{host}/%y-%m-%d

However, I have a related question.  Two different products are sending
their logs to one source, and I am collecting them via syslog.  Is there a
way to differentiate the two products' logs coming from a single source in
Flume?  Ideally I would like a subdirectory at the sink like
'/flumetest/%{host}/<product_name>/%y-%m-%d'.  How can I do this?
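One approach that might work is a regex_extractor interceptor, which pulls a value out of the event body into a header that the sink path can then reference.  This is an untested sketch: it assumes each syslog line begins with the product name followed by a colon (the regex would need to match your actual log format), and it reuses the <ip_addr>:<port> placeholders from the config above.

```properties
# Run the timestamp interceptor, then extract the product name into a header.
agent.sources.syslogsrc.interceptors = ts product
agent.sources.syslogsrc.interceptors.ts.type = timestamp
agent.sources.syslogsrc.interceptors.product.type = regex_extractor
# Assumed format: the product name leads the line, e.g. "productA: message..."
agent.sources.syslogsrc.interceptors.product.regex = ^(\\w+):
agent.sources.syslogsrc.interceptors.product.serializers = s1
agent.sources.syslogsrc.interceptors.product.serializers.s1.name = product

# The sink path can then reference the extracted header.
agent.sinks.hdfsSink.hdfs.path = hdfs://<ip_addr>:<port>/flumetest/%{host}/%{product}/%y-%m-%d
```

If the products cannot be told apart from the event body, the alternative is to tag them upstream, e.g. have each product send to a different source port and set a static header per source.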

- Seshu
On Thu, Mar 14, 2013 at 5:00 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> Hello sir,
>     One idea could be to create the subdirectories with the machines'
> hostnames, in case you are getting data from multiple sources. You can
> then easily find out which data belongs to which machine.
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
> On Fri, Mar 15, 2013 at 3:24 AM, Gary Malouf <[EMAIL PROTECTED]>wrote:
>> Hi guys,
>> I'm new to Flume (and HDFS, for that matter), using the version packaged
>> with CDH4 (1.3.0), and was wondering how others keep the file names
>> written by each HDFS sink distinct.
>> My initial thought is to create a separate subdirectory in HDFS for each
>> sink - though I feel like the better way is to somehow prefix each file
>> with a unique sink id.  Are there any patterns that others are following
>> for this?
>> -Gary
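
[For the unique-prefix idea raised above, a minimal sketch uses the HDFS sink's hdfs.filePrefix property, which replaces the default "FlumeData" prefix on each file the sink writes.  The agent, sink, and host names here are hypothetical; each agent would set its own prefix.]

```properties
# Agent on machine "collector1": tag every file this sink writes.
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = hdfs://<ip_addr>:<port>/flumetest/%y-%m-%d
agent.sinks.hdfsSink.hdfs.filePrefix = collector1

# The agent on "collector2" would use hdfs.filePrefix = collector2, and so on,
# so both sinks can write into the same directory without name collisions.
```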