Flume, mail # user - Does Flume NG requires to be installed on all the sources?


Seshu V 2013-02-07, 00:49
Jeff Lord 2013-02-07, 01:47
Seshu V 2013-02-07, 18:07
Re: Does Flume NG requires to be installed on all the sources?
Jeff Lord 2013-02-08, 22:42
It sounds like you might want to use the log4j appender, then.
This will only require the flume-ng SDK libs on the servers where the log
messages originate.
http://flume.apache.org/FlumeUserGuide.html#log4j-appender

You will also need a flume-ng agent to send the events to.
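For reference, a minimal log4j.properties on an appserver might look like the following sketch; the collector hostname and port here are assumptions, not values from this thread:

```properties
# Route all application logging through the Flume log4j appender
log4j.rootLogger = INFO, flume

# Flume's log4j appender (from the flume-ng SDK); hostname/port of the
# collector agent's avro source are placeholders
log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname = flume-collector.example.com
log4j.appender.flume.Port = 41414
log4j.appender.flume.layout = org.apache.log4j.PatternLayout
```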

You do not need to install Flume on the HDFS namenode.
For the hdfs sink to write to HDFS, you simply need to configure
the sink with the correct write path, e.g.:
hdfs.path = hdfs://namenode/flume/webdata/
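In an agent configuration file that property is qualified by the agent and sink names, e.g. (the names a1/k1 below are placeholders):

```properties
# hypothetical agent "a1" with an hdfs sink named "k1"
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/webdata/
```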

Please have a read over the Flume user guide, as there is a lot of good
info in there that you may find useful.
http://flume.apache.org/FlumeUserGuide.html

One example flow could be:

AppServer configured with log4j appender --> flume agent [source (avro) |
channel | sink (hdfs)] --> HDFS
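A minimal agent configuration matching that flow might look like the sketch below; the agent name a1, the component names, and the port are assumptions (the memory channel uses its default capacity):

```properties
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Avro source that receives events from the log4j appender
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 41414
a1.sources.r1.channels = c1

# Buffer events in memory between source and sink
a1.channels.c1.type = memory

# Write the events out to HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/webdata/
a1.sinks.k1.channel = c1
```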

On Thu, Feb 7, 2013 at 10:07 AM, Seshu V <[EMAIL PROTECTED]> wrote:

> Hello Jeff,
>
>    Thanks for the reply.  My use case is not really special.  We have
> multiple products, and each product emits traditional log messages on
> different servers.  I would like to stream those into HDFS.  The logs are
> generally in Apache or log4j format.
>    So, I have many sources from which I want to stream the logs into HDFS.
>   I can have a channel/collector machine where I install Flume.  I guess
> my question is: do I need to install Flume on the servers where the log
> messages live, and do I need to install Flume on the HDFS namenode too?
>
> Thanks,
> - Seshu
>
>
> On Wed, Feb 6, 2013 at 7:47 PM, Jeff Lord <[EMAIL PROTECTED]> wrote:
>
>> Seshu,
>>
>> It really is going to depend on your use case.
>> Though it sounds like you may need to run an agent on each of the source
>> machines.
>> Which source do you plan to use? It may also be the case that you can use
>> the flume rpc client to write data directly from your application to the
>> flume collector machine.
>>
>> http://flume.apache.org/FlumeDeveloperGuide.html#rpc-client-interface
>>
>> -Jeff
>>
>>
>> On Wed, Feb 6, 2013 at 4:49 PM, Seshu V <[EMAIL PROTECTED]> wrote:
>>
>>> Hi All,
>>>
>>>     I used Flume 0.9.3 a while back, and it worked fine at that time.
>>>  Now I am looking to use Flume NG, and started reading the documentation
>>> today.  In Flume 0.9.3, I installed Flume agents on the servers wherever
>>> I had a data source, and I had a separate collector machine.  My sink was
>>> HDFS.  I see that Flume NG uses a Channel.
>>>     My question is that I have multiple source servers and my sink is
>>> HDFS.  I also have another machine for the Channel (the collector in the
>>> old days).  Do I need to install Flume NG on all the source machines and
>>> the Channel machine?  Or can I install Flume NG only on the Channel
>>> server and (somehow) specify in the configuration to pull data from the
>>> source machines and specify the sink as HDFS?
>>>      Thanks in advance for your replies.
>>>
>>> Thanks,
>>> - Seshu
>>>
>>>
>>
>>
>