

Re: Best way for a newbie to learn flume
Also, the link below refers to Flume OG, not Flume NG:

http://archive.cloudera.com/cdh/3/flume/UserGuide/

The architecture and features have changed significantly since that version.
On 7 April 2013 17:54, Israel Ekpo <[EMAIL PROTECTED]> wrote:

> Sandeep,
>
> So Flume currently has two tracks:
>
> Flume OG (not actively supported)
> https://cwiki.apache.org/confluence/display/FLUME/Flume+OG+%28pre+1.0%29
> Flume NG (currently active)
> https://cwiki.apache.org/confluence/display/FLUME/Flume+NG
>
> The latest stable version for Flume NG is 1.3.1
>
> The NG stands for Next Generation and it is the current active development
> track.
>
> The OG refers to the Original Generation of Flume. This includes releases
> before the 1.0.0 release.
>
> Newcomers and existing users of the OG track are encouraged to migrate
> over to the NG track.
>
> You can download Flume NG 1.3.1 here
>
> http://flume.apache.org/download.html
>
> Regarding "Getting Started", in the next couple of weeks additional
> information will be added to the Wiki to make the on-boarding process easier
> for newcomers.
>
> In the meantime, please bear with us.
>
> I would recommend you download and install the latest version of Java 1.6.
>
> Then download Flume and extract it to a folder in your directory.
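>
> For example, assuming the downloaded archive is named
> apache-flume-1.3.1-bin.tar.gz (the exact name may differ depending on the
> build you pick), extracting it would look roughly like this:
>
>     # extract the binary distribution and move into it (paths are illustrative)
>     tar -xzf apache-flume-1.3.1-bin.tar.gz
>     cd apache-flume-1.3.1-bin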
>
> Then you can use the following sources, channels and sinks to get started.
>
> This is the best way for you to learn and understand the various pieces.
>
> SOURCE: Spooling Directory Source
> CHANNEL: File Channel (more reliable) or Memory Channel (faster)
> SINK: File Roll Sink
>
> You can create the directory that you will be spooling and dump a couple of
> log files in there. Make sure the files are newline-delimited.
>
> Each line will represent an event in the log files.
>
> Then configure the file channel and the file roll sink using guidelines
> and examples available in the user guide.
>
> http://flume.apache.org/FlumeUserGuide.html
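>
> To make that concrete, a minimal single-agent configuration along those
> lines could look roughly like the sketch below. The agent name "a1" and the
> local paths are just placeholders for illustration; adjust them to your
> setup.
>
>     # one spooling-directory source, one file channel, one file-roll sink
>     a1.sources = src1
>     a1.channels = ch1
>     a1.sinks = sink1
>
>     # Spooling Directory Source: picks up newline-delimited files dropped
>     # into the spool directory
>     a1.sources.src1.type = spooldir
>     a1.sources.src1.spoolDir = /var/log/flume-spool
>     a1.sources.src1.channels = ch1
>
>     # File Channel (more reliable than the Memory Channel)
>     a1.channels.ch1.type = file
>     a1.channels.ch1.checkpointDir = /tmp/flume/checkpoint
>     a1.channels.ch1.dataDirs = /tmp/flume/data
>
>     # File Roll Sink: writes the collected events out to rolling files
>     a1.sinks.sink1.type = file_roll
>     a1.sinks.sink1.sink.directory = /tmp/flume-out
>     a1.sinks.sink1.channel = ch1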
>
> That will give you a feel for how Flume works.
>
> Once you have that set up, you can run the agent and see what happens.
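>
> For example, assuming the configuration above is saved as
> conf/spool-example.conf (a hypothetical file name), the agent could be
> started with something like:
>
>     bin/flume-ng agent --conf conf --conf-file conf/spool-example.conf \
>       --name a1 -Dflume.root.logger=INFO,console
>
> The -Dflume.root.logger setting just sends the log output to the console so
> you can watch the events flow through.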
>
> Once you start getting the hang of things you can try other sources and
> sinks or maybe even create a few of your own custom sources, channels or
> sinks.
>
>
>
> On 7 April 2013 17:10, Sandeep Baldawa <[EMAIL PROTECTED]> wrote:
>
>>
>> Thanks for the detailed reply.
>>
>> Awesome questions, and I should have added these details in my question. I
>> am learning Flume more as a hobby, a learning experience, and as a tech
>> enthusiast (I've heard pretty good things about Flume).
>>
>> Thanks again for the instructions. Just one question about setting things
>> up: are the instructions at
>> http://archive.cloudera.com/cdh/3/flume/UserGuide/ still relevant for the
>> latest build? I liked the documentation at this link, which has a quick
>> start guide too.
>>
>>
>> On Sun, Apr 7, 2013 at 1:28 PM, Israel Ekpo <[EMAIL PROTECTED]> wrote:
>>
>>> Sandeep,
>>>
>>> Excellent questions.
>>>
>>> You asked, "What problem is Flume trying to solve?"
>>>
>>> I think the more appropriate question is: what problem are you trying to
>>> solve?
>>>
>>> This will go a long way in helping us understand which components of
>>> Flume you may need and how you need to set it up.
>>>
>>> Are you using Flume as part of your job or as a personal hobby? Are you
>>> using Flume for a course at school or as part of an academic project?
>>>
>>> Going back to your original question: in the simplest terms, and for most
>>> use cases, Flume is a system designed for collecting and transporting
>>> large amounts of data and events from one or more sources, and then
>>> aggregating the collected data in a centralized data store or propagating
>>> it onward to subsequent sources.
>>>
>>> You can use it for aggregating data from log files, network traffic,
>>> click streams, Twitter, and any other source that can generate events.
>>>
>>> Spend some more time reviewing the user guide and you will find a lot of
>>> information and answers to prospective questions.