Re: Best way for a newbie to learn flume
Sandeep,

So Flume currently has two tracks:

Flume OG (not actively supported)
https://cwiki.apache.org/confluence/display/FLUME/Flume+OG+%28pre+1.0%29
Flume NG (currently active)
https://cwiki.apache.org/confluence/display/FLUME/Flume+NG

The latest stable version of Flume NG is 1.3.1.

The NG stands for Next Generation and it is the current active development
track.

The OG refers to the Original Generation of Flume. This includes releases
before the 1.0.0 release.

Newcomers are encouraged to start with the NG track, and existing users of
the OG track are encouraged to migrate over to it.

You can download Flume NG 1.3.1 here:

http://flume.apache.org/download.html

Regarding "Getting Started": over the next couple of weeks, additional
information will be added to the wiki to make the onboarding process easier
for newcomers.

In the meantime, please bear with us.

I would recommend you download and install the latest version of Java 1.6.

Then download Flume and extract it to a folder of your choice.

Then you can use the following sources, channels and sinks to get started.

This is the best way for you to learn and understand the various pieces.

SOURCE: Spooling Directory Source
CHANNEL: File Channel (more reliable) or Memory Channel (faster)
SINK: File Roll Sink

You can create a directory to use as the spooling directory and drop a
couple of log files in there. Make sure the files are newline-delimited.

Each line in the log files will represent one event.
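
As a rough sketch of that step, something like the following could create the
spooling directory and drop a small newline-delimited file into it. The
/tmp/flume-demo location, the staging directory and the file name are all
made up for illustration. The file is written elsewhere first and then moved
in, because the Spooling Directory Source expects files to be complete and
unchanged once they appear in the directory.

```shell
#!/bin/sh
set -eu

# Hypothetical locations; adjust to your own layout.
BASE="${FLUME_DEMO_DIR:-/tmp/flume-demo}"
SPOOL_DIR="$BASE/spool"
STAGING_DIR="$BASE/staging"
mkdir -p "$SPOOL_DIR" "$STAGING_DIR"

# Use a unique name so a file name is never reused in the spooling directory.
LOG_NAME="sample-$(date +%Y%m%d%H%M%S)-$$.log"

# Write a small newline-delimited file; each line would become one event.
printf '%s\n' \
  "first sample event" \
  "second sample event" \
  "third sample event" > "$STAGING_DIR/$LOG_NAME"

# Move the finished file into the spooling directory in one step.
mv "$STAGING_DIR/$LOG_NAME" "$SPOOL_DIR/$LOG_NAME"
echo "$SPOOL_DIR/$LOG_NAME"
```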

Then configure the file channel and the file roll sink using the guidelines
and examples available in the user guide.

http://flume.apache.org/FlumeUserGuide.html
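
For what it is worth, a minimal configuration along those lines might look
like the sketch below, written out by a small shell script. The agent name
(agent1), the component names (src1, ch1, sink1) and every path under
/tmp/flume-demo are placeholders I picked for illustration; please check the
property names against the user guide for the version you install.

```shell
#!/bin/sh
set -eu

# Hypothetical base directory and config file name.
BASE="${FLUME_DEMO_DIR:-/tmp/flume-demo}"
CONF_FILE="$BASE/spool-example.conf"
mkdir -p "$BASE/spool" "$BASE/checkpoint" "$BASE/data" "$BASE/out"

cat > "$CONF_FILE" <<EOF
# Agent "agent1" (assumed name) wires one source, one channel and one sink.
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Spooling Directory Source: reads files dropped into spoolDir.
agent1.sources.src1.type = spooldir
agent1.sources.src1.spoolDir = $BASE/spool
agent1.sources.src1.channels = ch1

# File Channel: buffers events on disk between source and sink.
agent1.channels.ch1.type = file
agent1.channels.ch1.checkpointDir = $BASE/checkpoint
agent1.channels.ch1.dataDirs = $BASE/data

# File Roll Sink: writes events to files under sink.directory.
agent1.sinks.sink1.type = file_roll
agent1.sinks.sink1.sink.directory = $BASE/out
agent1.sinks.sink1.channel = ch1
EOF

echo "wrote $CONF_FILE"
```

To try the Memory Channel instead, the three ch1 lines could be replaced with
a single `agent1.channels.ch1.type = memory` line.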

That will give you a feel for how Flume works.

Once you have that set up, you can run the agent and see what happens.
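
Starting the agent could look roughly like this. The install location, the
configuration file path and the agent name (agent1) are assumptions, so
substitute your own; the agent name has to match the prefix used inside the
configuration file.

```shell
#!/bin/sh
# Assumed install location and config path; change both to match your setup.
FLUME_HOME="${FLUME_HOME:-$HOME/apache-flume-1.3.1-bin}"
CONF_FILE="${CONF_FILE:-/tmp/flume-demo/spool-example.conf}"
AGENT_NAME="agent1"

if [ -x "$FLUME_HOME/bin/flume-ng" ]; then
  # Runs in the foreground and logs to the console; stop it with Ctrl-C.
  "$FLUME_HOME/bin/flume-ng" agent \
    --conf "$FLUME_HOME/conf" \
    --conf-file "$CONF_FILE" \
    --name "$AGENT_NAME" \
    -Dflume.root.logger=INFO,console
  STATUS=$?
else
  echo "flume-ng not found under $FLUME_HOME; set FLUME_HOME first" >&2
  STATUS=1
fi
```

With the console logger enabled you should be able to watch the source pick
up files from the spooling directory as the events move through the channel
to the sink.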

Once you start getting the hang of things, you can try other sources and
sinks, or maybe even create a few custom sources, channels or sinks of
your own.

On 7 April 2013 17:10, Sandeep Baldawa <[EMAIL PROTECTED]> wrote:

>
> Thanks for the detailed reply.
>
> Awesome questions, and I should have added these details to my question. I am
> learning Flume more as a hobby and a learning experience, and as a tech
> enthusiast (I have heard pretty good things about Flume).
>
> Thanks again for the instructions. Just one question about setting things
> up: are the instructions at http://archive.cloudera.com/cdh/3/flume/UserGuide/
> relevant to the latest build? I liked the documentation at this link,
> which has a quick start guide too.
>
>
> On Sun, Apr 7, 2013 at 1:28 PM, Israel Ekpo <[EMAIL PROTECTED]> wrote:
>
>> Sandeep,
>>
>> Excellent questions.
>>
>> You asked "what problem Flume is trying to solve?".
>>
>> I think the more appropriate question is: what problem are you trying to
>> solve?
>>
>> This will go a long way in helping us understand which components of
>> Flume you may need and how you need to set it up.
>>
>> Are you using Flume as part of your job or as a personal hobby? Are you using
>> Flume for a course at school or as part of an academic project?
>>
>> Going back to your original question, in the simplest terms, and for most
>> use cases, Flume is a system designed for collecting and transporting large
>> amounts of data and events from one or more sources and then aggregating
>> the collected data in a centralized data store or for onward propagation to
>> subsequent sources.
>>
>> You can use it for aggregating data from log files, network traffic,
>> click streams, Twitter and any other source that can generate events.
>>
>> Spend more time reviewing the user guide and you will find a lot of
>> information and answers to prospective questions.
>>
>> http://flume.apache.org/FlumeUserGuide.html
>>
>> To install Flume, you will need to set up Java 1.6, make sure that
>> it is available in your PATH, and then download the latest version of Flume
>> and decompress the tarball or zip file.
>>
>> You will need to set up the configuration file(s) for the agents based on
>> the sources, channels and sinks you choose to use.
>>
>> I would recommend that you go ahead and get started with setting it up