Flume, mail # user - Best way for a newbie to learn flume


Sandeep Baldawa 2013-04-07, 18:45
Israel Ekpo 2013-04-07, 20:28
Sandeep Baldawa 2013-04-07, 21:10
Israel Ekpo 2013-04-07, 21:54
Re: Best way for a newbie to learn flume
Israel Ekpo 2013-04-07, 21:58
Also, the link below covers Flume OG, not Flume NG:

http://archive.cloudera.com/cdh/3/flume/UserGuide/

The architecture and features have changed significantly since that version.
On 7 April 2013 17:54, Israel Ekpo <[EMAIL PROTECTED]> wrote:

> Sandeep,
>
> So Flume currently has two tracks:
>
> Flume OG (not actively supported)
> https://cwiki.apache.org/confluence/display/FLUME/Flume+OG+%28pre+1.0%29
> Flume NG (currently active)
> https://cwiki.apache.org/confluence/display/FLUME/Flume+NG
>
> The latest stable version of Flume NG is 1.3.1.
>
> The NG stands for Next Generation and it is the current active development
> track.
>
> The OG refers to the Original Generation of Flume. This includes releases
> before the 1.0.0 release.
>
> Newcomers and existing users of the OG track are encouraged to migrate
> over to the NG track.
>
> You can download Flume NG 1.3.1 here:
>
> http://flume.apache.org/download.html
>
> Regarding "Getting Started": over the next couple of weeks, additional
> information will be added to the wiki to make the onboarding process easier
> for newcomers.
>
> In the meantime, please bear with us.
>
> I would recommend you download and install the latest version of Java 1.6.
>
> Then download Flume and extract it to a directory of your choice.
>
> Then you can use the following sources, channels and sinks to get started.
>
> This is the best way for you to learn and understand the various pieces.
>
> SOURCE: Spooling Directory Source
> CHANNEL: File Channel (more reliable) or Memory Channel (faster)
> SINK: File Roll Sink
>
> You can create a directory to spool from and drop a couple of log files
> in there. Make sure the files are newline-delimited.
>
> Each line in the log files will become one event.
>
> Then configure the file channel and the file roll sink using guidelines
> and examples available in the user guide.
>
> http://flume.apache.org/FlumeUserGuide.html
>
> That will give you a feel for how Flume works.
>
> Once you have that set up then you can run the agent and see what happens.
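>
> A minimal sketch of what such a configuration might look like, based on
> the property names documented in the user guide. The agent name (agent1),
> the component names (src1, ch1, sink1) and all /tmp/flume paths are
> placeholders chosen for illustration, not values from this thread. Check
> each property against the user guide for your release before relying on it.
>
> ```properties
> # Hypothetical agent "agent1" with one source, one channel and one sink.
> agent1.sources = src1
> agent1.channels = ch1
> agent1.sinks = sink1
>
> # Spooling Directory Source: reads newline-delimited files dropped here.
> # The directory is a placeholder and should exist before the agent starts.
> agent1.sources.src1.type = spooldir
> agent1.sources.src1.spoolDir = /tmp/flume/spool
> agent1.sources.src1.channels = ch1
>
> # File Channel: buffers events on disk (swap in "type = memory" for the
> # faster but less reliable Memory Channel).
> agent1.channels.ch1.type = file
> agent1.channels.ch1.checkpointDir = /tmp/flume/checkpoint
> agent1.channels.ch1.dataDirs = /tmp/flume/data
>
> # File Roll Sink: writes events to files in a local directory,
> # starting a new file every rollInterval seconds.
> agent1.sinks.sink1.type = file_roll
> agent1.sinks.sink1.sink.directory = /tmp/flume/out
> agent1.sinks.sink1.sink.rollInterval = 30
> agent1.sinks.sink1.channel = ch1
> ```
>
> Assuming the file is saved as conf/example.conf inside the extracted
> Flume directory, the agent could then be started with something like
> "bin/flume-ng agent --conf conf --conf-file conf/example.conf --name agent1".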
>
> Once you start getting the hang of things you can try other sources and
> sinks or maybe even create a few of your own custom sources, channels or
> sinks.
>
>
>
> On 7 April 2013 17:10, Sandeep Baldawa <[EMAIL PROTECTED]> wrote:
>
>>
>> Thanks for the detailed reply.
>>
>> Awesome questions. I should have included these details in my original
>> message: I am learning Flume as a hobby and a learning experience, as a
>> tech enthusiast (I have heard pretty good things about Flume).
>>
>> Thanks again for the instructions. Just one question about setting things
>> up: are the instructions at
>> http://archive.cloudera.com/cdh/3/flume/UserGuide/ relevant to the
>> latest build? I liked the documentation at this link, which has a quick
>> start guide too.
>>
>>
>> On Sun, Apr 7, 2013 at 1:28 PM, Israel Ekpo <[EMAIL PROTECTED]> wrote:
>>
>>> Sandeep,
>>>
>>> Excellent questions.
>>>
>>> You asked "what problem Flume is trying to solve?".
>>>
>>> I think the more appropriate question is: what problem are you trying
>>> to solve?
>>>
>>> This will go a long way in helping us understand which components of
>>> Flume you may need and how you need to set it up.
>>>
>>> Are you using Flume as part of your job or as a personal hobby? Are you
>>> using Flume for a course at school or as part of an academic project?
>>>
>>> Going back to your original question, in the simplest terms, and for
>>> most use cases, Flume is a system designed for collecting and transporting
>>> large amounts of data and events from one or more sources and then
>>> aggregating the collected data in a centralized data store or for onward
>>> propagation to subsequent sources.
>>>
>>> You can use it for aggregating data from log files, network traffic,
>>> click streams, Twitter and any other source that can generate events.
>>>
>>> Spend more time reviewing the user guide and you will find a lot of
>>> information and answers to prospective questions.
Sandeep Baldawa 2013-04-07, 22:14