Flume >> mail # user >> Best way for a newbie to learn flume


Sandeep Baldawa 2013-04-07, 18:45
Israel Ekpo 2013-04-07, 20:28
Sandeep Baldawa 2013-04-07, 21:10
Israel Ekpo 2013-04-07, 21:54
Israel Ekpo 2013-04-07, 21:58
Re: Best way for a newbie to learn flume
Thanks again for all the details. I will follow the steps you described.
On Sun, Apr 7, 2013 at 2:58 PM, Israel Ekpo <[EMAIL PROTECTED]> wrote:

> Also, the link below is for Flume OG, not Flume NG:
>
> http://archive.cloudera.com/cdh/3/flume/UserGuide/
>
> The architecture and features have changed significantly since that version.
>
>
> On 7 April 2013 17:54, Israel Ekpo <[EMAIL PROTECTED]> wrote:
>
>> Sandeep,
>>
>> So Flume currently has two tracks:
>>
>> Flume OG (not actively supported)
>> https://cwiki.apache.org/confluence/display/FLUME/Flume+OG+%28pre+1.0%29
>> Flume NG (currently active)
>> https://cwiki.apache.org/confluence/display/FLUME/Flume+NG
>>
>> The latest stable version for Flume NG is 1.3.1
>>
>> The NG stands for Next Generation and it is the current active
>> development track.
>>
>> The OG refers to the Original Generation of Flume. This includes releases
>> before the 1.0.0 release.
>>
>> Newcomers and existing users of the OG track are encouraged to migrate
>> over to the NG track.
>>
>> You can download Flume NG 1.3.1 here
>>
>> http://flume.apache.org/download.html
>>
>> Regarding "Getting Started": over the next couple of weeks, additional
>> information will be added to the wiki to make the onboarding process easier
>> for newcomers.
>>
>> In the meantime, please bear with us.
>>
>> I would recommend you download and install the latest version of Java 1.6.
>>
>> Then download Flume and extract it to a folder on your machine.
>>
>> Then you can use the following sources, channels and sinks to get started.
>>
>> This is the best way for you to learn and understand the various pieces.
>>
>> SOURCE: Spooling Directory Source
>> CHANNEL: File Channel (more reliable) or Memory Channel (faster)
>> SINK: File Roll Sink
>>
>> You can create a directory to spool and drop a couple of log files into
>> it. Make sure the files are newline-delimited.
>>
>> Each line in the log files will represent one event.
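For anyone following along, here is a minimal Python sketch of preparing such a spool directory. It assumes, per the Flume user guide, that the Spooling Directory Source expects files to be complete and unchanged once they appear in the directory, so the file is written in a separate staging folder first and then moved in. The function name, the ".staging" folder, and the sample log lines are made up for illustration.

```python
import os
import tempfile


def drop_log_file(spool_dir, name, lines):
    """Write lines to a staging folder, then move the finished file into spool_dir."""
    os.makedirs(spool_dir, exist_ok=True)
    # Hypothetical staging folder next to the spool directory, so the source
    # never sees a half-written file.
    staging_dir = spool_dir.rstrip(os.sep) + ".staging"
    os.makedirs(staging_dir, exist_ok=True)
    staged = os.path.join(staging_dir, name)
    with open(staged, "w", encoding="utf-8", newline="\n") as handle:
        for line in lines:
            # One event per line: an embedded newline would split one event in two.
            handle.write(line.replace("\n", " ") + "\n")
    final = os.path.join(spool_dir, name)
    os.replace(staged, final)  # a rename, atomic when both paths share a filesystem
    return final


if __name__ == "__main__":
    spool = os.path.join(tempfile.mkdtemp(), "spool")
    sample = ["user=alice action=login", "user=bob action=logout"]
    print(drop_log_file(spool, "app-0001.log", sample))
```

Use a fresh file name for every drop; reusing a name that the source has already processed is likely to cause trouble.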
>>
>> Then configure the file channel and the file roll sink using guidelines
>> and examples available in the user guide.
>>
>> http://flume.apache.org/FlumeUserGuide.html
>>
>> That will give you a feel for how flume works.
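As a rough illustration of that setup, the Python sketch below renders a properties file wiring a Spooling Directory Source to a File Channel and a File Roll Sink. The property keys follow the user guide linked above as I understand it; the component names (src1, ch1, sink1), the agent name, and all paths are placeholders, so check them against the guide for your Flume version.

```python
def render_agent_config(agent, spool_dir, checkpoint_dir, data_dir, output_dir):
    """Return Flume NG properties text for spooldir -> file channel -> file_roll."""
    lines = [
        # Declare the components that belong to this agent.
        f"{agent}.sources = src1",
        f"{agent}.channels = ch1",
        f"{agent}.sinks = sink1",
        "",
        # Source: reads files dropped into spool_dir, one event per line.
        f"{agent}.sources.src1.type = spooldir",
        f"{agent}.sources.src1.spoolDir = {spool_dir}",
        f"{agent}.sources.src1.channels = ch1",  # note: plural "channels"
        "",
        # Channel: durable buffer on local disk between source and sink.
        f"{agent}.channels.ch1.type = file",
        f"{agent}.channels.ch1.checkpointDir = {checkpoint_dir}",
        f"{agent}.channels.ch1.dataDirs = {data_dir}",
        "",
        # Sink: writes events to files in output_dir on the local filesystem.
        f"{agent}.sinks.sink1.type = file_roll",
        f"{agent}.sinks.sink1.sink.directory = {output_dir}",
        f"{agent}.sinks.sink1.channel = ch1",  # note: singular "channel"
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_agent_config("a1", "/tmp/flume/spool", "/tmp/flume/checkpoint",
                              "/tmp/flume/data", "/tmp/flume/out"))
```

To try the Memory Channel instead, the channel type would be "memory" and the two directory properties would be dropped.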
>>
>> Once you have that set up, you can run the agent and see what happens.
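A small sketch of launching the agent from Python, assuming the flume-ng launcher script that ships in the bin folder of the Flume NG download and the options shown in the user guide. The install path, config file name, and agent name are placeholders, and the helper itself is hypothetical.

```python
import os


def build_agent_command(flume_home, conf_file, agent_name):
    """Build the argument list for starting one Flume NG agent."""
    return [
        os.path.join(flume_home, "bin", "flume-ng"),
        "agent",
        "--conf", os.path.join(flume_home, "conf"),  # folder with the env and log4j settings
        "--conf-file", conf_file,                    # the agent's properties file
        "--name", agent_name,                        # must match the prefix used in that file
        "-Dflume.root.logger=INFO,console",          # log to the console while learning
    ]


if __name__ == "__main__":
    cmd = build_agent_command("/opt/flume", "/tmp/flume/agent.conf", "a1")
    print(" ".join(cmd))
```

Passing the list to subprocess.run(cmd) would start the agent in the foreground; Ctrl+C stops it.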
>>
>> Once you start getting the hang of things you can try other sources and
>> sinks or maybe even create a few of your own custom sources, channels or
>> sinks.
>>
>>
>>
>> On 7 April 2013 17:10, Sandeep Baldawa <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> Thanks for the detailed reply.
>>>
>>> Awesome questions, and I should have included these details in my
>>> original post. I am learning Flume as a hobby, for the learning
>>> experience, and as a tech enthusiast (I have heard pretty good things
>>> about Flume).
>>>
>>> Thanks again for the instructions. Just one question about setting
>>> things up: are the instructions at
>>> http://archive.cloudera.com/cdh/3/flume/UserGuide/ relevant to the
>>> latest build? I liked the documentation at this link, which also has a
>>> quick start guide.
>>>
>>>
>>> On Sun, Apr 7, 2013 at 1:28 PM, Israel Ekpo <[EMAIL PROTECTED]> wrote:
>>>
>>>> Sandeep,
>>>>
>>>> Excellent questions.
>>>>
>>>> You asked "what problem Flume is trying to solve?"
>>>>
>>>> I think the more appropriate question is what problem you are trying
>>>> to solve.
>>>>
>>>> This will go a long way in helping us understand which components of
>>>> Flume you may need and how you need to set it up.
>>>>
>>>> Are you using Flume as part of your job or as a personal hobby? Are you
>>>> using Flume for a course at school or as part of an academic project?
>>>>
>>>> Going back to your original question, in the simplest terms, and for
>>>> most use cases, Flume is a system designed for collecting and transporting
>>>> large amounts of data and events from one or more sources and then
>>>> aggregating the collected data in a centralized data store or for onward
>>>> propagation to subsequent sources.
>>>>
>>>
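To make the description above a little more concrete, here is a toy Python model of the collect, buffer, and deliver flow. It is only an illustration of the idea and not Flume's API: the function names are invented, the "channel" is an in-memory queue, and the "sink" is a plain list standing in for a data store.

```python
from collections import deque


def drain(channel, sink):
    """Move every buffered event from the channel into the sink."""
    count = 0
    while channel:
        sink.append(channel.popleft())
        count += 1
    return count


def run_pipeline(source_lines, sink, capacity=100):
    """Read events from a source, buffer them in a channel, deliver them to a sink."""
    channel = deque()
    delivered = 0
    for line in source_lines:
        channel.append(line.rstrip("\n"))  # one line becomes one event
        if len(channel) >= capacity:
            delivered += drain(channel, sink)  # buffer is full: deliver what we have
    delivered += drain(channel, sink)  # deliver whatever is left at the end
    return delivered


if __name__ == "__main__":
    store = []
    total = run_pipeline(["first event\n", "second event\n"], store)
    print(total, store)
```

In a real agent the three roles run independently and the channel is what lets the sink fall behind the source without losing events.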