HDFS >> mail # user >> Re: [Hadoop-Help]About Map-Reduce implementation


Re: [Hadoop-Help]About Map-Reduce implementation
Hello,

  Thank you, sir, for your helpful reply.

  I am going to use 1 master and 2 worker nodes, 3 nodes in total.

  Thank you !!

*--
Cheers,
Mayur
*
On Fri, Mar 8, 2013 at 8:30 AM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:

> Hi Mayur,
>
> Those 3 modes are 3 different ways to use Hadoop. However, the only
> production mode here is the fully distributed one. The other 2 are
> more for local testing. How many nodes are you expecting to run Hadoop
> on?
>
> JM
>
>
> 2013/3/7 Mayur Patil <[EMAIL PROTECTED]>:
> > Hello,
> >
> >    I am slowly starting to understand how Hadoop works.
> >
> >   I want to collect logs from three machines,
> >
> >   including the master itself. My small query is:
> >
> >   which mode should I use for this?
> >
> >                   Standalone Operation
> >                   Pseudo-Distributed Operation
> >                   Fully-Distributed Operation
> >
> >      Seeking guidance,
> >
> >      Thank you !!
> > --
> > Cheers,
> > Mayur
> >
> >
> >
> >
> >>> Hi mayur,
> >>>
> >>> Flume is used for data collection. Pig is used for data processing.
> >>> For example, if you have a bunch of servers that you want to collect
> >>> logs from and push to HDFS, you would use Flume. If you then need to
> >>> run some analysis on that data, you could use Pig to do that.
> >>>
> >>> Sent from my iPhone
> >>>
> >>> On Feb 14, 2013, at 1:39 AM, Mayur Patil <[EMAIL PROTECTED]>
> >>> wrote:
> >>>
> >>> > Hello,
> >>> >
> >>> >   I just read about Pig
> >>> >
> >>> >> Pig
> >>> >> A data flow language and execution environment for exploring very
> >>> >> large datasets.
> >>> >> Pig runs on HDFS and MapReduce clusters.
> >>> >
> >>> >   What is the actual difference between Pig and Flume for log
> >>> > clustering?
> >>> >
> >>> >   Thank you !!
> >>> > --
> >>> > Cheers,
> >>> > Mayur.
> >>> >
> >>> >
> >>> >
> >>> >> Hey Mayur,
> >>> >>>
> >>> >>> If you are collecting logs from multiple servers, you can use Flume
> >>> >>> for that.
> >>> >>>
> >>> >>> If the logs differ in format, you can just use the text input
> >>> >>> format (TextInputFormat) to read them and write them out in
> >>> >>> whatever format you want for later processing in your project.
> >>> >>>
> >>> >>> First, learn how to set up Hadoop.
> >>> >>> Then try writing sample Hadoop MapReduce jobs that read from a text
> >>> >>> file, process it, and write the results to another file.
> >>> >>> Then integrate Flume as your log collection mechanism.
> >>> >>> Once you have a grip on the system, you can decide which paths to
> >>> >>> follow based on your requirements for storage, compute time,
> >>> >>> compute capacity, compression, etc.
> >>> >>>
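To make the "sample MapReduce job" step above concrete, here is a minimal sketch in Python, written in the style of a Hadoop Streaming mapper and reducer but run in a single local process. It is not from the thread: the function names (mapper, reducer, run_local) are hypothetical, and it assumes each log line starts with its severity level, which may not match your logs.

```python
from itertools import groupby


def mapper(lines):
    """Emit (level, 1) per log line; ASSUMES the level is the first token."""
    for line in lines:
        tokens = line.split()
        if tokens:  # skip blank lines
            yield tokens[0].upper(), 1


def reducer(pairs):
    """Sum counts per key; pairs must arrive sorted by key, as after a shuffle."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> sort (shuffle) -> reduce locally, for testing only."""
    return dict(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    sample = ["ERROR disk full", "info started", "ERROR timeout", ""]
    print(run_local(sample))
```

Testing the map and reduce logic locally like this before submitting anything to a cluster keeps the feedback loop short; on a real cluster the sort step is done by the framework between the two phases.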
> >>> >> --------------
> >>> >> --------------
> >>> >>
> >>> >>> Hi,
> >>> >>>
> >>> >>> Please read up on the basics of how Hadoop works.
> >>> >>>
> >>> >>> Then get hands-on with MapReduce coding.
> >>> >>>
> >>> >>> The tool made for your use case is Flume, but don't look at it
> >>> >>> until you complete the above two steps.
> >>> >>>
> >>> >>> Good luck, keep us posted.
> >>> >>>
> >>> >>> Regards,
> >>> >>>
> >>> >>> Jagat Singh
> >>> >>>
> >>> >>> -----------
> >>> >>> Sent from Mobile , short and crisp.
> >>> >>> On 06-Feb-2013 8:32 AM, "Mayur Patil" <[EMAIL PROTECTED]>
> >>> >>> wrote:
> >>> >>>
> >>> >>>> Hello,
> >>> >>>>
> >>> >>>>    I am new to Hadoop. I am doing a cloud project in which I
> >>> >>>>    have to use Hadoop for MapReduce. I am going to collect logs
> >>> >>>>    from 2-3 machines in different locations. The logs are also
> >>> >>>>    in different formats, such as .rtf, .log, and .txt. Later, I
> >>> >>>>    have to convert them to one format and gather them in one
> >>> >>>>    location.
> >>> >
*Cheers,
Mayur*.
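
For the "convert them to one format" part of the original question, a rough sketch of what the normalization step could look like in Python follows. Everything specific here is an assumption for illustration: the two line layouts, the tab-separated output, and the names PATTERNS and normalize are all hypothetical, and .rtf files would first need to be converted to plain text, which this sketch does not cover.

```python
import re

# HYPOTHETICAL input layouts, assumed for illustration only:
#   "2013-02-06 08:32:10 ERROR disk full"
#   "[ERROR] 2013-02-06T08:32:10 disk full"
PATTERNS = [
    re.compile(
        r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
        r"(?P<level>[A-Za-z]+) (?P<msg>.*)$"
    ),
    re.compile(
        r"^\[(?P<level>[A-Za-z]+)\] (?P<date>\d{4}-\d{2}-\d{2})"
        r"T(?P<time>\d{2}:\d{2}:\d{2}) (?P<msg>.*)$"
    ),
]


def normalize(line, source):
    """Return 'source<TAB>timestamp<TAB>LEVEL<TAB>message', or None if unparsed."""
    line = line.strip()
    for pattern in PATTERNS:
        m = pattern.match(line)
        if m:
            timestamp = m["date"] + "T" + m["time"]
            return "\t".join([source, timestamp, m["level"].upper(), m["msg"]])
    return None  # unknown layout; the caller decides whether to drop or keep it


if __name__ == "__main__":
    print(normalize("2013-02-06 08:32:10 error disk full", "node1"))
    print(normalize("[WARN] 2013-02-06T08:32:10 low memory", "node2"))
```

Returning None for unrecognized lines, rather than guessing, makes it easy to count how many lines each machine produces that the patterns do not yet cover.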