Hadoop, mail # user - How to mapreduce in the scenario


Re: How to mapreduce in the scenario
samir das mohapatra 2012-05-30, 13:32
Yes. Hadoop is meant for computation on huge datasets;
  it may not be a good fit for small datasets.

On Wed, May 30, 2012 at 6:53 AM, liuzhg <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Mike, Nitin, Devaraj, Soumya, samir, Robert
>
> Thank you all for your suggestions.
>
> Actually, I want to know whether Hadoop has any performance advantage over a
> conventional database for solving this kind of problem (joining data).
>
>
>
> Best Regards,
>
> Gump
>
>
>
>
>
> On Tue, May 29, 2012 at 6:53 PM, Soumya Banerjee
> <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> You can also try Hadoop's reduce-side join functionality.
> Look in contrib/datajoin/hadoop-datajoin-*.jar for the base Map and
> Reduce classes that implement it.
>
> Regards,
> Soumya.
>
>
> On Tue, May 29, 2012 at 4:10 PM, Devaraj k <[EMAIL PROTECTED]> wrote:
>
> > Hi Gump,
> >
> >   MapReduce is well suited to this type of problem (joins).
> >
> > I hope the following steps help you solve the problem you described.
> >
> > 1. Map output key and value classes: use Text.class as the map output key
> > class and CombinedValue.class as the value class. The value class must be
> > able to hold the values from both files (a.txt and b.txt), as shown
> > below.
> >
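> > import java.io.*;
> > import org.apache.hadoop.io.Writable;
> > // A value class only needs Writable (WritableComparator is a comparator class, not an interface)
> > class CombinedValue implements Writable {
> >   String name = "", address = "";
> >   int age;
> >   boolean isLeft; // flag to identify which file the record came from
> >   public void write(DataOutput out) throws IOException {
> >     out.writeBoolean(isLeft); out.writeUTF(name); out.writeInt(age); out.writeUTF(address);
> >   }
> >   public void readFields(DataInput in) throws IOException { // same field order as write()
> >     isLeft = in.readBoolean(); name = in.readUTF(); age = in.readInt(); address = in.readUTF();
> >   }
> > }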
> >
> > 2. Mapper: write a map() function that can parse lines from both files
> > (a.txt and b.txt) and emits the common output key and value classes.
> >
> > 3. Partitioner: write the partitioner so that it sends all (key, value)
> > pairs that have the same key to the same reducer.
> >
> > 4. Reducer: the reduce() function receives, for each key, the records
> > from both files, so you can combine them easily.
> >
> >
> > Thanks
> > Devaraj
> >
> >
> > ________________________________________
> > From: liuzhg [[EMAIL PROTECTED]]
> > Sent: Tuesday, May 29, 2012 3:45 PM
> > To: [EMAIL PROTECTED]
> > Subject: How to mapreduce in the scenario
> >
> > Hi,
> >
> > I wonder whether Hadoop can effectively solve the following problem:
> >
> > =========================================
> > input files: a.txt, b.txt
> > result: c.txt
> >
> > a.txt:
> > id1,name1,age1,...
> > id2,name2,age2,...
> > id3,name3,age3,...
> > id4,name4,age4,...
> >
> > b.txt:
> > id1,address1,...
> > id2,address2,...
> > id3,address3,...
> >
> > c.txt:
> > id1,name1,age1,address1,...
> > id2,name2,age2,address2,...
> > =======================================
> >
> > I know this can be done well with a database,
> > but I would like to handle it with Hadoop if possible.
> > Can Hadoop meet this requirement?
> >
> > Any suggestions would help. Thank you very much!
> >
> > Best Regards,
> >
> > Gump
> >
>
>
>
>
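Devaraj's four steps can be tried out without a cluster. Below is a minimal in-memory sketch in plain Java (16 or later, for the record type), assuming comma-separated lines whose first field is the id, as in Gump's a.txt and b.txt. ReduceSideJoinSketch, Tagged, map, partition, reduce and join are hypothetical names used only for illustration, not Hadoop APIs: Tagged plays the role of CombinedValue, and a per-reducer TreeMap stands in for Hadoop's sort-and-shuffle. The partition method uses the same formula as Hadoop's default HashPartitioner, but applied to a Java String instead of a Text key, so the partition numbers may differ from a real job. Because that default partitioner already sends equal keys to the same reducer, step 3 usually needs no custom partitioner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of a reduce-side join; no Hadoop classes are used.
// Steps mirror the thread: map (tag records by source file), partition by key,
// group by key (standing in for the shuffle), then reduce (combine both sides).
class ReduceSideJoinSketch {

    // Plays the role of CombinedValue: isLeft marks records from a.txt.
    record Tagged(boolean isLeft, String payload) {}

    // Map step: assumes "id,rest..." lines; the key is the first field.
    static Map.Entry<String, Tagged> map(String line, boolean fromA) {
        int comma = line.indexOf(',');
        return Map.entry(line.substring(0, comma), new Tagged(fromA, line.substring(comma + 1)));
    }

    // Partition step: same formula as Hadoop's default HashPartitioner,
    // so records with equal keys always reach the same reducer.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Reduce step for one key: inner join of each a.txt record with each b.txt record.
    static List<String> reduce(String key, List<Tagged> values) {
        List<String> lefts = new ArrayList<>();
        List<String> rights = new ArrayList<>();
        for (Tagged v : values) {
            if (v.isLeft()) lefts.add(v.payload());
            else rights.add(v.payload());
        }
        List<String> out = new ArrayList<>();
        for (String l : lefts) {
            for (String r : rights) {
                out.add(key + "," + l + "," + r);
            }
        }
        return out;
    }

    // Runs all steps; each reducer keeps a sorted key -> values map, like the shuffle output.
    static List<String> join(List<String> a, List<String> b, int numReducers) {
        List<Map<String, List<Tagged>>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) reducers.add(new TreeMap<>());
        for (String line : a) emit(map(line, true), reducers);
        for (String line : b) emit(map(line, false), reducers);
        List<String> result = new ArrayList<>();
        for (Map<String, List<Tagged>> groups : reducers) {
            for (Map.Entry<String, List<Tagged>> g : groups.entrySet()) {
                result.addAll(reduce(g.getKey(), g.getValue()));
            }
        }
        return result;
    }

    // Routes one map output pair to the reducer chosen by partition().
    private static void emit(Map.Entry<String, Tagged> kv, List<Map<String, List<Tagged>>> reducers) {
        int p = partition(kv.getKey(), reducers.size());
        reducers.get(p).computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
    }

    public static void main(String[] args) {
        List<String> a = List.of("id1,name1,age1", "id2,name2,age2", "id3,name3,age3", "id4,name4,age4");
        List<String> b = List.of("id1,address1", "id2,address2", "id3,address3");
        join(a, b, 2).forEach(System.out::println);
    }
}
```

With the sample data, the output contains joined lines for id1, id2 and id3 (in reducer order, not sorted); id4 has no address in b.txt, so this inner join drops it. The reducer buffers both sides of a key before combining them because Hadoop does not guarantee the order in which a key's values arrive unless a secondary sort is configured.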