Hadoop, mail # user - How to mapreduce in the scenario


Re: How to mapreduce in the scenario
Nitin Pawar 2012-05-30, 03:49
If you have a huge dataset (terabytes, or at least a few GB), then yes:
Hadoop has the advantage of being a distributed system and is much faster.

But on a smaller set of records it is not as good as an RDBMS.

On Wed, May 30, 2012 at 6:53 AM, liuzhg <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Mike, Nitin, Devaraj, Soumya, samir, Robert
>
> Thank you all for your suggestions.
>
> Actually, I want to know whether Hadoop has any performance advantage over
> a conventional database for this kind of problem (joining data).
>
>
>
> Best Regards,
>
> Gump
>
>
>
>
>
> On Tue, May 29, 2012 at 6:53 PM, Soumya Banerjee
> <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> You can also try Hadoop's reduce-side join functionality.
> Look into contrib/datajoin/hadoop-datajoin-*.jar for the base map and
> reduce classes that do this.
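
In outline, a reduce-side join works like this: each mapper tags a record with the file it came from and emits it under the join key, the shuffle groups records by that key, and the reducer pairs the records from the two sources. The sketch below simulates those phases in plain Java so it runs without a cluster. It does not use the contrib datajoin classes, and every name in it (`ReduceSideJoinSketch`, `Tagged`, `map`, `reduce`, `join`) is hypothetical. The sample records are the a.txt and b.txt rows from the original question, with the trailing "..." fields left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of a reduce-side join; no Hadoop classes are used.
// All names here are hypothetical and chosen for illustration only.
class ReduceSideJoinSketch {

    // A record value tagged with the file it came from.
    static final class Tagged {
        final boolean fromLeft; // true for a.txt, false for b.txt
        final String fields;    // everything after the id

        Tagged(boolean fromLeft, String fields) {
            this.fromLeft = fromLeft;
            this.fields = fields;
        }
    }

    // "Map" phase: split "id,rest" and file the tagged rest under its id.
    static void map(String line, boolean fromLeft, Map<String, List<Tagged>> groups) {
        int comma = line.indexOf(',');
        String id = line.substring(0, comma);
        String rest = line.substring(comma + 1);
        groups.computeIfAbsent(id, k -> new ArrayList<>()).add(new Tagged(fromLeft, rest));
    }

    // "Reduce" phase: for one id, pair every left record with every right record.
    static void reduce(String id, List<Tagged> values, List<String> out) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (Tagged t : values) {
            (t.fromLeft ? left : right).add(t.fields);
        }
        for (String l : left) {
            for (String r : right) {
                out.add(id + "," + l + "," + r);
            }
        }
    }

    // Maps both inputs, groups by id (standing in for the shuffle), then reduces.
    static List<String> join(List<String> a, List<String> b) {
        Map<String, List<Tagged>> groups = new TreeMap<>(); // keys in sorted order
        for (String line : a) map(line, true, groups);
        for (String line : b) map(line, false, groups);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : groups.entrySet()) {
            reduce(e.getKey(), e.getValue(), out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList(
                "id1,name1,age1", "id2,name2,age2", "id3,name3,age3", "id4,name4,age4");
        List<String> b = Arrays.asList(
                "id1,address1", "id2,address2", "id3,address3");
        for (String row : join(a, b)) {
            System.out.println(row);
        }
    }
}
```

This is an inner join: id4 has no row in b.txt, so it produces no output line. In a real job the grouping is done by the framework across machines rather than by an in-memory map.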
>
> Regards,
> Soumya.
>
>
> On Tue, May 29, 2012 at 4:10 PM, Devaraj k <[EMAIL PROTECTED]> wrote:
>
> > Hi Gump,
> >
> >   MapReduce fits well for solving this type of problem (joins).
> >
> > I hope this will help you to solve the described problem.
> >
> > 1. Map output key and value classes: Use Text.class as the map output
> > key class and write a value class (CombinedValue.class). The value class
> > should be able to hold the values from both files (a.txt and b.txt), as
> > shown below.
> >
> > class CombinedValue implements Writable
> > {
> >   String name;
> >   int age;
> >   String address;
> >   boolean isLeft; // flag to identify from which file
> >   // write(DataOutput) and readFields(DataInput) must serialize all fields
> > }
> >
> > 2. Mapper: Write a map() function that can parse lines from both
> > files (a.txt, b.txt) and produces the common output key and value class.
> >
> > 3. Partitioner: Write the partitioner so that all (key, value) pairs
> > with the same key are sent to the same reducer.
> >
> > 4. Reducer: In the reduce() function, you will receive the records from
> > both files for a given key, and you can combine them easily.
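
For step 1, the value class has to serialize itself so it can travel from the map side to the reduce side, which means implementing `write` and `readFields`. Here is a minimal sketch of that for the four fields listed in the thread. To keep it compilable without Hadoop on the classpath, it declares a local `Writable` interface as a stand-in with the same two methods as Hadoop's `org.apache.hadoop.io.Writable`. The `roundTrip` helper and the sample values are hypothetical and exist only for the demo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in with the same two methods as Hadoop's Writable interface,
// declared here only so the sketch compiles without Hadoop.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

class CombinedValue implements Writable {
    String name = "";
    int age;
    String address = "";
    boolean isLeft; // true if the record came from a.txt

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeBoolean(isLeft);
        out.writeUTF(name);
        out.writeInt(age);
        out.writeUTF(address);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields must be read in exactly the order they were written.
        isLeft = in.readBoolean();
        name = in.readUTF();
        age = in.readInt();
        address = in.readUTF();
    }

    // Demo helper (hypothetical): serialize, then deserialize into a fresh object.
    static CombinedValue roundTrip(CombinedValue v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            v.write(new DataOutputStream(bytes));
            CombinedValue copy = new CombinedValue();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CombinedValue v = new CombinedValue();
        v.name = "name1";
        v.age = 30;
        v.isLeft = true;
        CombinedValue copy = roundTrip(v);
        System.out.println(copy.name + " " + copy.age + " " + copy.isLeft);
    }
}
```

The string fields default to empty strings because `writeUTF` rejects null; a record from a.txt simply leaves `address` empty, and a record from b.txt leaves `name` empty and `age` at 0.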
> >
> >
> > Thanks
> > Devaraj
> >
> >
> > ________________________________________
> > From: liuzhg [[EMAIL PROTECTED]]
> > Sent: Tuesday, May 29, 2012 3:45 PM
> > To: [EMAIL PROTECTED]
> > Subject: How to mapreduce in the scenario
> >
> > Hi,
> >
> > I wonder whether Hadoop can effectively solve the following problem:
> >
> > =========================================
> > input file: a.txt, b.txt
> > result: c.txt
> >
> > a.txt:
> > id1,name1,age1,...
> > id2,name2,age2,...
> > id3,name3,age3,...
> > id4,name4,age4,...
> >
> > b.txt:
> > id1,address1,...
> > id2,address2,...
> > id3,address3,...
> >
> > c.txt
> > id1,name1,age1,address1,...
> > id2,name2,age2,address2,...
> > =========================================
> >
> > I know that this can be done well with a database.
> > But I want to handle it with hadoop if possible.
> > Can hadoop meet the requirement?
> >
> > Any suggestion can help me. Thank you very much!
> >
> > Best Regards,
> >
> > Gump
> >
>
>
>
>
--
Nitin Pawar