HDFS >> mail # user >> Modifying Hadoop For join Operation


Vikas Jadhav 2013-01-24, 13:21
Harsh J 2013-01-24, 15:09
Praveen Sripati 2013-01-24, 16:52
Re: Modifying Hadoop For join Operation
Hi, thanks @Harsh for replying.

I am attaching a paper called Map-Join-Reduce.

I want to implement a similar kind of architecture.

Currently MapReduce processes a join job using either a map-side or a reduce-side join.

A reduce-side join job has a drawback:

 --> for large datasets there is a lot of traffic (data movement) from
      mappers to reducers (one option: we can filter out records using
      a Bloom-filter-like technique)
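
To make the Bloom filter option concrete, here is a minimal sketch in plain Python (not Hadoop code). It assumes the join keys of the smaller dataset R fit in a filter that can be shipped to every mapper; the class and function names (BloomFilter, build_filter, map_filter) and the sample data are made up for illustration. The idea is only that the map side drops records of S whose key cannot match R, so they never cross the network:

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def build_filter(records):
    """Build a filter over the join keys of one dataset."""
    bloom = BloomFilter()
    for key, _ in records:
        bloom.add(key)
    return bloom


def map_filter(records, bloom):
    """Map-side step: emit only records whose join key may have a match."""
    return [(key, value) for key, value in records if bloom.might_contain(key)]


# Hypothetical sample data: (join_key, payload)
R = [(1, "r1"), (2, "r2"), (3, "r3")]
S = [(k, f"s{k}") for k in range(1, 101)]

bloom_r = build_filter(R)
S_filtered = map_filter(S, bloom_r)  # only these records would be shuffled
```

A few non-matching records can still slip through as false positives, so the join itself must still compare the real keys; the filter only reduces the traffic.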

 For this I want to process all joins in a single MapReduce job:
1) Map phase - processes all datasets and filters out records
2) Reduce phase -
   the reduce phase is divided into a join step and a reducer step

   join - joins all datasets
   reducer - does the aggregation

   For R join S join T:

   Map phase     Reduce phase (join, then reducer)
   mapR --+
          +--> R join S => RS --+
   mapS --+                     +--> RS join T => RST --> reducer (aggregation)
   mapT ------------------------+
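
The flow above can be simulated roughly as follows. This is plain Python, not the Hadoop API, and it is not taken from the Map-Join-Reduce paper: it assumes all three datasets share one join key, and the function names (map_phase, shuffle, join_stage, reduce_phase) and the sample data are hypothetical. It only shows the shape of one job whose reduce side first joins and then aggregates:

```python
from collections import defaultdict


def map_phase(name, records, keep=lambda key, value: True):
    """Filter records and tag the survivors with their source dataset."""
    return [(key, (name, value)) for key, value in records if keep(key, value)]


def shuffle(tagged):
    """Group tagged records by join key, then by source dataset."""
    groups = defaultdict(lambda: defaultdict(list))
    for key, (name, value) in tagged:
        groups[key][name].append(value)
    return groups


def join_stage(by_source, order):
    """Inner-join one key's values across the datasets, in the given order."""
    rows = [()]
    for name in order:
        rows = [row + (v,) for row in rows for v in by_source.get(name, [])]
    return rows


def reduce_phase(groups, order, aggregate):
    """Join first, then hand the joined rows to the aggregation function."""
    result = {}
    for key, by_source in groups.items():
        joined = join_stage(by_source, order)
        if joined:  # keys missing from any dataset produce no rows
            result[key] = aggregate(joined)
    return result


# Hypothetical sample data: (join_key, payload)
R = [(1, "a"), (2, "b")]
S = [(1, 10), (1, 20), (2, 30), (3, 40)]
T = [(1, 100), (3, 300)]

tagged = map_phase("R", R) + map_phase("S", S) + map_phase("T", T)
counts = reduce_phase(shuffle(tagged), order=["R", "S", "T"], aggregate=len)
# Only key 1 appears in all three datasets: ("a", 10, 100) and ("a", 20, 100)
```

Here the aggregation is just a row count per key; any other function over the joined rows could be passed as aggregate.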

If you have any ideas, please share them.

Any other suggestions are also welcome if they reduce the completion time for
joining large datasets.
Thank you.

On Thu, Jan 24, 2013 at 8:39 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Can you also define 'efficient way' and the idea you have in mind to
> implement that isn't already doable today?
>
> On Thu, Jan 24, 2013 at 6:51 PM, Vikas Jadhav <[EMAIL PROTECTED]>
> wrote:
> > Anyone has idea about how should i modify Hadoop Code for
> > Performing Join operation in efficient Way.
> > Thanks.
> >
> > --
> >
> >
> > Thanx and Regards
> >  Vikas Jadhav
>
>
>
> --
> Harsh J
>

--
Thanx and Regards
Vikas Jadhav