Re: Modifying Hadoop For join Operation
Vikas Jadhav 2013-01-24, 19:11
Hi, thanks @Harsh for replying.
I am attaching a paper called Map-Join-Reduce.
I want to implement a similar kind of architecture.
Currently MapReduce processes a join job using a map-side or reduce-side join.
A reduce-side join job has a drawback:
--> for large datasets there is a lot of traffic (data movement) from
mappers to reducers (one option: we can filter out records using a
Bloom filter-like technique)
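
To make the Bloom filter idea concrete, here is a minimal sketch in plain Python. It is not Hadoop code: the BloomFilter class, filter_before_shuffle and the sample records are all hypothetical, and it only illustrates dropping records on the map side whose keys cannot match the other dataset. A Bloom filter can give false positives, so some non-matching records may still get through, but it never drops a matching one.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive several bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.md5(f"{seed}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def filter_before_shuffle(records, bloom):
    """Map-side filter: keep only records whose join key may have a match."""
    return [(key, value) for key, value in records if bloom.might_contain(key)]


# Hypothetical datasets R and S as (join_key, value) pairs.
r_records = [(1, "r1"), (2, "r2"), (3, "r3")]
s_records = [(2, "s2"), (3, "s3"), (4, "s4"), (5, "s5")]

# Build the filter from R's join keys, then use it to prune S.
bloom = BloomFilter()
for key, _ in r_records:
    bloom.add(key)

s_filtered = filter_before_shuffle(s_records, bloom)
```

Only the records in s_filtered would need to travel from the mappers to the reducers, which is where the saving in data movement would come from.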
FOR THIS I WANT TO PROCESS ALL JOINS IN A SINGLE MAPREDUCE JOB
1) MAP PHASE - processes all datasets and filters out records
2) REDUCE PHASE -
the reduce phase is divided into join and reducer
join - joins all datasets
reducer - does aggregation
for R join S join T:
mapS -----> mapR join mapS => RS => RST --> Reducer (aggregation)
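
The flow above can be simulated in memory to show the intended order of steps. This is only a sketch under assumptions, written in plain Python rather than against the Hadoop API: map_phase, join_step, reduce_phase, the keep_all predicate and the sample data are hypothetical names, and the aggregation (a count per join key) is just one example of what the reducer might compute.

```python
from collections import defaultdict


def map_phase(dataset, keep):
    """Map step: scan one dataset and filter out unwanted records."""
    return [(key, value) for key, value in dataset if keep(key, value)]


def join_step(left, right):
    """Join step: hash join on the key; left values are tuples that grow."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [
        (key, left_values + (right_value,))
        for key, left_values in left
        for right_value in index.get(key, ())
    ]


def reduce_phase(joined):
    """Reducer step: aggregate the joined rows (here, a count per key)."""
    counts = defaultdict(int)
    for key, _ in joined:
        counts[key] += 1
    return dict(counts)


# Hypothetical datasets as (join_key, value) pairs.
R = [(1, "r1"), (2, "r2"), (2, "r2b")]
S = [(2, "s2"), (3, "s3")]
T = [(2, "t2"), (2, "t2b")]


def keep_all(key, value):
    # Placeholder predicate; a real job would filter here.
    return True


# 1) Map phase over every dataset.
r = [(key, (value,)) for key, value in map_phase(R, keep_all)]
s = map_phase(S, keep_all)
t = map_phase(T, keep_all)

# 2) Reduce phase, split into join (R join S => RS, RS join T => RST)
#    followed by the aggregating reducer.
rs = join_step(r, s)
rst = join_step(rs, t)
result = reduce_phase(rst)
```

The point of the sketch is that the intermediate RS result is consumed by the next join directly instead of being written out and read back by a second MapReduce job.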
If you have any ideas, please share them.
Any other suggestions are also welcome if they reduce the completion time for
joining large datasets.
On Thu, Jan 24, 2013 at 8:39 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> Can you also define 'efficient way' and the idea you have in mind to
> implement that isn't already doable today?
> On Thu, Jan 24, 2013 at 6:51 PM, Vikas Jadhav <[EMAIL PROTECTED]> wrote:
> > Does anyone have an idea about how I should modify the Hadoop code for
> > performing the join operation in an efficient way?
> > Thanks.
> > --
> > Thanx and Regards
> > Vikas Jadhav
> Harsh J
Thanx and Regards
Vikas Jadhav