Re: How can I record some position of context in Reduce()?
I will express it in SQL form:

select * from table1, table2 where table1.attr < table2.attr

This is also called a theta join, where theta can be <, >, <=, >=, or !=.
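
To make the query concrete, here is a minimal sketch in plain Java (no Hadoop classes) of what a single reducer would have to do if every row of both tables were sent to it. The class name ThetaJoinSketch and the method thetaJoin are hypothetical, the rows are reduced to a single integer attribute, and the sketch assumes both inputs fit in memory:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical illustration only: a nested-loop theta join over two in-memory lists.
final class ThetaJoinSketch {

    /** Returns every pair {a, b} for which theta.test(a, b) holds. */
    static List<int[]> thetaJoin(List<Integer> table1,
                                 List<Integer> table2,
                                 BiPredicate<Integer, Integer> theta) {
        List<int[]> joined = new ArrayList<>();
        for (Integer a : table1) {          // every row of table1 ...
            for (Integer b : table2) {      // ... is compared with every row of table2
                if (theta.test(a, b)) {
                    joined.add(new int[] {a, b});
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // table1.attr < table2.attr
        List<int[]> rows = thetaJoin(Arrays.asList(1, 5), Arrays.asList(3, 4),
                                     (a, b) -> a < b);
        for (int[] row : rows) {
            System.out.println(row[0] + " < " + row[1]);
        }
    }
}
```

Because every row of one table is compared with every row of the other, the work grows with the product of the two table sizes, which is why sending everything to one reducer does not scale.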
On Wed, Apr 10, 2013 at 9:35 PM, Michel Segel <[EMAIL PROTECTED]> wrote:

> Not sure what is meant by a non equi join.
>
> Are you saying something like for every row in X, join it to all of the
> rows in Y where Y.a < something?
>
> Is that what you are suggesting?
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Apr 10, 2013, at 9:11 AM, Vikas Jadhav <[EMAIL PROTECTED]>
> wrote:
>
> How are you going to support a non-equi join using MapReduce?
> As I understand it, the only way to do this is to bring all the data to
> one reducer, so that the reducer can compare lesser and greater values
> correctly.
> Correct me if I am wrong.
> Thank you.
>
> Regards,
> Vikas
>
>
>
> On Wed, Apr 10, 2013 at 4:22 PM, Michel Segel <[EMAIL PROTECTED]> wrote:
>
>> Can you show an example of your join?
>> All joins are an equality in that the key has to match.
>> Whether it's one-to-one, one-to-many, or many-to-many remains to be
>> seen.
>>
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On Apr 9, 2013, at 10:35 AM, Effyroth Gu <[EMAIL PROTECTED]> wrote:
>>
>> Only equality joins, outer joins, and left semi joins are supported in
>> Hive. Hive does not support join conditions that are not equality
>> conditions as it is very difficult to express such conditions as a
>> map/reduce job. Also, more than two tables can be joined in Hive.
>>
>>
>> 2013/4/9 Michael Segel <[EMAIL PROTECTED]>
>>
>>> Hi,
>>>
>>> Your cross join is supported in both Pig and Hive (cross and theta
>>> joins).
>>>
>>> So there must be code to do this.
>>>
>>> Essentially in the reducer you would have your key and then the set of
>>> rows that match the key. You would then perform the cross product on the
>>> key's result set and output them to the collector as separate rows.
>>>
>>> I'm not sure why you would need the reduce context.
>>>
>>> But then again, I'm still on my first cup of coffee. ;-)
>>>
>>>
>>> On Apr 9, 2013, at 12:15 AM, Vikas Jadhav <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>> Hi,
>>> I am also working on joins using MapReduce.
>>> I think that, instead of finding the position of the table in the
>>> RawKeyValueIterator, we can modify the context.write method to always
>>> write the table name or id as the key. Then we don't need to find the
>>> position; we can get the key and value from "reducerContext".
>>>
>>> Before calling reducer.run(reducerContext) in ReduceTask.java, we can add
>>> a join method to the Reducer class in Reducer.java and call
>>> reducer.join(reducerContext).
>>>
>>>
>>> I just wonder how you are going to support a non-equi join.
>>>
>>> I also have the same problem: how to do the join if the datasets can't
>>> fit into memory.
>>>
>>>
>>> For now I am cloning the key and values using the following code:
>>>
>>>
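>>> // Hadoop reuses the key and value objects between iterations, so each
>>> // one is deep-copied before it is kept.
>>> KEYIN key = context.getCurrentKey();
>>> KEYIN outKey = null;
>>> try {
>>>     outKey = (KEYIN) key.getClass().newInstance();
>>> } catch (Exception e) {
>>>     // Fail here instead of passing a null destination to copy() below.
>>>     throw new RuntimeException("Cannot instantiate key class", e);
>>> }
>>> ReflectionUtils.copy(context.getConfiguration(), key, outKey);
>>>
>>> Iterable<VALUEIN> values = context.getValues();
>>> ArrayList<VALUEIN> myValues = new ArrayList<VALUEIN>();
>>> for (VALUEIN value : values) {
>>>     VALUEIN outValue = null;
>>>     try {
>>>         outValue = (VALUEIN) value.getClass().newInstance();
>>>     } catch (Exception e) {
>>>         throw new RuntimeException("Cannot instantiate value class", e);
>>>     }
>>>     ReflectionUtils.copy(context.getConfiguration(), value, outValue);
>>>     myValues.add(outValue);  // store the copy so it survives the loop
>>> }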
>>>
>>>
>>> If you have found any other solution, please feel free to share it.
>>>
>>> Thank you.
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Mar 14, 2013 at 1:53 PM, Roth Effy <[EMAIL PROTECTED]> wrote:
>>>
>>>> In reduce() we have:
>>>>
>>>> key1 values1
>>>> key2 values2
>>>> ...
>>>> keyn valuesn
>>>>
>>>> So what I want to do is join all the values, as in this SQL:
>>>>
>>>> select * from values1,values2...valuesn;
>>>>
>>>> If there is not enough memory to cache the values, how can I complete
>>>> the join?

Regards,
Vikas