-
the same key in different reducers
Oleg Ruchovets 2010-06-09, 08:17
Hi,
My Hadoop job writes the results of map/reduce to HBase. I have 3 reducers.
Here is the sequence of input and output parameters for the Mapper, Combiner and Reducer:
input: InputFormat<K1,V1>
mapper: Mapper<K1,V1,K2,V2>
combiner: Reducer<K2,V2,K2,V2>
reducer: Reducer<K2,V2,K3,V3>
output: RecordWriter<K3,V3>
My question: Is it possible that more than one reducer has the same output key K3? Meaning, in case I have 3 reducers, is it possible that:
reducer1 K3 - 1, V3 [1,2,3]
reducer2 K3 - 2, V3 [5,6,9]
reducer3 K3 - 1, V3 [10,15,22]
As you can see, reducer1 has K3 - 1 and reducer3 has K3 - 1. So is that case possible, or does each and every reducer have a unique output key?
Thanks in advance,
Oleg.
-
Re: the same key in different reducers
Ted Yu 2010-06-09, 21:06
Can you disclose more about how K3 is generated? From your description, it is possible.
-
Re: the same key in different reducers
Owen O'Malley 2010-06-09, 21:22
On Jun 9, 2010, at 1:17 AM, Oleg Ruchovets wrote:
> So is that case possible or every and every reducer has unique output key?
The partitioner controls which reduce a given key is sent to. If the partitioner is non-deterministic, the key can end up going to different reduces. If you are using the default hash partitioner, that would imply that you didn't define a proper hash code for your key.
-- Owen
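To make the point about hash codes concrete, here is a minimal sketch in plain Java with no Hadoop dependency. `partitionFor` reproduces the arithmetic used by Hadoop's default `HashPartitioner` (hash code with the sign bit cleared, modulo the number of reduce tasks). `UserDayKey` and its fields are hypothetical names invented for illustration; a real Hadoop key would also implement `WritableComparable`, which is left out here.

```java
final class HashPartitionSketch {
    // Same arithmetic as the default hash partitioner:
    // clear the sign bit, then take the modulus.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        UserDayKey a = new UserDayKey("oleg", 20100609);
        UserDayKey b = new UserDayKey("oleg", 20100609);
        // Two separate instances with equal contents land in the same partition.
        System.out.println(partitionFor(a, 3) == partitionFor(b, 3)); // prints true
    }
}

// Hypothetical composite key; the class and field names are illustrative only.
final class UserDayKey {
    private final String userId;
    private final int day;

    UserDayKey(String userId, int day) {
        this.userId = userId;
        this.day = day;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof UserDayKey)) return false;
        UserDayKey that = (UserDayKey) other;
        return day == that.day && userId.equals(that.userId);
    }

    // Derived only from field contents, so equal keys hash identically in every JVM.
    @Override
    public int hashCode() {
        return 31 * userId.hashCode() + day;
    }
}
```

Because `hashCode()` depends only on the field values, every map task computes the same partition for the same key, whichever JVM it runs in.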
-
Re: the same key in different reducers
James Seigel 2010-06-09, 21:40
Oleg,
Are you wanting to have them in different reducers? If so then you can write a Comparable object to make that happen.
If you want them to be on the same reducer, then that is what hadoop will do.
:)
-
Re: the same key in different reducers
Ted Yu 2010-06-09, 21:43
I think his question was about the output key from reducer.
-
Re: the same key in different reducers
Alex Kozlov 2010-06-09, 22:15
I think the question was about the mapper keys and their assignment to reducers. The Partitioner<K,V> API looks like:
public abstract int getPartition(KEY key, VALUE value, int numPartitions);
So I assume it is entirely possible to write a partitioner that distributes the same key to multiple reducers and it does not have to be non-deterministic. It can assign the partition based on the value.
Is this correct?
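A partitioner of the kind described here could be sketched as follows. This is plain Java that only mirrors the shape of `getPartition(key, value, numPartitions)`; the class name and the choice of `String` keys and `int` values are assumptions made for illustration, not part of Hadoop.

```java
final class ValueBasedPartitionSketch {
    // Fully deterministic, but ignores the key and routes on the value alone.
    static int getPartition(String key, int value, int numPartitions) {
        return Math.floorMod(value, numPartitions);
    }

    public static void main(String[] args) {
        // The same key "k" ends up in two different partitions.
        System.out.println(getPartition("k", 1, 3)); // prints 1
        System.out.println(getPartition("k", 2, 3)); // prints 2
    }
}
```

With such a partitioner, records sharing a key are split across reducers on every run, with no randomness involved.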
-
Re: the same key in different reducers
Owen O'Malley 2010-06-10, 02:30
On Wed, Jun 9, 2010 at 3:15 PM, Alex Kozlov <[EMAIL PROTECTED]> wrote:
> So I assume it is entirely possible to write a partitioner that distributes
> the same key to multiple reducers and it does not have to be
> non-deterministic. It can assign the partition based on the value.
>
> Is this correct?
Yes. I've never liked the fact that Partitioners get the value for exactly that reason. It was originally put in for some obscure corner case in Nutch. Fixing it now would be difficult.
Also note that "non-deterministic" doesn't imply using Random. You could just fail to override the hashCode method and take the default from Object. That would cause you to hash based on the object's address, which is different for each JVM.
-- Owen
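The default-`hashCode` pitfall can be seen in a small stand-alone sketch. `PlainKey` is a hypothetical key class that omits the override; the only library facts relied on are that `Object.hashCode()` matches `System.identityHashCode` when not overridden, and that `String.hashCode()` is computed from the characters.

```java
final class IdentityHashSketch {
    // Hypothetical key that forgets to override hashCode() and equals().
    static final class PlainKey {
        final String id;

        PlainKey(String id) {
            this.id = id;
        }
    }

    public static void main(String[] args) {
        PlainKey first = new PlainKey("row-1");
        PlainKey second = new PlainKey("row-1");

        // Without an override, hashCode() is the identity hash of the instance.
        System.out.println(first.hashCode() == System.identityHashCode(first));

        // Same contents, yet the two hashes are unrelated to those contents.
        System.out.println(first.hashCode() + " vs " + second.hashCode());

        // String derives its hash from its characters, so separate copies agree.
        System.out.println("row-1".hashCode() == new String("row-1").hashCode());
    }
}
```

A key like `PlainKey` fed to a hash-based partitioner can therefore be routed differently by different map tasks, even though each individual call looks deterministic.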
-
Re: the same key in different reducers
Oleg Ruchovets 2010-06-10, 16:27
Hi, and thank you for the answers. I didn't check the email and now I see 7 answers. It is really great.
Let me explain in more detail why I am asking such a strange question :-)
As I wrote before, I write to HBase using a Hadoop job. The writing actually happens in the reducer part of the job. Assume that I have 3 reducers (all of them write to HBase), and suppose reducer 1 and reducer 3 have the same key. In that case I need to check whether HBase already contains that key (which requires a select operation on HBase). If it does, I have to merge with the already inserted record and then write it back to HBase. BUT in my case the information is organized in such a way that I have no problem with the same keys. So I can save the expensive HBase select operation and use only insert operations. But in order to use only insert operations, I need to know that each and every reducer has a unique output key (that is, K3 is unique across all reducers).
input: InputFormat<K1,V1>
mapper: Mapper<K1,V1,K2,V2>
combiner: Reducer<K2,V2,K2,V2>
reducer: Reducer<K2,V2,K3,V3>
output: RecordWriter<K3,V3>
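Whether K3 is unique across reducers depends on how the reducer derives K3 from K2, not only on the partitioner. The sketch below simulates this in plain Java under stated assumptions: keys are strings, partitioning uses the default hash arithmetic, and `lossyOutputKey` is a hypothetical K2-to-K3 mapping that drops everything after a `#`. None of these names come from the job discussed here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class OutputKeyCollisionSketch {
    // Default hash partitioning arithmetic applied to the reducer input key K2.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical lossy K2 -> K3 mapping: keep only the text before the first '#'.
    static String lossyOutputKey(String k2) {
        int cut = k2.indexOf('#');
        return cut < 0 ? k2 : k2.substring(0, cut);
    }

    // For each output key K3, collect the reducers that would emit it.
    static Map<String, Set<Integer>> reducersPerOutputKey(
            String[] inputKeys, int numReduceTasks, boolean lossy) {
        Map<String, Set<Integer>> result = new HashMap<>();
        for (String k2 : inputKeys) {
            String k3 = lossy ? lossyOutputKey(k2) : k2;
            result.computeIfAbsent(k3, unused -> new HashSet<>())
                  .add(partitionFor(k2, numReduceTasks));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] inputKeys = {"a#1", "a#2"};
        // K3 == K2: each output key is emitted by exactly one reducer.
        System.out.println(reducersPerOutputKey(inputKeys, 3, false).get("a#1").size()); // prints 1
        // Lossy K3: "a" is emitted by two different reducers.
        System.out.println(reducersPerOutputKey(inputKeys, 3, true).get("a").size()); // prints 2
    }
}
```

If the reducer emits K3 equal to K2, or any one-to-one function of it, and the partitioner depends only on a stable hash of the key, each K3 is written by a single reducer and the insert-only approach is safe.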