Hadoop, mail # user - Question on Key Grouping


Question on Key Grouping
Joey Krabacher 2012-12-04, 23:37
Is there a way to group keys a second time before sending results to the
reducer in the same job? I thought a combiner might do this for me, but it
just acts like a reducer, so I need an intermediate step that acts like
another mapper instead.

To visualize this, here is how I want it to work:

Map output:

<1, [{2, "John",""},{1, "",""},{1, "", "Doe"}]>

Combiner Output:

<1, [{1, "John",""},{1, "",""},{1, "", "Doe"}]>

Reduce Output:

<1, "John","Doe">

How it currently works:

Map output:

<1, [{2, "John",""},{1, "",""},{1, "", "Doe"}]>

Combiner Output:

<1, {1, "John",""}>
<1, {1, "",""}>
<1, {1, "", "Doe"}>

Reduce Output:

<1, "John","Doe">
<1, "John","Doe">
<1, "John","Doe">

So, basically, the issue is that even though the 2 in the first map record
should really be a 1, I still need to extract the value "John" from that
record and have it included in the output for key 1.
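
To make the desired reduce step concrete, here is a minimal sketch in plain
Java with no Hadoop types. It folds every partial record that arrives under
one key into a single output line, taking the first non-empty value for each
field and ignoring the embedded id (the stray 2). The names Partial and
mergeForKey are hypothetical and exist only for this illustration. In a real
job the same loop would presumably sit inside Reducer.reduce(), with
context.write() called once after the loop rather than once per value. Since
Hadoop may run a combiner zero or more times, the merge cannot depend on the
combiner having run.

```java
import java.util.Arrays;
import java.util.List;

public class KeyGroupingSketch {

    /** Hypothetical partial record: the id seen by the mapper plus two name fields. */
    static final class Partial {
        final int id;
        final String first;
        final String last;

        Partial(int id, String first, String last) {
            this.id = id;
            this.first = first;
            this.last = last;
        }
    }

    /**
     * Merges all partial records grouped under one key into a single line.
     * The first non-empty value wins for each field; the embedded id is
     * ignored, so the record carrying id 2 still contributes "John".
     */
    static String mergeForKey(int key, List<Partial> values) {
        String first = "";
        String last = "";
        for (Partial p : values) {
            if (first.isEmpty() && !p.first.isEmpty()) {
                first = p.first;
            }
            if (last.isEmpty() && !p.last.isEmpty()) {
                last = p.last;
            }
        }
        // Built once, after the loop: one line per key, not one per value.
        return "<" + key + ", \"" + first + "\",\"" + last + "\">";
    }

    public static void main(String[] args) {
        // The three values from the map output above, all grouped under key 1.
        List<Partial> grouped = Arrays.asList(
                new Partial(2, "John", ""),
                new Partial(1, "", ""),
                new Partial(1, "", "Doe"));

        System.out.println(mergeForKey(1, grouped)); // prints <1, "John","Doe">
    }
}
```

The point of the sketch is only that the output is produced once per key,
after all values have been seen, instead of once for each incoming value.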

Hope this makes sense.

Thanks in advance,
/* Joey */