

Question on Key Grouping
Is there a way to group keys a second time before sending results to the
reducer in the same job? I thought a combiner might do this for me, but it
just acts like a reducer, so I need an intermediate step that acts like
another mapper instead.

To visualize this, here is how I want it to work:

Map output:

<1, [{2, "John",""},{1, "",""},{1, "", "Doe"}]>

Combiner Output:

<1, [{1, "John",""},{1, "",""},{1, "", "Doe"}]>

Reduce Output:

<1, "John","Doe">

How it currently works:

Map output:

<1, [{2, "John",""},{1, "",""},{1, "", "Doe"}]>

Combiner Output:

<1, {1, "John",""}>
<1, {1, "",""}>
<1, {1, "", "Doe"}>

Reduce Output:

<1, "John","Doe">
<1, "John","Doe">
<1, "John","Doe">

So, basically, the issue is that even though the 2 in the first map record
should really be a 1, I still need to extract the value "John" from it and
have it included in the output for key 1.
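
To make the desired reduce step concrete, here is a minimal sketch in plain Java, with no Hadoop types involved. The class name MergePartials, the Partial record shape {id, first, last}, and the merge method are all hypothetical names chosen for illustration; the sketch only assumes that every value arriving under key 1 should contribute its first non-empty name fields, whatever id the value itself carries.

```java
import java.util.Arrays;
import java.util.List;

public class MergePartials {

    // Hypothetical value shape from the example: {id, firstName, lastName}.
    static final class Partial {
        final int id;
        final String first;
        final String last;

        Partial(int id, String first, String last) {
            this.id = id;
            this.first = first;
            this.last = last;
        }
    }

    // Collapse all partial records grouped under one key into a single
    // {first, last} pair, keeping the first non-empty value of each field.
    // The id inside each value is deliberately ignored here, so the record
    // carrying id 2 still contributes "John".
    static String[] merge(List<Partial> values) {
        String first = "";
        String last = "";
        for (Partial p : values) {
            if (first.isEmpty() && !p.first.isEmpty()) {
                first = p.first;
            }
            if (last.isEmpty() && !p.last.isEmpty()) {
                last = p.last;
            }
        }
        return new String[] { first, last };
    }

    public static void main(String[] args) {
        // The values listed under key 1 in the map output above.
        List<Partial> values = Arrays.asList(
                new Partial(2, "John", ""),
                new Partial(1, "", ""),
                new Partial(1, "", "Doe"));

        String[] merged = merge(values);
        // Prints: <1, "John","Doe">
        System.out.println("<1, \"" + merged[0] + "\",\"" + merged[1] + "\">");
    }
}
```

In a real job this logic would sit in the reducer, emitting one record per key rather than one per input value. Whether the combiner can do the regrouping is a separate question, since a combiner has to emit the same key and value types as the map output.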

Hope this makes sense.

Thanks in advance,
/* Joey */