-
setGroupingComparatorClass
Mark Kerzner 2011-11-01, 17:08
Hi, Hadoop experts,
I've written my custom GroupComparator, and I want to tell Hadoop about it.
Now, there is a call
job.setGroupingComparatorClass(),
but I find it only in the mapreduce package of version 0.21. In prior versions, I see a similar call
conf.setOutputValueGroupingComparator(GroupComparator.class);
but it does not cause my GroupComparator to be used.
So my question is, should I change the code to use the mapreduce package (not a problem, since Cloudera has it backported to the current distribution), or is there a different, simpler way?
Thank you. Sincerely, Mark
-
Re: setGroupingComparatorClass
Harsh J 2011-11-01, 18:32
Hey Mark,
What problem do you see when you use JobConf#setOutputValueGroupingComparator(…) when writing jobs with the stable API?
I've used it many times and it does get applied.
-- Harsh J
-
Re: setGroupingComparatorClass
Mark Kerzner 2011-11-01, 18:43
Here is my GroupComparator. With it, I want to use just one part of my composite key, so that all the keys that match in that part go to the same reducer and are presented to the reducer together with their values. So:
public class GroupComparator extends WritableComparator {

    public GroupComparator() {
        super(KeyTuple.class, true);
    }

    @Override
    public int compare(WritableComparable K1, WritableComparable K2) {
        KeyTuple t1 = (KeyTuple) K1;
        KeyTuple t2 = (KeyTuple) K2;
        return t1.getSku().compareTo(t2.getSku());
    }
}
Then in the reducer I would expect many values, for all keys that I declared equal in my GroupComparator.
public void reduce(KeyTuple key, Iterator<Text> values,
        OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    System.out.println("Reducer key=" + key);
    while (values.hasNext()) {
        Text value = values.next();
        System.out.println("Reducer value = " + value);
    }
}
Instead, I still get individual full keys with one value, and the debugger does not step into my GroupComparator.
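For readers following along, the difference between the expected and the observed behavior can be sketched in plain Java, without Hadoop on the classpath. This is only an illustration under stated assumptions: Key, Record, and reduceCalls are hypothetical names invented for the sketch (Key stands in for KeyTuple), and the loop imitates the general idea that sorted records are split into reduce calls wherever the grouping comparator says two adjacent keys differ. It is not Hadoop's implementation and does not diagnose why the comparator is not being picked up here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class GroupingSketch {

    // Hypothetical stand-in for KeyTuple: a SKU plus one secondary field.
    static final class Key {
        final String sku;
        final int seq;
        Key(String sku, int seq) { this.sku = sku; this.seq = seq; }
    }

    // Hypothetical key/value pair as it would arrive at the reduce side.
    static final class Record {
        final Key key;
        final String value;
        Record(Key key, String value) { this.key = key; this.value = value; }
    }

    // Sort order over the full composite key.
    static final Comparator<Key> SORT =
            Comparator.comparing((Key k) -> k.sku).thenComparingInt(k -> k.seq);

    // Grouping order over the SKU only, like the GroupComparator above.
    static final Comparator<Key> GROUP = Comparator.comparing((Key k) -> k.sku);

    // Sorts the records, then starts a new "reduce call" whenever the
    // grouping comparator reports that the current key differs from the previous one.
    static List<List<String>> reduceCalls(List<Record> input,
                                          Comparator<Key> sort,
                                          Comparator<Key> grouping) {
        List<Record> sorted = new ArrayList<>(input);
        sorted.sort((a, b) -> sort.compare(a.key, b.key));

        List<List<String>> calls = new ArrayList<>();
        Key previous = null;
        for (Record r : sorted) {
            if (previous == null || grouping.compare(previous, r.key) != 0) {
                calls.add(new ArrayList<>());
            }
            calls.get(calls.size() - 1).add(r.value);
            previous = r.key;
        }
        return calls;
    }

    // Made-up sample data for the illustration.
    static List<Record> sample() {
        return Arrays.asList(
                new Record(new Key("A", 2), "a2"),
                new Record(new Key("B", 1), "b1"),
                new Record(new Key("A", 1), "a1"));
    }

    public static void main(String[] args) {
        // Grouping on the SKU only: one call per SKU, with all of its values.
        System.out.println(reduceCalls(sample(), SORT, GROUP)); // [[a1, a2], [b1]]
        // Grouping on the full key: one call per key, one value each.
        System.out.println(reduceCalls(sample(), SORT, SORT));  // [[a1], [a2], [b1]]
    }
}
```

The second line of output corresponds to the symptom described above (one value per full key), which is what one would see if grouping fell back to the full-key comparison instead of the SKU-only comparator.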
Thanks a bunch!
Mark