Re: setGroupingComparatorClass
Here is my GroupComparator. I want it to compare only part of my composite
key, so that all keys that match on that part go to the same reducer and are
presented to it together with their values:

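import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Groups composite keys by SKU only; the other KeyTuple fields are ignored.
public class GroupComparator extends WritableComparator {

    public GroupComparator() {
        // 'true' asks WritableComparator to create KeyTuple instances
        // so that serialized keys can be deserialized before comparison.
        super(KeyTuple.class, true);
    }

    @Override
    public int compare(WritableComparable k1, WritableComparable k2) {
        KeyTuple t1 = (KeyTuple) k1;
        KeyTuple t2 = (KeyTuple) k2;
        return t1.getSku().compareTo(t2.getSku());
    }
}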
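
A side note on the "same reducer" part: a grouping comparator only decides which neighbouring keys share one reduce() call inside a reducer, while the partitioner decides which reducer a key is sent to. The sketch below is plain Java, not Hadoop code. It assumes a hypothetical two-field key (sku, date) in place of KeyTuple and reuses the formula of Hadoop's default HashPartitioner, (hash & Integer.MAX_VALUE) % numReduceTasks, to show why hashing the SKU alone keeps equal SKUs together.

```java
import java.util.Objects;

public class PartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner, applied to a plain hash.
    static int partition(int hash, int numReduceTasks) {
        return (hash & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical: partition on the whole composite key (sku and date).
    // Two keys with the same sku but different dates may get different results.
    static int byFullKey(String sku, String date, int numReduceTasks) {
        return partition(Objects.hash(sku, date), numReduceTasks);
    }

    // Hypothetical: partition on the sku alone; the date is deliberately ignored,
    // so every key with the same sku gets the same partition number.
    static int bySku(String sku, String date, int numReduceTasks) {
        return partition(sku.hashCode(), numReduceTasks);
    }

    public static void main(String[] args) {
        int reducers = 4;
        System.out.println("by sku:      "
                + bySku("A", "2011-11-01", reducers) + " and "
                + bySku("A", "2011-11-02", reducers));
        System.out.println("by full key: "
                + byFullKey("A", "2011-11-01", reducers) + " and "
                + byFullKey("A", "2011-11-02", reducers));
    }
}
```

With bySku the two partition numbers are always equal; with byFullKey they can differ, in which case no grouping comparator could bring the two records into one reduce() call.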

Then in the reducer I would expect many values, for all keys that I
declared equal in my GroupComparator.

    public void reduce(KeyTuple key, Iterator<Text> values,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        System.out.println("Reducer key=" + key);
        while (values.hasNext()) {
            Text value = values.next();
            System.out.println("Reducer value = " + value);
        }
    }
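
The expected grouping can be sketched outside Hadoop. This is a plain-Java simulation under stated assumptions, not Hadoop's implementation: Key is a hypothetical two-field stand-in for KeyTuple, SORT plays the role of the full-key sort order, and GROUP plays the role of GroupComparator. Keys are sorted by the full key, and a new group starts whenever GROUP reports a difference between two neighbouring keys. Each inner list models the values one reduce() call would iterate over.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class GroupingSketch {

    // Hypothetical stand-in for KeyTuple: only the sku takes part in grouping.
    static final class Key {
        final String sku;
        final String date;
        Key(String sku, String date) { this.sku = sku; this.date = date; }
        @Override public String toString() { return sku + "/" + date; }
    }

    // Sort order: the full composite key (sku, then date).
    static final Comparator<Key> SORT =
            Comparator.comparing((Key k) -> k.sku).thenComparing((Key k) -> k.date);

    // Grouping order: sku only, like GroupComparator above.
    static final Comparator<Key> GROUP = Comparator.comparing((Key k) -> k.sku);

    // Sorts a copy of the keys, then cuts a new group whenever GROUP
    // sees a change between neighbouring keys.
    static List<List<Key>> group(List<Key> keys) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<Key>> groups = new ArrayList<>();
        Key previous = null;
        for (Key k : sorted) {
            if (previous == null || GROUP.compare(previous, k) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(k);
            previous = k;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Key> keys = Arrays.asList(
                new Key("B", "2011-11-01"),
                new Key("A", "2011-11-02"),
                new Key("A", "2011-11-01"));
        // prints [[A/2011-11-01, A/2011-11-02], [B/2011-11-01]]
        System.out.println(group(keys));
    }
}
```

Because grouping only compares neighbours, it relies on the sort order placing keys with the same sku next to each other, which is the case here since the sku is the leading sort field.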

Instead, I still get individual full keys with one value, and the debugger
does not step into my GroupComparator.

Thanks a bunch!

Mark

On Tue, Nov 1, 2011 at 1:32 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Hey Mark,
>
> What problem do you see when you use
> JobConf#setOutputValueGroupingComparator(…) when writing jobs with the
> stable API?
>
> I've used it many times and it does get applied.
>
> On Tue, Nov 1, 2011 at 10:38 PM, Mark Kerzner <[EMAIL PROTECTED]>
> wrote:
> > Hi, Hadoop experts,
> >
> > I've written my custom GroupComparator, and I want to tell Hadoop about it.
> >
> > Now, there is a call
> >
> > job.setGroupingComparatorClass(),
> >
> > but I only find it in mapreduce package of version 0.21. In prior versions,
> > I see a similar call
> >
> > conf.setOutputValueGroupingComparator(GroupComparator.class);
> >
> > but it does not cause my GroupComparator to be used.
> >
> > So my question is, should I change the code to use the mapreduce package
> > (not a problem, since Cloudera has it backported to the current
> > distribution), or is there a different, simpler way?
> >
> > Thank you. Sincerely,
> > Mark
> >
>
>
>
> --
> Harsh J
>