Re: MapReduce: Reducers partitions.
Jean-Marc Spaggiari 2013-04-11, 11:52
Thanks all for your comments.

I looked for partitioners within the HBase scope only, which is why I also
thought we were using HTablePartitioner. But looking at the default one
actually used, I found org.apache.hadoop.mapreduce.lib.partition.HashPartitioner,
as St.Ack confirmed. And it does exactly what I was talking about for the
keyhash (and not the keycrc).
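
For reference, here is a minimal sketch of what that default partitioner
does (the class name HashLikePartitioner is made up for illustration; the
body mirrors the getPartition() logic of Hadoop's HashPartitioner): it takes
the key's hashCode(), clears the sign bit, and takes the result modulo the
number of reducers.

    import org.apache.hadoop.mapreduce.Partitioner;

    // Sketch of the default partitioning logic, mirroring
    // org.apache.hadoop.mapreduce.lib.partition.HashPartitioner.
    public class HashLikePartitioner<K, V> extends Partitioner<K, V> {
      @Override
      public int getPartition(K key, V value, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE keeps the result non-negative
        // even when hashCode() returns a negative value.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
      }
    }

So two keys with the same hash always land on the same reducer, and the
spread across reducers is only as even as the key hashes themselves.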

Changing the HRegionPartitioner behaviour would also be useless, because
TableMapReduceUtil will overwrite the number of reducers if we set more
than the number of regions:

      if (job.getNumReduceTasks() > regions) {
        job.setNumReduceTasks(outputTable.getRegionsInfo().size());
      }
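
For context, a minimal sketch of the setup where that cap kicks in
(assuming the 0.94-era HBase API; the table name "my_table", the
MyTableReducer class, and the column family "f" are all made up): passing
HRegionPartitioner to TableMapReduceUtil.initTableReducerJob() is what
triggers the clamp above.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HRegionPartitioner;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableReducer;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;

    public class RegionCapSketch {

      // Hypothetical reducer: it writes to the table with ordinary Puts,
      // the same way a regular client Put works.
      public static class MyTableReducer
          extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
          Put put = new Put(Bytes.toBytes(key.toString()));
          put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes(1L));
          context.write(null, put); // TableOutputFormat ignores the key
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "region-cap-sketch");
        job.setNumReduceTasks(64); // ask for 64 reducers...

        // ...but because HRegionPartitioner is passed, initTableReducerJob
        // runs the check quoted above and clamps the reducer count to the
        // number of regions in "my_table".
        TableMapReduceUtil.initTableReducerJob(
            "my_table", MyTableReducer.class, job, HRegionPartitioner.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }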

So I just need to stay with the default partitioner then.

Thanks,

JM

2013/4/10 Stack <[EMAIL PROTECTED]>

> On Wed, Apr 10, 2013 at 12:01 PM, Jean-Marc Spaggiari <
> [EMAIL PROTECTED]> wrote:
>
> > Hi Greame,
> >
> > No. The reducer will simply write to the table the same way you would
> > with a regular Put. If a split is required because of the size, then
> > the region will be split, but in the end, there will not necessarily
> > be any region split.
> >
> > In the use case described below, all 600 lines will "simply" go into
> > the only region in the table, and no split will occur.
> >
> > The goal is to partition the data for the reducers only, not in the
> > table.
> >
>
>
> Then just use the default partitioner?
>
> The suggestion that you use HTablePartitioner seems inappropriate for your
> task.  See the HashPartitioner doc here:
>
> http://hadoop.apache.org/docs/r2.0.3-alpha/api/org/apache/hadoop/mapreduce/lib/partition/HashPartitioner.html
>
> St.Ack
>