HBase, mail # user - Can I make use of TableSplit across Regions to make my MR job faster?


Re: Can I make use of TableSplit across Regions to make my MR job faster?
Michael Segel 2013-08-26, 10:48
A 'table split' is a region split: TableInputFormat creates one input split, and so one map task, per region. As you split regions and balance them across the region servers, you should see more parallelism in your M/R jobs.
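
To make the region-to-split relationship concrete, here is a minimal sketch in plain Java. It is a simplified model, not HBase's actual code: the `Region` and `Split` classes, the `splitsFor` method, and the server names are all made up for illustration. It assumes the default behaviour, where each region that overlaps the scan range yields one split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model (NOT HBase code): one input split per region overlapping the scan.
class RegionSplitSketch {

    // A region covers [startKey, endKey); "" means unbounded on that side.
    static final class Region {
        final String startKey, endKey, server;
        Region(String startKey, String endKey, String server) {
            this.startKey = startKey; this.endKey = endKey; this.server = server;
        }
    }

    // A split is the part of one region that falls inside the scan range.
    static final class Split {
        final String startRow, stopRow, server;
        Split(String startRow, String stopRow, String server) {
            this.startRow = startRow; this.stopRow = stopRow; this.server = server;
        }
    }

    static List<Split> splitsFor(List<Region> regions, String scanStart, String scanStop) {
        List<Split> splits = new ArrayList<>();
        for (Region r : regions) {
            boolean startsBeforeScanEnds = scanStop.isEmpty() || r.startKey.compareTo(scanStop) < 0;
            boolean endsAfterScanStarts = r.endKey.isEmpty() || scanStart.compareTo(r.endKey) < 0;
            if (!startsBeforeScanEnds || !endsAfterScanStarts) {
                continue; // region lies entirely outside the scan range
            }
            // Clip the region to the scan range.
            String from = r.startKey.compareTo(scanStart) > 0 ? r.startKey : scanStart;
            String to;
            if (scanStop.isEmpty()) {
                to = r.endKey;
            } else if (r.endKey.isEmpty()) {
                to = scanStop;
            } else {
                to = r.endKey.compareTo(scanStop) < 0 ? r.endKey : scanStop;
            }
            splits.add(new Split(from, to, r.server));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical table with three regions on three servers.
        List<Region> regions = Arrays.asList(
                new Region("", "g", "rs1"),
                new Region("g", "p", "rs2"),
                new Region("p", "", "rs3"));
        // A full-table scan: the number of splits equals the number of regions.
        System.out.println(splitsFor(regions, "", "").size() + " map tasks for a full scan");
    }
}
```

In this model a table that lives in a single region gives a single split however many region servers exist, which is why splitting and balancing regions is what buys the extra map tasks.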

Of course depending on your choice of row keys... YMMV.
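
As a rough illustration of the row key point, the sketch below counts how many rows land in each region of a hypothetical table pre-split into 6 regions. Everything in it is an assumption made for the example: the split points "1" to "5", the `row-%07d` key format, and the hash-prefix ("salting") scheme, which is just one common way to spread keys and not something the original poster is known to use.

```java
import java.util.Arrays;

// Toy model (NOT HBase code): how row key choice decides which regions receive rows.
class RowKeySkewSketch {

    // Hypothetical split points: 6 regions with boundaries at the prefixes "1".."5".
    static final String[] SPLIT_POINTS = {"1", "2", "3", "4", "5"};

    // Index of the region holding rowKey = number of split points <= rowKey.
    static int regionIndex(String rowKey) {
        int idx = 0;
        for (String point : SPLIT_POINTS) {
            if (rowKey.compareTo(point) >= 0) {
                idx++;
            }
        }
        return idx;
    }

    // Prefix the key with a bucket number derived from its hash (a "salted" key).
    static String salted(String rowKey) {
        int bucket = Math.floorMod(rowKey.hashCode(), SPLIT_POINTS.length + 1);
        return bucket + "-" + rowKey;
    }

    // Generate sequential keys and count how many fall into each region.
    static int[] rowsPerRegion(int rows, boolean salt) {
        int[] counts = new int[SPLIT_POINTS.length + 1];
        for (int i = 0; i < rows; i++) {
            String key = String.format("row-%07d", i);
            counts[regionIndex(salt ? salted(key) : key)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("unsalted: " + Arrays.toString(rowsPerRegion(100_000, false)));
        System.out.println("salted:   " + Arrays.toString(rowsPerRegion(100_000, true)));
    }
}
```

With the unsalted keys every row sorts after "5", so all of them end up in the last region and only one map task has real work to do. The salted keys spread over the regions, at the cost of making ordered range scans over the original keys harder.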

HTH

-Mike

On Aug 26, 2013, at 2:16 AM, Pavan Sudheendra <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> How do I make use of a TableSplit or a region split? How is it used in
> TableInputFormatBase#getSplits()?
>
>
> I have 6 region servers in the cluster for the map-reduce job I am
> running. How can I leverage them so that the table is split across the
> cluster and the map-reduce application finishes faster? Right now it is
> very slow. I am aggregating values from 3 tables: one has 100,000 rows,
> and for the other two I only use get operations to fetch a value by
> key. With this setup the job takes 40-50 minutes, which is too slow. The
> first table will eventually grow to around 20-25 million rows. Please
> point me in the right direction. I will paste the code if anybody is
> interested.
>
>
> --
> Regards-
> Pavan

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental.
Use at your own risk.
Michael Segel
michael_segel (AT) hotmail.com