HBase >> mail # user >> Can I make use of TableSplit across Regions to make my MR job faster?


Pavan Sudheendra 2013-08-26, 07:16
Re: Can I make use of TableSplit across Regions to make my MR job faster?
A 'table split' is a region split. As you split regions and balance them across your region servers, you should see more parallelism in your M/R jobs.
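The point above can be sketched concretely: TableInputFormatBase#getSplits hands back roughly one input split per region, so one map task runs per region. Below is a toy Python model of that idea (not HBase's actual Java code; the region-boundary representation and range clipping are simplified for illustration):

```python
# Toy model of one-input-split-per-region: each region contributes one
# split, clipped to the scan's start/stop rows. More regions -> more mappers.

def get_splits(region_boundaries, scan_start=b"", scan_stop=b""):
    """region_boundaries: sorted region start keys; the first is b""."""
    splits = []
    for i, start in enumerate(region_boundaries):
        # A region's end key is the next region's start key (empty for the last).
        end = region_boundaries[i + 1] if i + 1 < len(region_boundaries) else b""
        # Skip regions entirely outside the requested scan range.
        if scan_stop and start >= scan_stop:
            continue
        if end and scan_start >= end:
            continue
        lo = max(start, scan_start)
        hi = end if not scan_stop else (min(end, scan_stop) if end else scan_stop)
        splits.append((lo, hi))
    return splits

# A table pre-split into 3 regions yields 3 splits -> 3 parallel map tasks.
regions = [b"", b"row-333", b"row-666"]
print(get_splits(regions))
```

So a table sitting in a single region gets a single mapper no matter how many region servers exist; splitting (or pre-splitting) the table is what unlocks the parallelism.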

Of course depending on your choice of row keys... YMMV.
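One common reason row-key choice matters: monotonically increasing keys (timestamps, sequential IDs) funnel all writes into a single region, so only one mapper ends up with most of the data. A frequently used workaround is "salting" the key with a hash-derived bucket prefix. A minimal sketch follows; the `salted_key` helper and the bucket count of 6 (matching the 6 region servers mentioned in the thread) are illustrative assumptions, not anything from HBase itself:

```python
# Sketch: prefix ("salt") each row key with a deterministic hash bucket so
# sequential keys spread across regions instead of hotspotting one of them.
import hashlib

NUM_BUCKETS = 6  # illustrative: one bucket per region server in the thread

def salted_key(row_key: bytes) -> bytes:
    # Deterministic bucket from the key's hash, prepended to the key.
    bucket = int(hashlib.md5(row_key).hexdigest(), 16) % NUM_BUCKETS
    return b"%d-%s" % (bucket, row_key)
```

The trade-off: point reads still work (re-derive the bucket from the key), but a range scan now has to issue one scan per bucket prefix.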

HTH

-Mike

On Aug 26, 2013, at 2:16 AM, Pavan Sudheendra <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> How do I make use of a TableSplit or a region split? How is it used in
> TableInputFormatBase#getSplits()?
>
>
> I have 6 region servers across the cluster for the map-reduce task I'm
> running. How can I leverage this so that the table is split across the
> cluster and the map-reduce application finishes faster? Right now it is
> very slow: I'm aggregating values across 3 tables, one with 100,000 rows,
> and against the other two I'm only using get operations to fetch values by
> key. With this setup the job takes 40-50 mins, which is far too slow. The
> first table will eventually grow to around 20-25m rows. Please point me in
> the right direction. I will paste the code if anybody is interested.
>
>
> --
> Regards-
> Pavan

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental.
Use at your own risk.
Michael Segel
michael_segel (AT) hotmail.com
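Beyond the split/region parallelism, the question above describes issuing one get per row against the lookup tables, which costs one RPC round trip per row. HBase can batch many Gets into a single RPC (HTable.get takes a list of Gets); the toy Python model below only counts round trips to show why that matters. The class and method names here are illustrative, not the HBase API:

```python
# Toy model: one-RPC-per-row lookups vs. batched lookups. Only the round-trip
# count is modeled; no real networking or HBase involved.

class FakeTable:
    def __init__(self, rows):
        self.rows = rows
        self.round_trips = 0

    def get(self, key):            # models one RPC per single-row Get
        self.round_trips += 1
        return self.rows.get(key)

    def batch_get(self, keys):     # models one RPC for a whole batch of Gets
        self.round_trips += 1
        return [self.rows.get(k) for k in keys]

table = FakeTable({f"k{i}": i for i in range(1000)})

for i in range(1000):              # one Get per row...
    table.get(f"k{i}")
print(table.round_trips)           # 1000 round trips

table.round_trips = 0
keys = [f"k{i}" for i in range(1000)]
for start in range(0, 1000, 100):  # ...vs. batches of 100
    table.batch_get(keys[start:start + 100])
print(table.round_trips)           # 10 round trips
```

Cutting per-row round trips (batched gets, or restructuring the job so the small tables are joined map-side) is usually the first thing to try when a lookup-heavy M/R job runs this slowly.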