You don’t need to do this.

It's already done for you by the existing APIs.

A scan lets you do either a full table scan (no range limits provided) or a range scan where you provide the start and stop row boundaries.
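
To make the two scan modes concrete, here is a minimal sketch in plain Java. It does not use the HBase client API; it only models a table as a sorted map of row keys, which is how HBase orders rows. The class name `ScanModel`, the `scan` helper, and the sample row keys are all made up for illustration, and the inclusive-start / exclusive-stop convention is the usual HBase default as far as I know.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model only: a sorted map standing in for an HBase table.
public class ScanModel {

    // Sample data; the row keys and values are invented for this sketch.
    static NavigableMap<String, String> table() {
        NavigableMap<String, String> t = new TreeMap<>();
        t.put("row-01", "a");
        t.put("row-05", "b");
        t.put("row-10", "c");
        t.put("row-20", "d");
        return t;
    }

    // A null boundary means "no limit" on that side.
    // Start is inclusive, stop is exclusive. Assumes start <= stop when both are given.
    static List<String> scan(NavigableMap<String, String> t, String start, String stop) {
        NavigableMap<String, String> view = t;
        if (start != null) {
            view = view.tailMap(start, true);   // keys >= start
        }
        if (stop != null) {
            view = view.headMap(stop, false);   // keys < stop
        }
        return new ArrayList<>(view.keySet());
    }
}
```

Calling `scan(table(), null, null)` walks every row (the full table scan case), while `scan(table(), "row-05", "row-20")` only touches the rows inside the boundaries.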

So if you’re using a client connection to HBase, it’s done for you.

If you’re writing an M/R job, you already get one mapper task assigned per region, so your parallelism is already done for you.

It’s possible that the input format is smart enough to pre-check the regions against the scan boundaries and, for any region that falls outside them, not generate a mapper task at all.
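
If the input format does prune regions that way, the check would be something along these lines. This is a rough sketch in plain Java and not the HBase source: `RegionPruning`, `Region`, `overlaps`, and `regionsToMap` are hypothetical names, row keys are simplified to strings, and an empty string stands for an unbounded start or end.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "one mapper per region, skipping regions outside the scan".
public class RegionPruning {

    // Invented region descriptor: covers [startKey, endKey); "" means unbounded.
    static final class Region {
        final String name;
        final String startKey;
        final String endKey;

        Region(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    // True if the region's key range intersects the scan range [scanStart, scanStop).
    // An empty scanStart or scanStop means the scan is unbounded on that side.
    static boolean overlaps(Region r, String scanStart, String scanStop) {
        boolean regionStartsBeforeScanEnds =
                scanStop.isEmpty() || r.startKey.compareTo(scanStop) < 0;
        boolean regionEndsAfterScanStarts =
                r.endKey.isEmpty() || scanStart.compareTo(r.endKey) < 0;
        return regionStartsBeforeScanEnds && regionEndsAfterScanStarts;
    }

    // One entry per region that would get a mapper task under this assumption.
    static List<String> regionsToMap(List<Region> regions, String scanStart, String scanStop) {
        List<String> selected = new ArrayList<>();
        for (Region r : regions) {
            if (overlaps(r, scanStart, scanStop)) {
                selected.add(r.name);
            }
        }
        return selected;
    }
}
```

With three regions split at "g" and "n", a scan from "h" to "k" would only produce a task for the middle region, while an unbounded scan would produce one task for each of the three.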


The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental.
Use at your own risk.
Michael Segel
michael_segel (AT)