HBase >> mail # user >> Custom HBase table split that sends collocated rows to the same region


Re: Custom HBase table split that sends collocated rows to the same region
You did not mention what version of HBase you are on.

In 0.94/trunk, there is a RegionSplitPolicy feature that may work in
your case ...
https://issues.apache.org/jira/browse/HBASE-5304
http://search-hadoop.com/jd/hbase/org/apache/hadoop/hbase/regionserver/RegionSplitPolicy.html

I came across this implementation, which may be what you want:
http://search-hadoop.com/jd/hbase/org/apache/hadoop/hbase/regionserver/KeyPrefixRegionSplitPolicy.html
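
To illustrate what a prefix-based policy does: when a split point is chosen, it is cut back to the key prefix, so the region boundary falls at the start of a group of rows rather than inside it. As far as I can tell, KeyPrefixRegionSplitPolicy works with a fixed-length prefix; if your class_id varies in length, you would need something that truncates at the delimiter instead. The standalone sketch below shows that idea in plain Java. It is an illustration only, not HBase's implementation, and PrefixSplitSketch and alignToPrefix are made-up names. It uses the '#' delimiter from your key format and assumes the delimiter sorts below every character used in class_id (true for '#' versus digits in ASCII); otherwise the rows of one class are not contiguous in the first place.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixSplitSketch {
    // Assumption: '#' separates class_id from student_id, as in {class_id}#{student_id}.
    static final byte DELIMITER = (byte) '#';

    /**
     * Cuts a candidate split row back to its class_id prefix, so that a region
     * boundary placed there keeps every row of that class in the same region.
     */
    static byte[] alignToPrefix(byte[] candidateSplitRow) {
        for (int i = 0; i < candidateSplitRow.length; i++) {
            if (candidateSplitRow[i] == DELIMITER) {
                // Keep only the bytes before the delimiter.
                return Arrays.copyOf(candidateSplitRow, i);
            }
        }
        // No delimiter: the candidate is already a bare class row key.
        return candidateSplitRow;
    }

    public static void main(String[] args) {
        byte[] candidate = "17#204".getBytes(StandardCharsets.UTF_8);
        byte[] aligned = alignToPrefix(candidate);
        System.out.println(new String(aligned, StandardCharsets.UTF_8)); // prints 17
    }
}
```

With the boundary at "17", the class row "17" and all of its "17#..." rows land in the upper region, while every row of the preceding classes stays in the lower one.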
--Suraj

On Wed, Apr 4, 2012 at 7:52 AM, a <[EMAIL PROTECTED]> wrote:
> Hello,
>
> Suppose that I have a "tall-narrow" HBase table with a composite key, e.g.
> {class_id}#{student_id}.
>
> Example data looks as follows:
>
> ROW_KEY  |   ONE COLUMN FAMILY
> ----------------------------------------------------------------
> 1        |   name = "Object Oriented Programming"
>         |   location = "Building A"
>         |   semester = "Winter"
>         |   // many other information about class
> ----------------------------------------------------------------
> 1_1      |   name = "Alice White"
> 1_2      |   name = "Betty Lipcon"
> // many other records related to class with ID = 1
> ----------------------------------------------------------------
> // many other records related to class with ID = 2, 3, 4, .. N
>
>
> I would like to use this HBase table as the input source for my MapReduce job,
> where the mapper emits <key, value> pairs such that:
> key = ${class_id}#${student_id},
> value = some information about corresponding class.
>
> Thanks to the lexicographic sorting of row keys, this would be easy to
> implement if I could split the HBase table into regions such that all
> collocated rows (those with the same row prefix, i.e. {class_id}) reside in the
> same region. Then, for each group of such collocated rows, I could use the
> first row to get the information about the class and emit it together with the
> row key of each remaining row.
>
> So I would like to ask whether such a custom split is easy to implement.
>
> I know that I could:
> 1) model it with a "flat-wide" table and have everything I need in separate
> rows,
> 2) use two MR jobs for that,
>
> but I am interested in the best solution for a "tall-narrow" table with one MR job.
>
> Many thanks in advance for any hints!
>
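
Once the rows of each class sit in a single region, the single-pass logic described above can be sketched as follows. This is plain Java over an in-memory sorted map rather than a real TableMapper; the names are hypothetical, the class "2" rows are invented sample data, and the '#' delimiter is taken from the key format in the question. In an actual job the remembered class row would be kept in a mapper field between map() calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CollocatedScanSketch {
    /**
     * Walks rows in sorted key order, remembers the most recent class row,
     * and emits "studentRowKey <TAB> classInfo" for each student row of that class.
     */
    static List<String> joinClassInfo(Map<String, String> sortedRows) {
        List<String> emitted = new ArrayList<>();
        String classId = null;
        String classInfo = null;
        for (Map.Entry<String, String> row : sortedRows.entrySet()) {
            String key = row.getKey();
            int sep = key.indexOf('#');
            if (sep < 0) {
                // A key without the delimiter is the class row that heads its group.
                classId = key;
                classInfo = row.getValue();
            } else if (key.substring(0, sep).equals(classId)) {
                emitted.add(key + "\t" + classInfo);
            }
            // Otherwise the class row was never seen (the group was cut by a
            // region boundary), so this student row is skipped.
        }
        return emitted;
    }

    public static void main(String[] args) {
        // TreeMap keeps keys sorted, standing in for HBase's row-key order.
        Map<String, String> rows = new TreeMap<>();
        rows.put("1", "Object Oriented Programming");
        rows.put("1#1", "Alice White");
        rows.put("1#2", "Betty Lipcon");
        rows.put("2", "Databases");   // invented sample data
        rows.put("2#1", "Carol Doe"); // invented sample data
        for (String line : joinClassInfo(rows)) {
            System.out.println(line);
        }
    }
}
```

The skipped branch is the reason the split policy matters: without it, a group can straddle two regions and the second mapper never sees the class row.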