HBase >> mail # user >> Hbase joins using MultiTableInputCollection [HBASE-3996]


HBase joins using MultiTableInputCollection [HBASE-3996]
Hello,

I came across this ticket for scans over multiple tables and their use in
Map/Reduce jobs:

https://issues.apache.org/jira/browse/HBASE-3996
https://reviews.apache.org/r/4411/diff/7/

There is a patch for this now, and the comments mention that the
functionality could be useful for doing joins as part of a Map/Reduce job.
Could someone briefly explain how this works? I am interested in doing
joins between two tables on rowkeys.

If I add both tables to the newly added MultiTableInputCollection
instance and use that in a Map/Reduce job, would map(<rowkey>, <value>) be
called only once per unique <rowkey>, with <value> containing two value
sets if the key was found in both tables?
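
To make the question concrete, here is a rough plain-Java sketch of the
grouping I have in mind. It uses no HBase classes at all, and every name in
it (RowKeyJoinSketch, groupByRowKey, the "A:"/"B:" tags) is made up for
illustration. It only simulates what a join on rowkeys would have to
produce. I am not claiming the patch behaves this way; that is exactly what
I am asking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: two in-memory maps stand in for two
// HBase tables (rowkey -> value). Nothing here is HBase or patch API.
public class RowKeyJoinSketch {

    // Collects the values of both "tables" under their rowkey, tagging each
    // value with the table it came from. A rowkey present in both tables
    // ends up with two entries in its list.
    public static Map<String, List<String>> groupByRowKey(
            Map<String, String> tableA, Map<String, String> tableB) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> e : tableA.entrySet()) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                   .add("A:" + e.getValue());
        }
        for (Map.Entry<String, String> e : tableB.entrySet()) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                   .add("B:" + e.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        Map<String, String> tableA = new TreeMap<>();
        tableA.put("row1", "a1");
        tableA.put("row2", "a2");

        Map<String, String> tableB = new TreeMap<>();
        tableB.put("row2", "b2");
        tableB.put("row3", "b3");

        // Prints:
        // row1 -> [A:a1]
        // row2 -> [A:a2, B:b2]
        // row3 -> [B:b3]
        groupByRowKey(tableA, tableB)
            .forEach((rowKey, values) -> System.out.println(rowKey + " -> " + values));
    }
}
```

In this sketch "row2" is the only key that would come out of an inner join.
What I cannot tell from the ticket is whether that grouping would already
be done by the time map() is called, or whether I would have to do it
myself in a reduce step.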

If there are any practical examples of doing joins on HBase tables, I'd
appreciate a link. Also, I am using HBase client 0.90.6-cdh3u4. Is the
patch applicable to this version of HBase at all?

Thank you,

/David