HBase, mail # user - Hbase joins using MultiTableInputCollection [HBASE-3996]


David Koch 2012-07-17, 20:39
Hello,

I came across this ticket about scanning multiple tables and using those
scans in Map/Reduce jobs:

https://issues.apache.org/jira/browse/HBASE-3996
https://reviews.apache.org/r/4411/diff/7/

There is now a patch for this, and the comments mention that the
functionality could be useful for doing joins as part of a Map/Reduce job.
Could someone briefly explain how this works? I am interested in joining
two tables on their row keys.
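
To make the question concrete, here is a minimal plain-Java sketch of an
inner join on row keys. It is not HBase API: two in-memory TreeMaps stand
in for full scans of the two tables, and the class and method names are
hypothetical. It relies on HBase returning rows sorted by row key, so the
two scans can be merged in one pass. Note that String.compareTo matches
HBase's unsigned byte ordering only for ASCII keys.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

/** Hypothetical illustration only: sorted maps stand in for two table scans. */
final class RowKeyMergeJoin {

    /** One joined row: the shared row key plus the value from each table. */
    record Joined(String rowKey, String left, String right) {}

    /** Advances an iterator, returning null when it is exhausted. */
    private static Map.Entry<String, String> next(Iterator<Map.Entry<String, String>> it) {
        return it.hasNext() ? it.next() : null;
    }

    /** Merge join: walk both key-sorted inputs once, emitting keys present in both. */
    static List<Joined> join(SortedMap<String, String> left, SortedMap<String, String> right) {
        List<Joined> out = new ArrayList<>();
        Iterator<Map.Entry<String, String>> li = left.entrySet().iterator();
        Iterator<Map.Entry<String, String>> ri = right.entrySet().iterator();
        Map.Entry<String, String> l = next(li);
        Map.Entry<String, String> r = next(ri);
        while (l != null && r != null) {
            int cmp = l.getKey().compareTo(r.getKey());
            if (cmp == 0) {
                // Same row key in both tables: emit a joined row and advance both sides.
                out.add(new Joined(l.getKey(), l.getValue(), r.getValue()));
                l = next(li);
                r = next(ri);
            } else if (cmp < 0) {
                l = next(li); // left key is smaller, so it has no match on the right
            } else {
                r = next(ri); // right key is smaller, so it has no match on the left
            }
        }
        return out;
    }
}
```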

If I add both tables to the newly added MultiTableInputCollection instance
and use it in a Map/Reduce job, would map(<rowkey>, <value>) be called only
once per unique <rowkey>, with <value> containing two value sets if the key
was found in both tables?
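
If it instead behaves like a conventional reduce-side join, map would be
called once per row of each table, and the two value sets would only meet
in the reducer after the shuffle groups them by row key. The sketch below
simulates that alternative in memory in plain Java; it uses no Hadoop or
HBase classes, and the table names and helper methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory simulation of a reduce-side join on row key. */
final class ReduceSideJoin {

    /** A map-output value tagged with the name of the table it came from. */
    record Tagged(String table, String value) {}

    /** "Map" plus "shuffle": one emit per row of each table, grouped by row key. */
    static TreeMap<String, List<Tagged>> shuffle(Map<String, Map<String, String>> tables) {
        TreeMap<String, List<Tagged>> grouped = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> table : tables.entrySet()) {
            // Each row of each table is seen on its own, tagged with its table name.
            for (Map.Entry<String, String> row : table.getValue().entrySet()) {
                grouped.computeIfAbsent(row.getKey(), k -> new ArrayList<>())
                       .add(new Tagged(table.getKey(), row.getValue()));
            }
        }
        return grouped;
    }

    /** "Reduce": keep only row keys that received values from both tables. */
    static Map<String, List<Tagged>> innerJoin(TreeMap<String, List<Tagged>> grouped,
                                               String leftTable, String rightTable) {
        Map<String, List<Tagged>> joined = new TreeMap<>();
        for (Map.Entry<String, List<Tagged>> e : grouped.entrySet()) {
            boolean inLeft = e.getValue().stream().anyMatch(t -> t.table().equals(leftTable));
            boolean inRight = e.getValue().stream().anyMatch(t -> t.table().equals(rightTable));
            if (inLeft && inRight) {
                joined.put(e.getKey(), e.getValue());
            }
        }
        return joined;
    }
}
```

Under this model only the reducer ever sees both value sets for a row key.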

If there are any practical examples of joins on HBase tables, I'd
appreciate a link. Also, I am using the HBase client 0.90.6-cdh3u4; is the
patch applicable to this version of HBase at all?

Thank you,

/David