HBase joins using MultiTableInputCollection [HBASE-3996]
David Koch 2012-07-17, 20:39
Hello,
I came across this ticket for multiple table scans and their use in Map/Reduce jobs:
https://issues.apache.org/jira/browse/HBASE-3996
https://reviews.apache.org/r/4411/diff/7/

There is now a patch for this, and the comments mention that the functionality could be useful for doing joins as part of a Map/Reduce job. Could someone briefly explain how this works?

I am interested in doing joins between 2 tables on rowkeys. If I append both tables to the newly added MultiTableInputCollection instance and use that in a Map/Reduce job, would map(<rowkey>, <value>) be called only once per unique <rowkey>, with <value> containing 2 value sets if the key was found in both tables?

If there are any practical examples of doing joins on HBase tables, I'd appreciate a link. Also, I am using HBase client 0.90.6-cdh3u4; is the patch applicable to this version of HBase at all?

Thank you,
/David
Ted Yu 2012-07-17, 21:07
David Koch 2012-07-18, 10:11