HBase >> mail # user >> How to efficiently join HBase tables?


How to efficiently join HBase tables?
Hi,
I need to join two HBase tables. The obvious way is to use an M/R job for
that. The problem is that the few references to this question that I found
recommend scanning one table in the mapper and then doing a lookup for the
referenced row in the second table.
This sounds like a very inefficient way to do a join with MapReduce. I
believe it would be much better to feed the rows of both tables to the
mapper and let it emit a key based on the join fields. Since all rows
with the same join-field values will have the same key, the reducer will be
able to easily generate the result of the join.
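
To make the idea concrete, here is a minimal in-memory sketch of that reduce-side join in plain Python. It does not use Hadoop or HBase at all: each table is assumed to be a dict mapping a row key to a dict of column values, the shuffle is simulated with a dictionary, and the names `map_row`, `reduce_join`, and `reduce_side_join` are hypothetical, chosen only for illustration. It also assumes an inner join and that every row contains the join field.

```python
from collections import defaultdict


def map_row(table, row_key, row, join_field):
    """Mapper: emit (join value, row tagged with its source table)."""
    yield row[join_field], (table, row_key, row)


def reduce_join(join_value, tagged_rows):
    """Reducer: pair every left row with every right row for one join value."""
    left = [(k, r) for t, k, r in tagged_rows if t == "left"]
    right = [(k, r) for t, k, r in tagged_rows if t == "right"]
    for left_key, left_row in left:
        for right_key, right_row in right:
            # Merge the columns of both rows into one joined record.
            yield join_value, left_key, right_key, {**left_row, **right_row}


def reduce_side_join(left_table, right_table, join_field):
    """Run map, a simulated shuffle, and reduce over two in-memory tables."""
    groups = defaultdict(list)  # stands in for the shuffle/sort phase
    for name, table in (("left", left_table), ("right", right_table)):
        for row_key, row in table.items():
            for key, value in map_row(name, row_key, row, join_field):
                groups[key].append(value)
    results = []
    for key in sorted(groups):
        results.extend(reduce_join(key, groups[key]))
    return results


if __name__ == "__main__":
    users = {"u1": {"city": "NYC", "name": "Ann"},
             "u2": {"city": "SF", "name": "Bob"}}
    cities = {"c1": {"city": "NYC", "state": "NY"}}
    for joined in reduce_side_join(users, cities, "city"):
        print(joined)
```

Tagging each emitted value with its source table is what lets the reducer tell the two sides apart once the rows have been grouped under the same key.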
The problem with this is that I couldn't find a way to feed two tables to a
single MapReduce job. I could probably dump the tables to files in a single
directory and then run the join on the files, but that really makes no sense.

Am I missing something? Any other ideas?

-eran