

Re: How to efficiently join HBase tables?
Doug,
I think I should clarify something...

Yes, I am the only one who is saying get() won't work.
The original question was how to do an efficient join, with no specific parameters given, such as joining on key values. It wasn't until yesterday that Eran gave an example of the specific problem...

So you have to assume that you are attempting to solve this problem in the general case. That is, you can't assume that you are joining on the row keys. Since we are talking about a big data problem, you can assume that your data sets are going to be huge.
So you have to consider how many rows you can store in memory, and that you may not have enough memory.

Because most people who are moving to a NoSQL database have only been exposed to relational models, they will approach HBase from a relational schema design. So solving the question of how to join two tables efficiently in general terms has a lot of value to the community as a whole.

David is right in that I'm looking back and taking the approach of how to join two data sets.
It's not rocket science and this problem has been solved under different paradigms.
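
One such older paradigm is the sort-merge join. The following is a minimal sketch, not HBase code: it assumes both inputs are already sorted by the join key (for example by an external sort or a MapReduce shuffle), and the name `merge_join` and the tuple layout are hypothetical. It shows how a join can proceed while holding only the current group of equal keys in memory.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right, key=itemgetter(0)):
    """Join two iterables that are each already sorted by the join key.

    Only the rows of the current right-hand key group are buffered,
    so memory use does not grow with the total size of either input.
    """
    left_groups = groupby(left, key)
    right_groups = groupby(right, key)
    lk, lg = next(left_groups, (None, None))
    rk, rg = next(right_groups, (None, None))
    while lg is not None and rg is not None:
        if lk < rk:
            # Left key has no match yet: advance the left side.
            lk, lg = next(left_groups, (None, None))
        elif lk > rk:
            # Right key has no match yet: advance the right side.
            rk, rg = next(right_groups, (None, None))
        else:
            # Keys match: emit every pairing of the two groups.
            right_rows = list(rg)
            for l in lg:
                for r in right_rows:
                    yield l, r
            lk, lg = next(left_groups, (None, None))
            rk, rg = next(right_groups, (None, None))


if __name__ == "__main__":
    users = [(1, "ann"), (2, "bob"), (4, "dee")]
    orders = [(2, "book"), (2, "pen"), (3, "cup"), (4, "ink")]
    for pair in merge_join(users, orders):
        print(pair)
```

The cost of sorting both inputs is not shown here; in practice that step dominates, which is why the choice between this and a lookup-based join depends on the data.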

Sent from a remote device. Please excuse any typos...

Mike Segel

On Jun 8, 2011, at 9:56 PM, Doug Meil <[EMAIL PROTECTED]> wrote:

> Hi there-
>
> Summary comment:
>
> 1)  Preference
>
> Several people in this thread have suggested approaches (map-side memory join, multi-get, temp files), all of which have merit and have advantages in certain situations.  Kudos to the dist-list for chiming in.  The "right" approach depends on the specific problem you are trying to solve, and that's preference.
>
> 2)   Possibility
>
> Mike, you're the only one in this thread arguing that one of the approaches isn't possible.  And you seem to be arguing that it's never possible to look up a record in HBase with a Get, but others don't seem to have this problem.
>
> I wish everybody success with their HBase join research, but I'm checking out of this thread (ka-ching).
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Michel Segel
> Sent: Wednesday, June 08, 2011 10:14 AM
> To: [EMAIL PROTECTED]
> Subject: Re: How to efficiently join HBase tables?
>
> Unless I am mistaken... get() requires a row key, right?
> And you can join tables on column data which isn't in the row key, right?
>
> So how do you do a get()? :-)
>
> Sure, there is more than one way to skin a cat. But if you want to be efficient... You will create a set of unique keys based on the columns that you want to join. Note that if you are going to use a temp table in HBase, you will want to store the unique key value A|B, and when you write the row to the temp table, you will append a unique identifier like a uuid so that you don't lose the row.
>
> Here your input list to the actual join is going to be the list of unique keys and then you do a scan to get the rows.
>
> Again, I could be wrong but how can you perform a get() when you only know a portion of the row key?
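
The temp-table approach described above can be sketched as follows. This is an in-memory model rather than real HBase client code: `SortedTable`, `temp_key`, `load_temp` and `join` are hypothetical names, the table is only a sorted list of row keys, and the join values are assumed not to contain the `|` separator. It illustrates why a get() with only the `A|B` portion of the key finds nothing, while a scan starting at that prefix finds every matching row.

```python
import bisect
import uuid


class SortedTable:
    """Toy stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []  # row keys in sorted order
        self._rows = {}  # row key -> row data

    def put(self, key, row):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def get(self, key):
        # Point lookup: only works with the complete row key.
        return self._rows.get(key)

    def scan_prefix(self, prefix):
        # Seek to the first key >= prefix, then read while keys still match.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._rows[self._keys[i]]
            i += 1


def temp_key(a, b):
    # Join value "A|B" plus a uuid, so rows that share a join value
    # do not overwrite each other in the temp table.
    return f"{a}|{b}|{uuid.uuid4().hex}"


def load_temp(rows):
    """Write rows to a temp table and collect the set of unique join keys."""
    temp = SortedTable()
    join_keys = set()
    for row in rows:
        temp.put(temp_key(row["a"], row["b"]), row)
        join_keys.add(f"{row['a']}|{row['b']}")
    return temp, join_keys


def join(left_rows, right_rows):
    left, left_keys = load_temp(left_rows)
    right, right_keys = load_temp(right_rows)
    out = []
    # Drive the join from the unique keys present on both sides.
    for jk in sorted(left_keys & right_keys):
        prefix = jk + "|"
        for _, l in left.scan_prefix(prefix):
            for _, r in right.scan_prefix(prefix):
                out.append((l, r))
    return out
```

Whether the scan or a multi-get is cheaper in a real cluster depends on the key design and data volume, which this model does not capture.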
>
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Jun 8, 2011, at 8:01 AM, Doug Meil <[EMAIL PROTECTED]> wrote:
>
>>
>> Re: "With respect to Doug's posts, you can't do a multi-get off the bat"
>>
>> That's an assumption, but you're entitled to your opinion.
>>
>> -----Original Message-----
>> From: Michael Segel [mailto:[EMAIL PROTECTED]]
>> Sent: Monday, June 06, 2011 10:08 PM
>> To: [EMAIL PROTECTED]
>> Subject: RE: How to efficiently join HBase tables?
>>
>>
>> Well....
>>
>> David is correct.
>>
>> Eran wanted to do a join, which is a relational concept that isn't natively supported by a NoSQL database. A better model would be a hierarchical model like Dick Pick's Revelation. (UniVerse aka U2 from Ardent/Informix/IBM/now Rocket Software?).
>> And yes, we're looking back 40-some-odd years into either a
>> merge/sort solution or how databases do a relational join. :-)