HBase, mail # user - Re: HBase - Secondary Index


Re: HBase - Secondary Index
Michel Segel 2012-12-18, 09:02
Just a couple of questions...

First, since you don't have any natural secondary indices, you have to create one, and there are a couple of choices. Keeping it simple, say you choose an inverted table as your index.

In doing so, you have one column containing all of the row ids for a given value.
This means that it is a simple get().
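
The inverted-table idea above can be sketched in plain Java. This is a minimal in-memory stand-in, not the HBase client API: the class name InvertedIndexSketch, its methods, and the sample row ids are all hypothetical, and a sorted map plays the role of the index table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory stand-in for an inverted index table (assumption: not real HBase).
// Index row key = indexed value; the single "column" holds every main-table
// row id that carries that value.
class InvertedIndexSketch {
    private final Map<String, TreeSet<String>> indexTable = new TreeMap<>();

    // Called whenever a main-table row is written with the given value.
    void put(String indexedValue, String mainRowId) {
        indexTable.computeIfAbsent(indexedValue, k -> new TreeSet<>()).add(mainRowId);
    }

    // One point lookup (a "get") on the index row returns all matching row ids.
    List<String> get(String indexedValue) {
        TreeSet<String> ids = indexTable.get(indexedValue);
        if (ids == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.put("cust42", "row-0007");
        index.put("cust42", "row-0003");
        index.put("cust99", "row-0005");
        System.out.println(index.get("cust42")); // prints [row-0003, row-0007]
    }
}
```

The lookup cost is one read of one index row, regardless of how many main-table rows match; the trade-off is that a very common value produces a very wide index row.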

My question is: since you don't have any formal SQL syntax, how are you doing all of this server side?
Sent from a remote device. Please excuse any typos...

Mike Segel

On Dec 18, 2012, at 2:28 AM, anil gupta <[EMAIL PROTECTED]> wrote:

> Hi Anoop,
>
> Please find my reply inline.
>
> Thanks,
> Anil Gupta
>
> On Sun, Dec 16, 2012 at 8:02 PM, Anoop Sam John <[EMAIL PROTECTED]> wrote:
>
>> Hi Anil,
>> During the scan, there is no need to fetch any index data to the
>> client side, so there is no need to create a scanner on the index
>> table at the client side. This happens at the server side.
>>
>> For the scan on the main table with conditions on timestamp and customer
>> id, a scanner is created with Filters, just as you would when there is no
>> secondary index. So this scan from the client will go through all the
>> regions in the main table.
>
> Anil: Do you mean that if the table is spread across 50 region servers in
> a 60-node cluster, then we need to send a scan request to all 50 RS?
> Doesn't that sound expensive? IMHO you were not doing this in your
> solution. Your solution looked cleaner than this, since you knew exactly
> which node to go to when querying via the secondary index, thanks to the
> co-location of the primary table region and the secondary index table
> region (due to the static begin part of the secondary table rowkey). My
> problem is a little more complicated because of one constraint: I cannot
> have a "static begin part" in the rowkey of my secondary table.
>
>> When it scans one particular region, say (x,y], on the main table, using the
>> CP we can get the index table region object corresponding to this main
>> table region from the RS. There is no issue in creating the static part of
>> the rowkey: you know 'x' is the region start key. The server side then
>> creates a scanner on the index region directly, and here we can specify
>> the startkey: 'x' + <timestamp value> + <customer id>. Using the results
>> from the index scan, we reseek on the main region to the exact
>> rows where the data we are interested in is available. So there won't
>> be a full region data scan happening.
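
The server-side flow described in the paragraph above can be sketched as follows. This is a plain-Java model using sorted maps in place of the main region and its co-located index region; it does not use the HBase coprocessor API. The index row key layout (region start key, timestamp, customer id, main row key, joined with "|") is an assumption for illustration, as are all names and sample values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Models one main-table region and its co-located index region as sorted maps.
class RegionLocalIndexScan {
    static final String SEP = "|"; // assumed field separator, never part of a field

    // Assumed layout: <region start key>|<timestamp>|<customer id>|<main row key>
    static String indexKey(String regionStart, String ts, String custId, String mainRow) {
        return regionStart + SEP + ts + SEP + custId + SEP + mainRow;
    }

    static List<String> scan(NavigableMap<String, String> indexRegion,
                             NavigableMap<String, String> mainRegion,
                             String regionStart, String ts, String custId) {
        // Start key built from the static part 'x' plus the two indexed values.
        String startKey = regionStart + SEP + ts + SEP + custId + SEP;
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, String> e : indexRegion.tailMap(startKey, true).entrySet()) {
            if (!e.getKey().startsWith(startKey)) {
                break; // left the matching key range, stop the index scan
            }
            // "Reseek" on the main region: jump straight to the indexed row.
            Map.Entry<String, String> hit = mainRegion.ceilingEntry(e.getValue());
            if (hit != null && hit.getKey().equals(e.getValue())) {
                results.add(hit.getValue());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> main = new TreeMap<>();
        main.put("x01", "order A");
        main.put("x02", "order B");
        main.put("x03", "order C");

        NavigableMap<String, String> index = new TreeMap<>();
        index.put(indexKey("x", "20121214", "cust7", "x02"), "x02");
        index.put(indexKey("x", "20121214", "cust9", "x01"), "x01");
        index.put(indexKey("x", "20121215", "cust7", "x03"), "x03");

        System.out.println(scan(index, main, "x", "20121214", "cust7")); // prints [order B]
    }
}
```

Only the index entries under the start key and the main rows they point to are touched, which is the sense in which no full region scan takes place.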
>>
>> In cases where only the timestamp is given but no customer id, it
>> is simple again. Create a scanner on the main table with only one
>> filter. At the CP side, the scanner on the index region gets created
>> with startkey 'x' + <timestamp value>. When you create the Scan
>> object and set startRow on it, it need not be the full rowkey. It can be
>> a part of the rowkey, i.e. a prefix.
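
The prefix idea above, a start row that is only the leading part of the rowkey, can be sketched in plain Java. A sorted map stands in for the index region; this is not the HBase Scan API. The stop-key computation (bump the last character of the prefix) is an addition made for this sketch so that the scan ends where the prefix ends; the key layout and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Prefix scan over a sorted key space, modeled with a NavigableMap.
class PrefixScanSketch {
    // Smallest key that sorts after every key starting with the prefix.
    // Assumes a non-empty prefix whose last char is not Character.MAX_VALUE.
    static String stopKeyFor(String prefix) {
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    // Returns the values of all entries whose key begins with the prefix,
    // in key order: start row inclusive, computed stop row exclusive.
    static List<String> scanByPrefix(NavigableMap<String, String> indexRegion, String prefix) {
        return new ArrayList<>(
                indexRegion.subMap(prefix, true, stopKeyFor(prefix), false).values());
    }

    public static void main(String[] args) {
        // Assumed index key layout: <region start>|<timestamp>|<customer id>|<main row>
        NavigableMap<String, String> index = new TreeMap<>();
        index.put("x|20121214|cust7|x02", "x02");
        index.put("x|20121214|cust9|x01", "x01");
        index.put("x|20121215|cust7|x03", "x03");

        // Only the region start key and the timestamp are known.
        System.out.println(scanByPrefix(index, "x|20121214|")); // prints [x02, x01]
    }
}
```

A start row alone only says where the scan begins; without a stop row or a prefix check, the scan would run on into keys for later timestamps.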
>>
>> Hope u got it now :)
> Anil: I hope we are now on the same page. Thanks a lot for taking the
> time to discuss this.
>
>>
>> -Anoop-
>> ________________________________________
>> From: anil gupta [[EMAIL PROTECTED]]
>> Sent: Friday, December 14, 2012 11:31 PM
>> To: [EMAIL PROTECTED]
>> Subject: Re: HBase - Secondary Index
>>
>> On Fri, Dec 14, 2012 at 12:54 AM, Anoop Sam John <[EMAIL PROTECTED]>
>> wrote:
>>
>>> Hi Anil,
>>>
>>>> 1. In your presentation you mentioned that the region of the Primary
>>>> Table and the region of the Secondary Table are always located on the
>>>> same region server. How do you achieve it? By using the Primary table
>>>> rowkey as a prefix of the rowkey of the Secondary Table? Will your
>>>> implementation work if the rowkey of the primary table cannot be used
>>>> as a prefix in the rowkey of the Secondary table (I have this
>>>> limitation in my use case)?
>>> First of all, there will be the same number of regions in both the primary
>>> and index tables. All the start/stop keys of the regions will also be the same.
>>> Suppose there are 2 regions on the main table, say for keys 0-10 and 10-20.