HBase, mail # user - Re: HBase - Secondary Index


RE: HBase - Secondary Index
Anoop Sam John 2012-12-18, 09:35
Hi Mike
>My question is that since you don't have any formal SQL syntax, how are you doing this all server side?
I think the question is for Anil. In his case he is not doing the index data scan at the server side. He scans the index table data back to the client and, from the client, does gets to fetch the main table data. Correct, Anil?
Just making it clear... :)
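
For readers following the thread, the client-side flow described above (scan the index table, then issue gets against the main table) can be sketched roughly as follows. This is only an in-memory illustration, not HBase client code: the index rowkey layout `<customer_id>|<main_rowkey>`, the table contents, and the function names are all assumptions made for the example.

```python
from bisect import bisect_left

# Hypothetical index table: sorted rowkeys of the form "<customer_id>|<main_rowkey>".
index_rows = sorted(["c42|row1", "c42|row3", "c7|row2"])

# Hypothetical main table keyed by main rowkey.
main_table = {"row1": {"amount": 10}, "row2": {"amount": 25}, "row3": {"amount": 5}}


def scan_index(prefix):
    """Client-side scan: yield index rowkeys that start with `prefix`."""
    i = bisect_left(index_rows, prefix)
    while i < len(index_rows) and index_rows[i].startswith(prefix):
        yield index_rows[i]
        i += 1


def query_by_customer(customer_id):
    """Step 1: pull matching index rows to the client.
    Step 2: do one get per main-table rowkey found in the index."""
    main_keys = [key.split("|", 1)[1] for key in scan_index(customer_id + "|")]
    return {key: main_table[key] for key in main_keys}
```

The point of the sketch is the two round trips: the index rows travel to the client first, and only then are the main-table rows fetched.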

-Anoop-
________________________________________
From: Michel Segel [[EMAIL PROTECTED]]
Sent: Tuesday, December 18, 2012 2:32 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: HBase - Secondary Index

Just a couple of questions...

First, since you don't have any natural secondary indices, you can create one from a couple of choices. Keeping it simple, you choose an inverted table as your index.

In doing so, you have one column containing all of the row ids for a given value.
This means that it is a simple get().
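
As a rough illustration of the inverted-table idea above, the sketch below keeps one index entry per value, holding all the row ids for that value, so a lookup is a single get on the index followed by gets on the main table. The table contents and helper names are hypothetical, and plain dictionaries stand in for HBase tables.

```python
# Toy in-memory stand-in for the main table (hypothetical data).
main_table = {
    "row1": {"customer_id": "c42", "amount": 10},
    "row2": {"customer_id": "c7", "amount": 25},
    "row3": {"customer_id": "c42", "amount": 5},
}


def build_inverted_index(table, column):
    """Map each value of `column` to the sorted list of row ids holding it."""
    index = {}
    for row_id, cells in table.items():
        index.setdefault(cells[column], []).append(row_id)
    return {value: sorted(ids) for value, ids in index.items()}


def lookup(index, table, value):
    """One get on the index entry, then one get per matching main-table row."""
    return [table[row_id] for row_id in index.get(value, [])]


customer_index = build_inverted_index(main_table, "customer_id")
```

With this layout, `customer_index["c42"]` holds every row id for that customer, which is what makes the index read a simple get rather than a scan.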

My question is that since you don't have any formal SQL syntax, how are you doing this all server side?
Sent from a remote device. Please excuse any typos...

Mike Segel

On Dec 18, 2012, at 2:28 AM, anil gupta <[EMAIL PROTECTED]> wrote:

> Hi Anoop,
>
> Please find my reply inline.
>
> Thanks,
> Anil Gupta
>
> On Sun, Dec 16, 2012 at 8:02 PM, Anoop Sam John <[EMAIL PROTECTED]> wrote:
>
>> Hi Anil
>> During the scan, there is no need to fetch any index data
>> to client side. So there is no need to create any scanner on the index
>> table at the client side. This happens at the server side.
>
>
>>
>> For the Scan on the main table with a condition on timestamp and customer
>> id, a scanner is created with Filters, just as it would be when there is no
>> secondary index. So this scan from the client will go through all the
>> regions in the main table.
>
>
> Anil: Do you mean that if the table is spread across 50 region servers in
> a 60-node cluster, then we need to send a scan request to all 50 RS?
> Doesn't that sound expensive? IMHO you were not doing this in your
> solution. Your solution looked cleaner than this, since you knew exactly
> which node to go to when querying via the secondary index, due to
> co-location (due to the static begin part of the secondary table rowkey) of
> the regions of the primary table and the secondary index table. My problem
> is a little more complicated due to the constraint that I cannot have a
> "static begin part" in the rowkey of my secondary table.
>
>> When it scans one particular region, say (x,y], on the main table, using the
>> CP we can get the index table region object corresponding to this main
>> table region from the RS. There is no issue in creating the static part of
>> the rowkey. You know 'x' is the region start key. Then at the server side
>> we create a scanner on the index region directly, and here we can specify
>> the startkey: 'x' + <timestamp value> + <customer id>. Using the results
>> from the index scan we reseek on the main region to the exact
>> rows where the data we are interested in is available. So there won't
>> be a full region data scan happening.
>
>> In cases where only the timestamp is given but no customer id, it
>> is simple again. Create a scanner on the main table with only one
>> filter. At the CP side the scanner on the index region will get created
>> with startkey 'x' + <timestamp value>. When you create the scan
>> object and set startRow on it, it need not be the full rowkey. It can
>> also be just part of the rowkey, like a prefix.
>>
>> Hope u got it now :)
> Anil: I hope now we are on same page. Thanks a lot for your valuable time
> to discuss this stuff.
>
>>
>> -Anoop-
>> ________________________________________
>> From: anil gupta [[EMAIL PROTECTED]]
>> Sent: Friday, December 14, 2012 11:31 PM
>> To: [EMAIL PROTECTED]
>> Subject: Re: HBase - Secondary Index
>>
>> On Fri, Dec 14, 2012 at 12:54 AM, Anoop Sam John <[EMAIL PROTECTED]>
>> wrote:
>>
>>> Hi Anil,
>>>
>>>> 1. In your presentation you mentioned that region of Primary Table and