Re: HBase - Secondary Index
Just a couple of questions...

First, since you don't have any natural secondary indices, you can create one from a couple of choices. Keeping it simple, you choose an inverted table as your index.

In doing so, each index row has one column containing all of the row ids for a given value.
This means that a lookup is a simple get().
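
To make the inverted-table idea concrete, here is a minimal in-memory sketch in Python. It is not HBase client code: the `InvertedIndex` class, its method names, and the sample values are all hypothetical, and a plain dict stands in for the index table. It only illustrates that each indexed value maps to the set of main-table row ids, so a lookup is a single keyed get.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy stand-in for an inverted index table (hypothetical, not an HBase API).

    One "row" per indexed value; the row holds every main-table row id
    that carries that value.
    """

    def __init__(self):
        self._rows = defaultdict(set)

    def put(self, value, row_id):
        # Record that the main-table row `row_id` contains `value`.
        self._rows[value].add(row_id)

    def get(self, value):
        # A single keyed lookup returns all matching row ids, sorted.
        return sorted(self._rows.get(value, ()))


index = InvertedIndex()
index.put("cust42", "row-003")
index.put("cust42", "row-001")
index.put("cust7", "row-002")

matches = index.get("cust42")
print(matches)
```

One consequence of this layout is that a very common value produces a very wide index row, since every matching row id lands in the same place.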

My question is: since you don't have any formal SQL syntax, how are you doing all of this on the server side?
Sent from a remote device. Please excuse any typos...

Mike Segel

On Dec 18, 2012, at 2:28 AM, anil gupta <[EMAIL PROTECTED]> wrote:

> Hi Anoop,
> Please find my reply inline.
> Thanks,
> Anil Gupta
> On Sun, Dec 16, 2012 at 8:02 PM, Anoop Sam John <[EMAIL PROTECTED]> wrote:
>> Hi Anil
>> During the scan, there is no need to fetch any index data
>> to the client side, so there is no need to create any scanner on the index
>> table at the client side. This happens at the server side.
>> For the scan on the main table with conditions on timestamp and customer
>> id, a scanner is created with Filters, just as it would be without a
>> secondary index. So this scan from the client will go through all the
>> regions in the main table.
> Anil: Do you mean that if the table is spread across 50 region servers in
> a 60-node cluster, then we need to send a scan request to all 50 RS?
> Doesn't that sound expensive? IMHO you were not doing this in your
> solution. Your solution looked cleaner than this, since you knew exactly
> which node to query when using the secondary index, thanks to the
> co-location (due to the static beginning part of the secondary table
> rowkey) of the primary table region and the secondary index table region.
> My problem is a little more complicated because of one constraint: I cannot
> have a "static begin part" in the rowkey of my secondary table.
>> When it scans one particular region, say (x,y], on the main table, using the
>> CP we can get the index table region object corresponding to this main
>> table region from the RS. There is no issue in creating the static part of
>> the rowkey: you know 'x' is the region start key. Then, at the server side,
>> we create a scanner on the index region directly, and here we can specify
>> the start key as 'x' + <timestamp value> + <customer id>. Using the results
>> from the index scan, we reseek on the main region to the exact
>> rows where the data we are interested in is available. So there won't
>> be a full region data scan happening.
>> In cases where only the timestamp is given but no customer id, it
>> is simple again. Create a scanner on the main table with only one
>> filter. At the CP side, the scanner on the index region is created
>> with the start key 'x' + <timestamp value>. When you create the Scan
>> object and set startRow on it, it need not be the full rowkey. It can be
>> a part of the rowkey, like a prefix.
>> Hope you got it now :)
> Anil: I hope we are now on the same page. Thanks a lot for your valuable time
> to discuss this stuff.
>> -Anoop-
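
The server-side flow Anoop describes (scan the co-located index region from 'x' + timestamp + customer id, then reseek on the main region) can be sketched with a small in-memory simulation. This is not HBase coprocessor code and not Anoop's implementation: the `Region` class, the `indexed_scan` function, the "|" separator, and the choice to append the main rowkey to each index key are all assumptions made for illustration. Timestamps are assumed to be fixed-width strings so that they sort lexicographically.

```python
import bisect

SEP = "|"  # hypothetical delimiter between rowkey parts


class Region:
    """Toy sorted key-value store standing in for one region (not an HBase API)."""

    def __init__(self, start_key):
        self.start_key = start_key
        self._keys = []    # kept sorted, like rowkeys inside a region
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start_row):
        # Yield (key, value) in key order from the first key >= start_row.
        # start_row may be a partial rowkey, i.e. a prefix.
        first = bisect.bisect_left(self._keys, start_row)
        for key in self._keys[first:]:
            yield key, self._values[key]

    def seek(self, key):
        # Jump straight to one row instead of scanning the whole region.
        return self._values.get(key)


def indexed_scan(main, index, timestamp, customer_id=None):
    # Build the index start key: region start key + timestamp [+ customer id].
    parts = [main.start_key, timestamp]
    if customer_id is not None:
        parts.append(customer_id)
    prefix = SEP.join(parts) + SEP

    hits = []
    for index_key, main_row in index.scan(prefix):
        if not index_key.startswith(prefix):
            break  # walked past the matching range of index entries
        hits.append((main_row, main.seek(main_row)))
    return hits


main = Region("x")
index = Region("x")  # index region shares the main region's start key
rows = [
    ("x-row1", "20121214", "c1", "order A"),
    ("x-row2", "20121214", "c2", "order B"),
    ("x-row3", "20121216", "c1", "order C"),
]
for row_key, ts, cust, data in rows:
    main.put(row_key, data)
    index.put(SEP.join([main.start_key, ts, cust, row_key]), row_key)

by_both = indexed_scan(main, index, "20121214", "c1")
by_timestamp = indexed_scan(main, index, "20121214")
print(by_both)
print(by_timestamp)
```

The second call passes only the timestamp, so the start key is a partial rowkey, which mirrors the prefix case described above. In both calls only the matching index entries are visited and the main region is touched one seek per hit.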
>> ________________________________________
>> From: anil gupta [[EMAIL PROTECTED]]
>> Sent: Friday, December 14, 2012 11:31 PM
>> Subject: Re: HBase - Secondary Index
>> On Fri, Dec 14, 2012 at 12:54 AM, Anoop Sam John <[EMAIL PROTECTED]>
>> wrote:
>>> Hi Anil,
>>>> 1. In your presentation you mentioned that the region of the Primary Table
>>>> and the region of the Secondary Table are always located on the same region
>>>> server. How do you achieve it? By using the Primary table rowkey as a prefix
>>>> of the rowkey of the Secondary Table? Will your implementation work if the
>>>> rowkey of the primary table cannot be used as a prefix in the rowkey of the
>>>> Secondary table (I have this limitation in my use case)?
>>> First of all, there will be the same number of regions in both the primary
>>> and index tables. All the start/stop keys of the regions will also be the same.
>>> Suppose there are 2 regions on the main table, say for keys 0-10 and 10-20.