HBase >> mail # user >> Re: HBase - Secondary Index


Re: HBase - Secondary Index
On Fri, Dec 14, 2012 at 12:54 AM, Anoop Sam John <[EMAIL PROTECTED]> wrote:

> Hi Anil,
>
> >1. In your presentation you mentioned that region of Primary Table and
> Region of Secondary Table are always located on the same region server. How
> do you achieve it? By using the Primary table rowkey as a prefix of the rowkey
> of the Secondary Table? Will your implementation work if the rowkey of the primary
> table cannot be used as a prefix in the rowkey of the Secondary table (I have this
> limitation in my use case)?
> First of all, there will be the same number of regions in both the primary and
> index tables. All the start/stop keys of the regions will also be the same.
> Suppose there are 2 regions on the main table, say for keys 0-10 and 10-20.
> Then we will create 2 regions in the index table also, with the same key ranges.
> At the master balancing level it is easy to collocate these regions by looking
> at the start and end keys.
> The selection of the rowkey that will be used in the index table is
> the key here.
> What we will do is prefix all the rowkeys in the index table
> with the start key of the region.
> When an entry is added to the main table with rowkey 5, it will go to
> the 1st region (0-10).
> Now there will be an index region with range 0-10. We will select this
> region to store this index data.
> The row getting added into the index region for this entry will have the
> rowkey 0_x_5.
> I am just using '_' as a separator here to show this. Actually we
> won't be having any separator.
> So the rowkeys (in the index region) will always have a static begin part.
> At scan time we also know this part, so the startrow and endrow
> creation for the scan will be possible. Note that we will store the actual
> table row key as the last part of the index rowkey itself, not as a value.
> This is the better option in our case, since the index usage for a scan is
> also handled at the server side. There is no index data fetch to the client side.
>
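The rowkey layout described above can be sketched in plain Java. This is only an illustration of the idea in the quoted text, not the actual implementation (which has not been published). The class and method names are hypothetical, and the sketch assumes the region start key and the indexed value have known lengths, since the thread says no separator is stored.

```java
import java.util.Arrays;

// Hypothetical helper; names and layout details are assumptions for illustration.
final class IndexRowKeySketch {

    // Concatenate byte arrays with no separator between the parts.
    static byte[] concat(byte[]... parts) {
        int length = 0;
        for (byte[] part : parts) {
            length += part.length;
        }
        byte[] out = new byte[length];
        int pos = 0;
        for (byte[] part : parts) {
            System.arraycopy(part, 0, out, pos, part.length);
            pos += part.length;
        }
        return out;
    }

    // Index rowkey = region start key + indexed value + main-table rowkey.
    static byte[] indexRowKey(byte[] regionStartKey, byte[] indexedValue, byte[] mainRowKey) {
        return concat(regionStartKey, indexedValue, mainRowKey);
    }

    // Inclusive start row for scanning one index region for one indexed value.
    static byte[] scanStartRow(byte[] regionStartKey, byte[] indexedValue) {
        return concat(regionStartKey, indexedValue);
    }

    // Exclusive stop row: the smallest key that sorts after every key sharing the prefix.
    static byte[] scanStopRow(byte[] regionStartKey, byte[] indexedValue) {
        byte[] prefix = concat(regionStartKey, indexedValue);
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++;
                return stop;
            }
        }
        // Prefix was all 0xFF bytes: an empty array stands for "no upper bound" here.
        return new byte[0];
    }

    // The main-table rowkey is whatever follows the (known-length) prefix.
    static byte[] mainRowKey(byte[] indexRowKey, int prefixLength) {
        return Arrays.copyOfRange(indexRowKey, prefixLength, indexRowKey.length);
    }
}
```

With region start key 0, indexed value x and main rowkey 5, indexRowKey produces 0x5, which is the 0_x_5 example with the separator removed.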

Anil: My primary table rowkey is customerId+event_id, and my secondary
table rowkey is timestamp+customerId. In your implementation it seems that,
to use the secondary index, the application needs to know the
"start_key" of the region (static begin part) it wants to query. Right? Do
you separately manage the logic of determining the region
"start_key" (static begin part) for a scan?
Also, it's possible that the customerId is not provided when using the
secondary index, so I won't have the customerId for all the queries. Hence I
cannot use customerId as a prefix in the rowkey of my Secondary Table.
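
To make the question concrete, here is one possible reading of the scheme for a timestamp-only query. It is an assumption on my side, not something confirmed in this thread. If every index rowkey starts with its region's start key, then a time-range query without customerId needs one [startRow, stopRow) pair per index region, each prefixed with that region's start key. The class and method names are hypothetical, and timestamps are assumed to be non-negative longs stored as 8 big-endian bytes so that byte order matches numeric order.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not taken from the implementation discussed in the thread.
final class TimeRangeIndexScanSketch {

    // A [startRow, stopRow) pair for one index region.
    static final class Range {
        final byte[] startRow;
        final byte[] stopRow;

        Range(byte[] startRow, byte[] stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    // Region start key followed by an 8-byte big-endian timestamp.
    static byte[] prefixWithTimestamp(byte[] regionStartKey, long timestamp) {
        return ByteBuffer.allocate(regionStartKey.length + Long.BYTES)
                .put(regionStartKey)
                .putLong(timestamp)
                .array();
    }

    // One range per index region, covering timestamps in [fromTs, toTs).
    static List<Range> rangesFor(List<byte[]> regionStartKeys, long fromTs, long toTs) {
        List<Range> ranges = new ArrayList<>();
        for (byte[] startKey : regionStartKeys) {
            ranges.add(new Range(
                    prefixWithTimestamp(startKey, fromTs),
                    prefixWithTimestamp(startKey, toTs)));
        }
        return ranges;
    }
}
```

In this reading the customerId never appears in the scan bounds. Only the region start keys and the timestamps do, so the open question is who supplies the start keys: the client, or the server side for each region.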

>
> I feel your use case fits perfectly with our model
>
Anil: Somehow I am unable to fit your implementation into my use case due
to the constraint of the static begin part of the rowkey in the Secondary table.
There seems to be a disconnect. Can you tell me how my use case fits into
your implementation?

>
> >2. Are you using an Endpoint or Observer for building the secondary index
> table?
> Observer
>
> >3. "Custom balancer do collocation". Is it a custom load balancer of HBase
> Master or something else?
> It is a balancer implementation which will be plugged into Master
>
> >4. Your region split looks interesting. I don't have much info about it.
> Can you point to some docs on IndexHalfStoreFileReader?
> Sorry, I am not able to publish any design doc or code, as the company has
> not yet decided to open source the solution.
> If you come across any particular query, please feel free to ask me :)
> You can look at the HalfStoreFileReader class first.
>
> -Anoop-
> ________________________________________
> From: anil gupta [[EMAIL PROTECTED]]
> Sent: Friday, December 14, 2012 2:11 PM
> To: [EMAIL PROTECTED]
> Subject: Re: HBase - Secondary Index
>
> Hi Anoop,
>
> Nice presentation, and it seems like a smart implementation. Since the
> presentation only covered bullet points, I have a couple of questions on
> your implementation. :)
>
> Here is a recap of my implementation and our previous discussion on
> Secondary index:
>
> Here is the link to previous email thread:

Thanks & Regards,
Anil Gupta