HBase, mail # user - Re: HBase - Secondary Index


RE: HBase - Secondary Index
Anoop Sam John 2012-12-27, 11:30

> What happens when regions get split? Do you update the startkey on the
> index table?

We have a custom HalfStoreFileReader to read the split index region data. This reader changes each rowkey it returns by replacing the startkey part, so that the key matches the region it now belongs to.
Immediately after a split, HBase initiates a compaction, and the compaction uses this new reader. The rowkeys coming out of it are therefore the changed ones, and the newly written HFiles will contain the changed rowkeys. A normal read (as part of a scan) during this time also uses the new reader, so we always get the rowkey in the expected format. :) Hope that makes it clear for you.
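
For illustration only, the rowkey rewriting described above could be sketched like this in Python. This is not the actual HalfStoreFileReader code: the function names are hypothetical, the '_' separator is borrowed from the example quoted below (the real keys have no separator), and how rows are assigned to a daughter region is assumed to be decided elsewhere.

```python
from typing import Iterable, Iterator, Tuple

SEP = b"_"  # illustrative only; the real index keys have no separator


def rewrite_start_key(index_key: bytes, old_start: bytes, new_start: bytes) -> bytes:
    """Swap the region-start-key prefix of an index rowkey for a new start key."""
    prefix = old_start + SEP
    if not index_key.startswith(prefix):
        raise ValueError(f"{index_key!r} does not begin with start key {old_start!r}")
    return new_start + SEP + index_key[len(prefix):]


def read_split_region(
    rows: Iterable[Tuple[bytes, bytes]], old_start: bytes, new_start: bytes
) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (rowkey, value) pairs with rowkeys rewritten on the fly.

    Both a compaction and a normal scan would read through the same wrapper,
    so neither ever sees a rowkey carrying the old prefix.
    """
    for key, value in rows:
        yield rewrite_start_key(key, old_start, new_start), value
```

With a parent region starting at 0 and a daughter region assumed to start at 5, a stored key 0_x_7 would be returned as 5_x_7, and a compaction reading through the same wrapper would persist it in that form.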

-Anoop-
________________________________________
From: Shengjie Min [[EMAIL PROTECTED]]
Sent: Thursday, December 27, 2012 4:53 PM
To: [EMAIL PROTECTED]
Subject: Re: HBase - Secondary Index

Hi Anoop,

> First of all, there will be the same number of regions in both the primary
> and index tables. All the start/stop keys of the regions will also be the same.
> Suppose there are 2 regions on the main table, say for keys 0-10 and 10-20.
> Then we will create 2 regions in the index table with the same key ranges.
> At the master balancing level it is easy to collocate these regions by
> looking at the start and end keys.
> The selection of the rowkey that will be used in the index table is the
> key here.
> What we do is prefix all the rowkeys in the index table with the start key
> of the region.
> When an entry is added to the main table with rowkey 5, it goes to the
> 1st region (0-10).
> There will be an index region with range 0-10. We select this region to
> store the index data.
> The row added to the index region for this entry will have the rowkey
> 0_x_5.
> I am using '_' as a separator here just to show this. Actually we won't
> have any separator.
> So the rowkeys (in the index region) will always have a static beginning
> part. At scan time we also know this part, so creating the startrow and
> endrow for the scan is possible. Note that we store the actual table
> rowkey as the last part of the index rowkey itself, not as a value.
> This is the better option in our case, since it also lets us handle index
> usage for scans at the server side. There is no index data fetched to the
> client side.
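
A minimal Python sketch of the key layout quoted above, for illustration only. The function names are hypothetical, the middle part ("x" in the example) is assumed to be the indexed value, and the '_' separator is used only because the quoted example uses it; the description says the real keys have none.

```python
from typing import Tuple

SEP = b"_"  # illustrative separator, not present in the real scheme


def index_rowkey(region_start_key: bytes, indexed_value: bytes, main_rowkey: bytes) -> bytes:
    """Compose <region start key>_<indexed value>_<main table rowkey>."""
    return SEP.join([region_start_key, indexed_value, main_rowkey])


def scan_range(region_start_key: bytes, indexed_value: bytes) -> Tuple[bytes, bytes]:
    """Return (startrow, stoprow) covering all index rows for one value in one region."""
    prefix = region_start_key + SEP + indexed_value + SEP
    # Exclusive stop row: the prefix with its last byte incremented by one.
    stop = prefix[:-1] + bytes([prefix[-1] + 1])
    return prefix, stop


def main_rowkey_from_index(index_key: bytes) -> bytes:
    """The main table rowkey is the last part of the index rowkey, not a cell value."""
    return index_key.rsplit(SEP, 1)[1]
```

Because the start key of the region is known at scan time, the start and stop rows can be built on the region server itself, and the main table rowkey can be read straight out of each matching index key.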

What happens when regions get split? Do you update the startkey on the
index table?

-Shengjie
On 14 December 2012 08:54, Anoop Sam John <[EMAIL PROTECTED]> wrote:

> Hi Anil,
>
> > 1. In your presentation you mentioned that the region of the Primary Table
> > and the region of the Secondary Table are always located on the same region
> > server. How do you achieve it? By using the Primary table rowkey as a prefix
> > of the rowkey of the Secondary Table? Will your implementation work if the
> > rowkey of the primary table cannot be used as a prefix in the rowkey of the
> > Secondary table (I have this limitation in my use case)?
> First of all, there will be the same number of regions in both the primary
> and index tables. All the start/stop keys of the regions will also be the same.
> Suppose there are 2 regions on the main table, say for keys 0-10 and 10-20.
> Then we will create 2 regions in the index table with the same key ranges.
> At the master balancing level it is easy to collocate these regions by
> looking at the start and end keys.
> The selection of the rowkey that will be used in the index table is the
> key here.
> What we do is prefix all the rowkeys in the index table with the start key
> of the region.
> When an entry is added to the main table with rowkey 5, it goes to the
> 1st region (0-10).
> There will be an index region with range 0-10. We select this region to
> store the index data.
> The row added to the index region for this entry will have the rowkey
> 0_x_5.
> I am using '_' as a separator here just to show this. Actually we won't
> have any separator.
> So the rowkeys (in the index region) will always have a static beginning
> part. At scan time we also know this part, so creating the startrow and
> endrow for the scan is possible. Note that we will store the actual

All the best,
Shengjie Min