HBase, mail # user - Re: HBase - Secondary Index


anil gupta 2012-12-14, 08:41
Hi Anoop,

Nice presentation, and it seems like a smart implementation. Since the
presentation only covered bullet points, I have a couple of questions about
your implementation. :)

Here is a recap of my implementation and our previous discussion on
secondary indexes:

Here is the link to the previous email thread:
http://search-hadoop.com/m/1zWPMaaRtr .

The secondary index is stored in table "B" as rowkey B --> family:<rowkey A>.
"<rowkey A>" is the column qualifier. Every row in B will only have one
column "k", and the value of that column is the rowkey of A.

Suppose I am storing customer events in table A. I have two query
requirements:
1. Query customer events on the basis of customer_Id and event_ID.
2. Query customer events on the basis of event_timestamp and customer_ID.

About 70% of queries are of type #1, so I use <customer_Id><event_ID> as
the rowkey of table A.
To support fast results for query #2, I need to create a secondary index on
A. I store that secondary index in table B, whose rowkey is
<event_timestamp><customer_ID>. Every row of B stores the corresponding
rowkey of A.
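To make the two rowkey layouts concrete, here is a minimal Python sketch, assuming fixed-width, null-padded ID fields and a millisecond timestamp encoded as an 8-byte big-endian integer. The widths (CUSTOMER_ID_WIDTH, EVENT_ID_WIDTH) and the helper names are hypothetical; the thread does not say how the fields are encoded. Fixed-width fields stop a customer prefix like "c1" from also matching "c10", and the big-endian timestamp makes HBase's byte-wise rowkey order agree with time order.

```python
import struct

# Hypothetical field widths; the thread does not specify them.
CUSTOMER_ID_WIDTH = 10
EVENT_ID_WIDTH = 10


def pad_field(value: str, width: int) -> bytes:
    """Encode an ID as UTF-8 and null-pad it to a fixed width."""
    raw = value.encode("utf-8")
    if len(raw) > width:
        raise ValueError(f"{value!r} is longer than {width} bytes")
    return raw.ljust(width, b"\x00")


def primary_rowkey(customer_id: str, event_id: str) -> bytes:
    """Rowkey of table A: <customer_Id><event_ID>."""
    return pad_field(customer_id, CUSTOMER_ID_WIDTH) + pad_field(event_id, EVENT_ID_WIDTH)


def index_rowkey(event_timestamp_ms: int, customer_id: str) -> bytes:
    """Rowkey of table B: <event_timestamp><customer_ID>.

    An 8-byte big-endian unsigned timestamp sorts byte-wise in the same
    order as the numbers themselves (for non-negative values).
    """
    return struct.pack(">Q", event_timestamp_ms) + pad_field(customer_id, CUSTOMER_ID_WIDTH)
```

With plain decimal strings, "1000" would sort before "255"; the fixed-width binary encoding avoids that, so index_rowkey(255, "c1") sorts before index_rowkey(1000, "c1").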

HBase querying approach:
1. Scan the secondary table using a PrefixFilter and a startRow to get the
list of rowkeys of the primary table.
2. Do a batch get on the primary table with the HTable.get(List<Get>)
method, using the list of rowkeys obtained in step 1.
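The two steps can be modeled in memory to show where the RPCs happen. This is a toy Python sketch, not the HBase client API: InMemoryTable, scan_prefix and batch_get are hypothetical stand-ins for a Scan with startRow plus PrefixFilter and for HTable.get(List<Get>), and each read call bumps rpc_count once. The index row is assumed to keep the rowkey of A under the qualifier b"k" (column family omitted).

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

Row = Dict[bytes, bytes]


class InMemoryTable:
    """Toy table: rows kept sorted by byte-wise rowkey, as HBase orders them.

    Each read call counts as one "RPC" so the cost of the pattern is visible.
    This is NOT the HBase client API, only a model of its ordering semantics.
    """

    def __init__(self) -> None:
        self._keys: List[bytes] = []
        self._rows: Dict[bytes, Row] = {}
        self.rpc_count = 0

    def put(self, rowkey: bytes, columns: Row) -> None:
        if rowkey not in self._rows:
            self._keys.insert(bisect_left(self._keys, rowkey), rowkey)
            self._rows[rowkey] = {}
        self._rows[rowkey].update(columns)

    def scan_prefix(self, start_row: bytes, prefix: bytes) -> List[Tuple[bytes, Row]]:
        """Model of a Scan with startRow plus a PrefixFilter (one call)."""
        self.rpc_count += 1
        hits: List[Tuple[bytes, Row]] = []
        # Keys sharing a prefix are contiguous, so begin at the later of start_row/prefix.
        for key in self._keys[bisect_left(self._keys, max(start_row, prefix)):]:
            if not key.startswith(prefix):
                break  # past the prefix range: nothing further can match
            hits.append((key, self._rows[key]))
        return hits

    def batch_get(self, rowkeys: List[bytes]) -> List[Optional[Row]]:
        """Model of HTable.get(List<Get>): one call, results in request order."""
        self.rpc_count += 1
        return [self._rows.get(key) for key in rowkeys]


def query_via_index(index: InMemoryTable, primary: InMemoryTable, prefix: bytes) -> List[Row]:
    # Step 1: scan secondary table B; each index row holds a rowkey of A under b"k".
    primary_keys = [cols[b"k"] for _, cols in index.scan_prefix(prefix, prefix)]
    # Step 2: a single batch get on primary table A; skip index rows whose A row is gone.
    return [row for row in primary.batch_get(primary_keys) if row is not None]
```

Each call counts as one RPC in this model; against a real cluster a scan may take several round trips and a batch get is split per region server, so two is only a lower bound.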

The only issue is that my solution needs at least two RPC calls, one in
each of steps 1 and 2 above. I want to reduce the number of RPCs to one if
possible.
******Questions on your implementation:*********

1. In your presentation you mentioned that a region of the primary table and
the corresponding region of the secondary table are always located on the
same region server. How do you achieve that? By using the primary table
rowkey as a prefix of the secondary table rowkey? Will your implementation
work if the rowkey of the primary table cannot be used as a prefix of the
secondary table rowkey? (I have this limitation in my use case.)
2. Are you using an Endpoint or an Observer coprocessor to build the
secondary index table?
3. "Custom balancer do collocation": is that a custom load balancer in the
HBase Master, or something else?
4. Your region split approach looks interesting, but I don't have much
information about it. Can you point me to some documentation on
IndexHalfStoreFileReader?
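For reference on questions 1 and 3, one scheme that gives colocation without making the primary rowkey the index prefix is: prefix each index row with the start key of the primary region holding the data row, split the index table at the same keys, and have a Master-side balancer (HBase accepts a custom one through the hbase.master.loadbalancer.class setting) place index region i on the server hosting primary region i. The thread does not confirm that this is the presenter's design. Below is a toy Python sketch under that assumption; KEY_WIDTH, FIRST_REGION_PREFIX and all function names are hypothetical, every split key is assumed to be exactly KEY_WIDTH bytes (otherwise an index row could sort past the next split key), and the real LoadBalancer interface is not modeled.

```python
from bisect import bisect_right
from itertools import cycle
from typing import List

KEY_WIDTH = 4  # hypothetical: every primary split key has exactly this many bytes
FIRST_REGION_PREFIX = b"\x00" * KEY_WIDTH  # stands in for region 0's empty start key


def check_split_keys(split_keys: List[bytes]) -> None:
    """Reject split keys this sketch cannot handle."""
    if split_keys != sorted(split_keys):
        raise ValueError("split keys must be sorted")
    if any(len(k) != KEY_WIDTH for k in split_keys):
        raise ValueError(f"split keys must all be {KEY_WIDTH} bytes")
    if split_keys and split_keys[0] <= FIRST_REGION_PREFIX:
        raise ValueError("first split key must sort after the all-zero prefix")


def region_number(split_keys: List[bytes], rowkey: bytes) -> int:
    """0-based region a rowkey falls into; region i > 0 starts at split_keys[i - 1]."""
    return bisect_right(split_keys, rowkey)


def colocated_index_rowkey(split_keys: List[bytes], primary_rowkey: bytes,
                           indexed_value: bytes) -> bytes:
    """Hypothetical layout: <primary region start key><indexed value><primary rowkey>.

    If the index table is split at the same keys, the index row lands in the
    index region with the same number as the primary region.
    """
    check_split_keys(split_keys)
    i = region_number(split_keys, primary_rowkey)
    prefix = FIRST_REGION_PREFIX if i == 0 else split_keys[i - 1]
    return prefix + indexed_value + primary_rowkey


def colocate(primary_servers: List[str], index_region_count: int,
             servers: List[str]) -> List[str]:
    """Balancer rule: index region i goes wherever primary region i is hosted.

    Index regions without a primary counterpart are spread round-robin.
    """
    if not servers:
        raise ValueError("need at least one region server")
    spare = cycle(servers)
    return [primary_servers[i] if i < len(primary_servers) else next(spare)
            for i in range(index_region_count)]
```

The cost of this layout is that a lookup by indexed value alone has to probe every index region (one per primary region), because the value no longer leads the rowkey.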

Thanks,
Anil Gupta

On Tue, Dec 4, 2012 at 12:10 AM, Anoop Sam John <[EMAIL PROTECTED]> wrote:

> Hi All,
>
> Last week I got a chance to present the secondary indexing solution we
> have built at Huawei at the China Hadoop Conference. You can see the
> presentation at
> http://hbtc2012.hadooper.cn/subject/track4Anoop%20Sam%20John2.pdf
>
> I would like to hear what others think about this. :)
>
> -Anoop-
