Re: Best technique for doing lookup with Secondary Index
Hi Danis,

I downloaded the zip file and copied the source code to my HBase 0.92.1
project. It compiled successfully. I am going through the source code right
now. Is it possible for you to provide an architecture diagram for your
implementation, or comments in the code? It will be easier for users to
understand your implementation quickly.

Thanks,
Anil Gupta
On Fri, Oct 26, 2012 at 8:14 AM, anil gupta <[EMAIL PROTECTED]> wrote:

> @fding hbase: thanks for the link. I'll look into it.
>
> Interesting to know that within a region server we don't need an RPC call.
> If we can collocate two regions (or more), then that is the best solution. I
> am not sure how hard it'll be to write a custom load balancer (sounds a bit
> difficult to me). Does anyone know the classes related to a load balancer?
>
> Thanks,
> Anil
>
>
> On Fri, Oct 26, 2012 at 7:33 AM, Ramkrishna.S.Vasudevan <
> [EMAIL PROTECTED]> wrote:
>
>> Yes, we can do this, but for it to happen you may need a custom load
>> balancer that helps you achieve the collocation.
>>
>> Regards
>> Ram
>>
>> > -----Original Message-----
>> > From: Jerry Lam [mailto:[EMAIL PROTECTED]]
>> > Sent: Friday, October 26, 2012 7:59 PM
>> > To: [EMAIL PROTECTED]
>> > Subject: Re: Best technique for doing lookup with Secondary Index
>> >
>> > Can we force 2 regions to be collocated as a logical group?
>> >
>> > On Fri, Oct 26, 2012 at 6:14 AM, fding hbase <[EMAIL PROTECTED]>
>> > wrote:
>> >
>> > > https://github.com/danix800/hbase-indexed
>> > >
>> > > On Fri, Oct 26, 2012 at 4:13 PM, Ramkrishna.S.Vasudevan <
>> > > [EMAIL PROTECTED]> wrote:
>> > >
>> > > > > AFAIK, RPC cannot be avoided even if Region A and Region B are on the
>> > > > > same RS, since these two regions are from different tables. Am I right?
>> > > >
>> > > > No. Suppose Region A and Region B, which belong to different tables, are
>> > > > collocated on the same RS. Then from the coprocessor environment variable
>> > > > you can get access to the RS.
>> > > > From the RS you can get the online regions, and from a region object you
>> > > > can call puts or gets. This will not involve any RPC within that RS
>> > > > because we only deal with Region objects.
>> > > >
>> > > > Regards
>> > > > Ram
>> > > >
>> > > > > -----Original Message-----
>> > > > > From: anil gupta [mailto:[EMAIL PROTECTED]]
>> > > > > Sent: Friday, October 26, 2012 12:17 PM
>> > > > > To: [EMAIL PROTECTED]
>> > > > > Subject: Re: Best technique for doing lookup with Secondary Index
>> > > > >
>> > > > > >
>> > > > > > Now your main question is lookups, right?
>> > > > > > There are some more hooks in the scan flow called pre/postScannerOpen
>> > > > > > and pre/postScannerNext.
>> > > > > > Maybe you can try using them to do a lookup on the secondary table
>> > > > > > and then use those values and pass them to the main table next().
>> > > > > >
>> > > > >
>> > > > > With a secondary index it is hard to avoid at least two RPC calls (one
>> > > > > from the client to table B and then one from table B to table A),
>> > > > > whether you use coprocs or not. But I believe using coprocs is better
>> > > > > than doing RPC calls from the client, since the client might be outside
>> > > > > the subnet/network of the cluster. In that case, the RPC will be faster
>> > > > > when we use coprocs. In my case the client is certainly not in the same
>> > > > > subnet or network zone. I need to provide query results in around 100
>> > > > > milliseconds or less, so I need to be really frugal. Let me know your
>> > > > > views on this.
>> > > > >
>> > > > > Have you implemented queries with secondary indexes using coprocs yet?
>> > > > > At present I have tried the client-side query, and I can get the query
>> > > > > results in around 100 ms. I am enticed to try out the coproc

Thanks & Regards,
Anil Gupta
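
The two lookup styles compared in this thread can be sketched in plain Java. This is only an illustrative in-memory model and does not use any HBase API. The class name SecondaryIndexSketch, its methods, and the "one batched get" to the main table are all assumptions made for the example. The model counts only round trips that start at the client, which is the cost the thread is concerned about when the client sits outside the cluster network.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative in-memory model of a secondary-index lookup; not HBase code. */
final class SecondaryIndexSketch {
    // "Table A" (hypothetical): primary row key -> indexed column value.
    private final Map<String, String> mainTable = new TreeMap<>();
    // "Table B" (hypothetical): indexed value -> primary row keys.
    private final Map<String, List<String>> indexTable = new HashMap<>();
    // Simulated round trips that originate at the client.
    private int clientRoundTrips = 0;

    /** Writes a row and keeps the index table in sync with it. */
    void put(String rowKey, String indexedValue) {
        mainTable.put(rowKey, indexedValue);
        indexTable.computeIfAbsent(indexedValue, k -> new ArrayList<>()).add(rowKey);
    }

    /** Client-side lookup: one trip to table B, then one to table A. */
    List<String> clientSideLookup(String indexedValue) {
        clientRoundTrips++; // client -> table B
        List<String> keys = indexTable.getOrDefault(indexedValue, new ArrayList<>());
        if (keys.isEmpty()) {
            return new ArrayList<>();
        }
        clientRoundTrips++; // client -> table A (assumed to be one batched get)
        return fetchRows(keys);
    }

    /** Server-side lookup: one trip from the client; the second hop stays in the cluster. */
    List<String> serverSideLookup(String indexedValue) {
        clientRoundTrips++; // client -> server-side code next to table B
        List<String> keys = indexTable.getOrDefault(indexedValue, new ArrayList<>());
        return fetchRows(keys); // not counted: this hop does not start at the client
    }

    int getClientRoundTrips() {
        return clientRoundTrips;
    }

    // Resolves primary keys against the main table, formatted as "key=value".
    private List<String> fetchRows(List<String> keys) {
        List<String> rows = new ArrayList<>();
        for (String key : keys) {
            rows.add(key + "=" + mainTable.get(key));
        }
        return rows;
    }

    public static void main(String[] args) {
        SecondaryIndexSketch sketch = new SecondaryIndexSketch();
        sketch.put("row1", "blue");
        sketch.put("row2", "red");
        sketch.put("row3", "blue");
        System.out.println(sketch.clientSideLookup("blue"));
        System.out.println("client round trips: " + sketch.getClientRoundTrips());
    }
}
```

In this model the server-side variant still performs the second hop; it simply does not leave the cluster. Whether that hop is a real RPC or a local call depends on whether the two regions are collocated on the same RS, as discussed above.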