|
anil gupta
2012-10-24, 21:40
Anoop Sam John
2012-10-25, 04:44
Ramkrishna.S.Vasudevan
2012-10-25, 05:16
anil gupta
2012-10-25, 22:10
Anoop Sam John
2012-10-26, 04:06
Ramkrishna.S.Vasudevan
2012-10-26, 04:20
Anoop Sam John
2012-10-26, 04:33
anil gupta
2012-10-26, 04:44
anil gupta
2012-10-26, 06:46
Ramkrishna.S.Vasudevan
2012-10-26, 08:13
fding hbase
2012-10-26, 10:14
Jerry Lam
2012-10-26, 14:29
Ramkrishna.S.Vasudevan
2012-10-26, 14:33
anil gupta
2012-10-26, 15:14
anil gupta
2012-10-26, 16:43
Doug Meil
2012-10-27, 00:35
|
-
Best technique for doing lookup with Secondary Index anil gupta 2012-10-24, 21:40
Hi All,

I am using HBase 0.92.1. I have created a secondary index on table "A". Table A stores immutable data. I build the secondary table "B" using a prePut RegionObserver.

The secondary index is stored in table "B" as rowkey B --> family:<rowkey A>, where "<rowkey A>" is the column qualifier. Every row in B has only one column, and the name of that column is the rowkey of A, so the value is blank. As per my understanding, accessing the column qualifier is faster than accessing the value. Please correct me if I am wrong.

HBase querying approach:
1. Scan the secondary table using a prefix filter and startRow.
2. Do a batch get on the primary table using the HTable.get(List<Get>) method.

The above approach for retrieval works fine, but I was wondering if there is a better approach. I was planning to try doing the retrieval using coprocessors. Has anyone tried using coprocessors? I would appreciate it if others could share their experience with secondary indexes for HBase queries.

--
Thanks & Regards,
Anil Gupta
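The two-step lookup described above can be modeled in plain Java. This is only a sketch under stated assumptions, not HBase client code: a TreeMap stands in for the sorted rowkeys of table B, the map value stands in for the column qualifier that holds the rowkey of A, and the class name, key format ("timestamp|customer") and sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** In-memory model of the two-step secondary index lookup (not the HBase API). */
class IndexLookupSketch {

    /** Step 1: prefix scan over index table B; returns the rowkeys of A it points to. */
    static List<String> scanIndex(NavigableMap<String, String> tableB, String prefix) {
        List<String> primaryKeys = new ArrayList<>();
        // tailMap(prefix, true) plays the role of startRow: start at the first key >= prefix.
        for (Map.Entry<String, String> e : tableB.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // keys are sorted, so the first non-matching key ends the scan
            }
            primaryKeys.add(e.getValue()); // in the real design this is the column qualifier
        }
        return primaryKeys;
    }

    /** Step 2: fetch all matching rows of A in one batch, skipping keys with no row. */
    static List<String> batchGet(Map<String, String> tableA, List<String> keys) {
        List<String> rows = new ArrayList<>();
        for (String key : keys) {
            String row = tableA.get(key);
            if (row != null) {
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> tableB = new TreeMap<>();
        tableB.put("20121024|c1", "c1|e1");
        tableB.put("20121024|c2", "c2|e7");
        tableB.put("20121025|c1", "c1|e2");

        Map<String, String> tableA = new TreeMap<>();
        tableA.put("c1|e1", "login");
        tableA.put("c1|e2", "logout");
        tableA.put("c2|e7", "purchase");

        List<String> keys = scanIndex(tableB, "20121024");
        System.out.println(batchGet(tableA, keys)); // prints [login, purchase]
    }
}
```

Against a real cluster, step 1 would be a Scan on B and step 2 a single HTable.get(List<Get>) call on A, as in the message above; the model only shows why sorted rowkeys let the scan stop at the first key outside the prefix.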
-
RE: Best technique for doing lookup with Secondary Index Anoop Sam John 2012-10-25, 04:44
> I build the secondary table "B" using a prePut RegionObserver.

Anil,
In the prePut hook, do you call HTable#put()? Why make network calls from the server side here? Can't this be handled from the client alone? You can have a look at the Lily project. These are just some thoughts after seeing your idea on put and scan.

-Anoop-
-
RE: Best technique for doing lookup with Secondary Index Ramkrishna.S.Vasudevan 2012-10-25, 05:16
Just out of curiosity,

> The secondary index is stored in table "B" as rowkey B --> family:<rowkey A>
What is rowkey B here?

> 1. Scan the secondary table by using prefix filter and startRow.
How is the startRow determined for every query?

Regards
Ram
-
Re: Best technique for doing lookup with Secondary Index anil gupta 2012-10-25, 22:10
Anoop: In the prePut hook, do you call HTable#put()?
Anil: Yes, I call HTable#put() in prePut. Is there a better way of doing it?

Anoop: Why make network calls from the server side here?
Anil: I thought this was a cleaner approach since I am using BulkLoader. I decided not to run two jobs since I am generating a UniqueIdentifier at runtime in the bulk loader.

Anoop: Can't this be handled from the client alone?
Anil: I cannot handle it from the client since I am using BulkLoader. Is it a good idea to create an HTable instance on "B" and do the put in my mapper? I might try this idea.

Anoop: You can have a look at the Lily project.
Anil: It's a little late for us to evaluate Lily now, and at present we don't need a complex secondary index since our data is immutable.

Ram: What is rowkey B here?
Anil: Suppose I am storing customer events in table A. I have two requirements for data queries:
1. Query customer events on the basis of customer_Id and event_ID.
2. Query customer events on the basis of event_timestamp and customer_ID.

70% of querying is done by query #1, so I will use <customer_Id><event_ID> as the row key of table A. Now, in order to support fast results for query #2, I need to create a secondary index on A. I store that secondary index in B; the rowkey of B is <event_timestamp><customer_ID>. Every row stores the corresponding rowkey of A.

Ram: How is the startRow determined for every query?
Anil: It's determined by very simple application logic.

Thanks,
Anil Gupta
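The row key layout described above can be sketched as fixed-width byte arrays. This is a minimal illustration under assumptions that the thread does not confirm: customer ID, event ID and timestamp are all treated as non-negative 64-bit values, and the class and method names are hypothetical. HBase orders rowkeys as unsigned bytes, so big-endian encoding keeps numeric order and byte order aligned for such values.

```java
import java.nio.ByteBuffer;

/** Hypothetical composite rowkey builders for table A and index table B. */
class RowKeySketch {

    /** Rowkey of A: <customer_Id><event_ID>, 8 bytes each (assumed widths). */
    static byte[] primaryKey(long customerId, long eventId) {
        // ByteBuffer is big-endian by default, which preserves order for non-negative longs.
        return ByteBuffer.allocate(16).putLong(customerId).putLong(eventId).array();
    }

    /** Rowkey of B: <event_timestamp><customer_ID>, 8 bytes each (assumed widths). */
    static byte[] indexKey(long eventTimestamp, long customerId) {
        return ByteBuffer.allocate(16).putLong(eventTimestamp).putLong(customerId).array();
    }

    /** Unsigned lexicographic comparison, the ordering HBase applies to rowkeys. */
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

With this layout, a scan on B for one timestamp can use the first 8 bytes as the prefix, and entries for different customers at that timestamp sort next to each other.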
-
RE: Best technique for doing lookup with Secondary Index Anoop Sam John 2012-10-26, 04:06
Hi Anil,

Some confusion after seeing your reply. Do you use bulk loading? Did you create your own mapper? Do you call HTable#put() from the mappers?

I think there is confusion in another thread also. I was referring to the HFileOutputReducer. There is a TableOutputFormat also. TableOutputFormat does puts to the HTable, so the write to the WAL applies there.

[HFileOutputReducer]: As we discussed in another thread, in the case of bulk loading the approach is that the MR job creates KVs and writes them to files, and each file is written as an HFile. Yes, it contains all the meta information, the trailer, etc. Finally, the HBase cluster needs to be contacted only to load the HFile(s) into the cluster, under the corresponding regions. This is the fastest way to bulk load huge data.

-Anoop-
-
RE: Best technique for doing lookup with Secondary Index Ramkrishna.S.Vasudevan 2012-10-26, 04:20
> Is it a good idea to create Htable instance on "B" and do put in my mapper? I might try this idea.

Yes, you can do this. Maybe in the same mapper you can do a put for table "B". This is how we have tried loading data into another table by using the main table "A" Puts.

Now your main question is lookups, right? There are some more hooks in the scan flow called pre/postScannerOpen and pre/postScannerNext. Maybe you can try using them to do a lookup on the secondary table and then pass those values to the main table's next(). But this may involve more RPC calls, as your regions of "A" and "B" may be on different RSs.

If something is wrong in my understanding of what you said, kindly spare me. :)

Regards
Ram
-
RE: Best technique for doing lookup with Secondary Index Anoop Sam John 2012-10-26, 04:33
Anil,

Have a look at MultiTableOutputFormat. (I am referring to the 0.94 code base; I am not sure whether it is available in older versions.)

-Anoop-
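The idea of writing to both tables from one mapper can be modeled as follows. This is a plain-Java sketch, not MapReduce or HBase code: each input event produces one write tagged with table "A" and one tagged with table "B", which mirrors how MultiTableOutputFormat takes the destination table name as the output key. The Write class, the "|" separator and the "data" qualifier are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory model of a mapper that emits writes for table A and index table B. */
class DualWriteSketch {

    /** One write destined for a named table: a rowkey plus a single column qualifier. */
    static final class Write {
        final String table;
        final String rowKey;
        final String qualifier;

        Write(String table, String rowKey, String qualifier) {
            this.table = table;
            this.rowKey = rowKey;
            this.qualifier = qualifier;
        }
    }

    /** Models one map() call: a single event yields a write for A and a write for B. */
    static List<Write> map(String customerId, String eventId, String eventTimestamp) {
        String rowKeyA = customerId + "|" + eventId;
        String rowKeyB = eventTimestamp + "|" + customerId;

        List<Write> out = new ArrayList<>();
        out.add(new Write("A", rowKeyA, "data"));
        // The index row carries the rowkey of A as its qualifier; its value stays empty.
        out.add(new Write("B", rowKeyB, rowKeyA));
        return out;
    }
}
```

Because the unique identifier is generated once inside the map call, both writes see the same rowkey of A, which is the property the single-job approach in this thread relies on.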
-
Re: Best technique for doing lookup with Secondary Index anil gupta 2012-10-26, 04:44
Hi Anoop,

Yes, I use bulk loading for loading table A. I wrote my own mapper as Importtsv won't suffice for my requirements. :) No, I don't call HTable#put() from my mapper. I was thinking about trying out calling HTable#put() from my mapper to see the outcome.

I meant to say that when we use an MR job (e.g. importtsv), the WAL is not used. Sorry if I misunderstood someone.

Thanks,
Anil
-
Re: Best technique for doing lookup with Secondary Index anil gupta 2012-10-26, 06:46
> Now your main question is lookups right
> Now there are some more hooks in the scan flow called pre/postScannerOpen, pre/postScannerNext.
> May be you can try using them to do a look up on the secondary table and then use those values and pass it to the main table next().

With a secondary index it's hard to avoid at least two RPC calls (one from the client to table B and then one from table B to table A) whether you use a coproc or not. But I believe using a coproc is better than doing RPC calls from the client, since the client might be outside the subnet/network of the cluster. In that case, the RPC will be faster when we use coprocs. In my case the client is certainly not in the same subnet or network zone. I need to provide query results in around 100 milliseconds or less, so I need to be really frugal. Let me know your views on this.

Have you implemented queries with secondary indexes using coprocs yet? At present I have tried the client-side query, and I can get the query results in around 100 ms. I am tempted to try out the coproc implementation.

> But this may involve more RPC calls as your regions of "A" and "B" may be in different RS.

AFAIK, RPC cannot be avoided even if Region A and Region B are on the same RS, since these two regions are from different tables. Am I right?

Thanks,
Anil Gupta
-
RE: Best technique for doing lookup with Secondary Index Ramkrishna.S.Vasudevan 2012-10-26, 08:13
> AFAIK, RPC cannot be avoided even if Region A and Region B are on same RS since these two regions are from different table. Am i right?

No. Suppose your Region A and Region B of different tables are collocated on the same RS. Then, from the coprocessor environment variable, you can get access to the RS. From the RS you can get the online regions, and on that region object you can call puts or gets. This will not involve any RPC within that RS because we only deal with Region objects.

Regards
Ram
-
Re: Best technique for doing lookup with Secondary Indexfding hbase 2012-10-26, 10:14
https://github.com/danix800/hbase-indexed
Best Regards!
Fei Ding [EMAIL PROTECTED]
-
Re: Best technique for doing lookup with Secondary IndexJerry Lam 2012-10-26, 14:29
Can we enforce 2 regions to collocate together as a logical group?
-
RE: Best technique for doing lookup with Secondary IndexRamkrishna.S.Vasudevan 2012-10-26, 14:33
Yes, we can do this, but for it to happen you may have to write your own custom load balancer, which will help you get the collocation. Regards Ram
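The placement decision such a custom balancer would make can be sketched in plain Java. This is a hypothetical illustration, not an HBase `LoadBalancer` implementation: `CollocationPlanSketch` and its parameters are invented names, and it assumes the index table is split on the same start keys as the primary table, which is an assumption of the sketch and not something stated in this thread. Each index region is placed on the server that hosts the primary region with the same start key.

```java
import java.util.Map;
import java.util.TreeMap;

final class CollocationPlanSketch {
    /**
     * Builds a placement plan for index regions.
     *
     * @param primaryAssignment start key of each primary-table region -> hosting server name
     * @param indexStartKeys    start keys of the index-table regions
     * @param fallbackServer    server used when no primary region has the same start key
     * @return index region start key -> server name
     */
    static Map<String, String> plan(Map<String, String> primaryAssignment,
                                    Iterable<String> indexStartKeys,
                                    String fallbackServer) {
        Map<String, String> placement = new TreeMap<>();
        for (String startKey : indexStartKeys) {
            String server = primaryAssignment.get(startKey);
            // Follow the primary region when one matches, otherwise use the fallback.
            placement.put(startKey, server != null ? server : fallbackServer);
        }
        return placement;
    }

    public static void main(String[] args) {
        Map<String, String> primary = new TreeMap<>();
        primary.put("", "rs1");
        primary.put("m", "rs2");
        System.out.println(plan(primary, java.util.Arrays.asList("", "m", "t"), "rs1"));
    }
}
```

A real balancer would also have to keep the pairing intact across region splits, moves and server failures, which this sketch ignores.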
-
Re: Best technique for doing lookup with Secondary Indexanil gupta 2012-10-26, 15:14
@fding hbase: thanks for the link. I'll look into it.
Interesting to know that within a region server we don't need an RPC call. If we can collocate two regions (or more), then that is the best solution. I am not sure how hard it will be to write a custom load balancer (sounds a bit difficult to me). Does anyone know the classes related to a load balancer? Thanks, Anil
-
Re: Best technique for doing lookup with Secondary Indexanil gupta 2012-10-26, 16:43
Hi Danis,
I downloaded the zip file and copied the source code to my HBase 0.92.1 project. It compiled successfully. I am going through the source code right now. Is it possible for you to provide an architecture diagram for your implementation, or comments in the code? That would make it easier for users to understand your implementation quickly. Thanks, Anil Gupta
-
Re: Best technique for doing lookup with Secondary IndexDoug Meil 2012-10-27, 00:35
Hey folks, for the record, there are samples of using importtsv to prepare HFiles here: http://hbase.apache.org/book.html#importtsv
|