HBase - Secondary Index (Anoop Sam John, 2012-12-04, 08:10)

Hi All

Last week I got a chance to present the secondary indexing solution that we have built at Huawei at the China Hadoop Conference. You can see the presentation at http://hbtc2012.hadooper.cn/subject/track4Anoop%20Sam%20John2.pdf

I would like to hear what others think on this. :)

-Anoop-

Re: HBase - Secondary Index (Jan Van Besien, 2012-12-04, 19:24)

Hi,

On 12/04/2012 09:10 AM, Anoop Sam John wrote:
> I would like to hear what others think on this. :)

I found it interesting to read your approach on how the indexes can be used to speed up existing scan operations.

I couldn't find anything in your presentation, though, about whether your implementation makes any guarantees to ensure that the source table and the index table are always (eventually) in sync.

What if data is inserted into the source table and the region server then crashes (before the coprocessor is executed)? Will the index be out of sync? Do you have a mechanism in place to detect and repair this situation?

thanks
Jan

RE: HBase - Secondary Index (Anoop Sam John, 2012-12-05, 11:04)

Hi Jan

Yes, we guarantee the consistency between the user table and the index table. The put operation is handled in a transactional way, so as to make sure the data is added to both tables or reverted from both. We have added some new CP (coprocessor) hooks for this, obviously.

-Anoop-
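
Editor's sketch: the thread does not show how the put is made transactional, so the following is only a minimal illustration of the "added to both tables or reverted from both" behaviour described above. It uses plain in-memory Python objects, not HBase. The `Table` class, the `indexed_put` function, and the `value|row_key` index key layout are all hypothetical.

```python
_MISSING = object()  # sentinel: the row did not exist before the put


class Table:
    """Tiny in-memory stand-in for a table (row key -> value). Not HBase."""

    def __init__(self, fail_on_put=False):
        self.rows = {}
        self.fail_on_put = fail_on_put  # lets the example force a failure

    def put(self, key, value):
        if self.fail_on_put:
            raise IOError("simulated write failure")
        self.rows[key] = value

    def delete(self, key):
        self.rows.pop(key, None)


def indexed_put(user_table, index_table, row_key, value):
    """Write to both tables, or leave the user table as it was."""
    index_key = f"{value}|{row_key}"  # assumed index key layout
    previous = user_table.rows.get(row_key, _MISSING)
    user_table.put(row_key, value)
    try:
        index_table.put(index_key, row_key)
    except IOError:
        # Revert the user-table write so the two tables stay in step.
        if previous is _MISSING:
            user_table.delete(row_key)
        else:
            user_table.rows[row_key] = previous
        return False
    return True


# Both writes succeed.
ok_user, ok_index = Table(), Table()
ok_result = indexed_put(ok_user, ok_index, "row1", "blue")

# The index write fails, so the user-table write is reverted.
bad_user, bad_index = Table(), Table(fail_on_put=True)
bad_result = indexed_put(bad_user, bad_index, "row1", "blue")
```

This sketch only covers a failure that is reported to the caller. It does not cover a process crash between the two writes, and it does not clean up stale index entries when an existing value is overwritten.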

Re: HBase - Secondary Index (ramkrishna vasudevan, 2012-12-05, 11:28)

Thanks Anoop for the reply. I was planning to reply if you had not.

Regards
Ram

Re: HBase - Secondary Index (ramkrishna vasudevan, 2012-12-05, 11:40)

Also, the WALObserver hooks were used to ensure that the append to the WAL happens through them.

Regards
Ram

Re: HBase - Secondary Index (Jan Van Besien, 2012-12-05, 15:12)

On 12/05/2012 12:04 PM, Anoop Sam John wrote:
> Yes we guarantee the consistency between user table and index table. The put operation will be handled as a transactional way so as to make sure the data is added to both tables or reverted back from both. Some new CP hooks we have added for this obviously.

Would you be interested in sharing how exactly you guarantee this consistency? What are the CP hooks that you added and how exactly are they used?

I can currently only guess how it could work in your implementation. For example, it could be that an update is a single WALEdit, which results in an update to both the source and the index table. If the region server crashes between the update to the source table and the update to the index table, the HLog will be replayed and thus you will have a chance to recover. However, if the update to the index table (after a successful update of the source table) fails for some other reason (without a crash of the region server), the HLog will not be replayed.

Anyway, the above is just one assumption about how your implementation could work. If you could share more details of the actual implementation, this would be helpful.

Thanks
Jan

Re: HBase - Secondary Index (ramkrishna vasudevan, 2012-12-05, 17:42)

I would like to reply to this, as I was previously part of the implementation. I hope Anoop corrects me if I am going wrong here.

> However if the update to the index table (after a successful update of the source table) fails for some other reason (without a crash of the region server), the HLog will not be replayed..

As you understood, the WALEdit is written for the index region and also for the main region using WAL hooks. The next step is the addition to the memstore, so the KVs need to be added to the memstore of both the main region and the index region. What type of failure do you foresee here? Are you thinking of flushes or something else that could fail?

If you look at the Put() API code, once the WAL edit is successful there is no need to roll back. Just after the memstore addition happens for the main table, new hooks were added to make an entry for the index table. Ideally both should pass here.

Some additional work was also done to take MVCC into account, so that a flush of the main region or the index region does not affect an incoming put, or vice versa.

Anyway, Anoop can answer this more specifically, as I don't have access to the source code anymore.

Regards
Ram
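
Editor's sketch: the ordering discussed in the two messages above (log edits for both regions first, then update both memstores, and replay the log after a crash) can be illustrated with a small in-memory model. This is not HBase code and not the Huawei implementation. `RegionServer`, its fields, the `crash_after_main` flag, and the `value|row_key` index key are assumptions made for the example.

```python
class RegionServer:
    """Toy model: a durable log plus two volatile in-memory stores."""

    def __init__(self):
        self.wal = []             # stands in for the durable log
        self.main_memstore = {}   # lost on crash
        self.index_memstore = {}  # lost on crash

    def put(self, row_key, value, crash_after_main=False):
        index_key = f"{value}|{row_key}"  # assumed index key layout
        # Step 1: one log entry carries the edits for both regions.
        self.wal.append({"main": (row_key, value),
                         "index": (index_key, row_key)})
        # Step 2: apply the edits to both memstores.
        self.main_memstore[row_key] = value
        if crash_after_main:
            raise RuntimeError("simulated crash before the index update")
        self.index_memstore[index_key] = row_key

    def recover(self):
        """Simulate a restart: memstores are empty and rebuilt from the log."""
        self.main_memstore, self.index_memstore = {}, {}
        for entry in self.wal:
            key, value = entry["main"]
            self.main_memstore[key] = value
            key, value = entry["index"]
            self.index_memstore[key] = value


rs = RegionServer()
rs.put("row1", "blue")
try:
    rs.put("row2", "red", crash_after_main=True)
except RuntimeError:
    rs.recover()  # replay brings both stores back in sync
```

Because the log entry is written before either memstore is touched, replay restores both the main and the index entry for `row2`. The case Jan raises, a failure of the index update without a crash, is not modelled here.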

Re: HBase - Secondary Index (Andrey Stepachev, 2012-12-05, 11:52)

Hi.

The indexing solution looks tempting. Are there any plans to open source your solution (or is it already open and I can't find it)?

--
Andrey.

RE: HBase - Secondary Index (Anoop Sam John, 2012-12-05, 12:55)

No, this is not open sourced yet. Depending on the interest from the HBase community, we can think of contributing it.

It is time to see an HBase community version with sec indexing in it (IMHO) :)

-Anoop-

Re: HBase - Secondary Index (Andrey Stepachev, 2012-12-05, 15:59)

Can you explain what you mean by 'HBase community version with sec indexing in it'? Will you wait until someone implements the same algorithm in trunk HBase, or?

--
Andrey.

Re: HBase - Secondary Index (Anoop John, 2012-12-05, 17:54)

I mean HBase devs working on having sec indexing available with the HBase distribution. Right now, I guess, many users of HBase implement different kinds of sec indexing. :)

We @Huawei would be happy to provide our support for it, depending on the interest from the community. :)

-Anoop-

Re: HBase - Secondary Index (Jonathan Hsieh, 2012-12-05, 19:23)

I personally feel that we should only add the primitive APIs necessary to make this work (and if none are required, all the better!), and try to keep the secondary indexing work as a separate project on top of HBase, because there are many possible valid architectures and implementations.

I have two question areas. One was touched upon in previous messages: what logging is done, and what consistency guarantees do we get between the index and the primary?

The other has to do with scalability. I'm not sure I interpreted the slides correctly, but from slides 8 and 13, is the architecture such that each primary table region has a corresponding index table region?

Is slide 14 a comparison of a full table scan vs. the indexed lookups on 4 RSs? What happens if we go up to 20, or 100 RSs? If I'm right about one index region per table region, I have a feeling this isn't going to scale well with a large number of regions (since a lookup would potentially have to talk to each region and essentially every region server).

Jon.

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]
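
Editor's sketch: Jon's scalability concern can be made concrete with a toy model in which every region keeps an index of its own rows only. A lookup by indexed value then cannot know which region holds a match, so it has to ask all of them. The `Region` class, `lookup_by_value`, `build`, and the range-style row placement are assumptions for illustration; they are not taken from the slides.

```python
from collections import defaultdict


class Region:
    """Toy region holding its rows plus an index over those rows only."""

    def __init__(self):
        self.rows = {}
        self.local_index = defaultdict(list)  # value -> row keys in this region

    def put(self, row_key, value):
        self.rows[row_key] = value
        self.local_index[value].append(row_key)


def lookup_by_value(regions, value):
    """Return (matching row keys, number of regions that had to be asked)."""
    hits, regions_contacted = [], 0
    for region in regions:
        regions_contacted += 1
        hits.extend(region.local_index.get(value, []))
    return sorted(hits), regions_contacted


def build(num_regions, num_rows):
    """Spread rows over regions by contiguous key range; one row matches."""
    regions = [Region() for _ in range(num_regions)]
    for i in range(num_rows):
        row_key = f"row{i:04d}"
        value = "match" if i == 7 else "other"
        regions[i * num_regions // num_rows].put(row_key, value)
    return regions


for n in (4, 20, 100):
    hits, contacted = lookup_by_value(build(n, 1000), "match")
    print(f"{n} regions: {len(hits)} hit(s), {contacted} regions contacted")
```

The result set stays the same size while the number of regions contacted grows linearly with the region count, which is the behaviour Jon is asking about.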

Re: HBase - Secondary Index (lars hofhansl, 2012-12-05, 21:03)

I'd say "it depends".

It seems everybody wants secondary indexes in HBase. The problem is that most folks don't agree on what that actually means.

The most interesting problem and discussion point (IMHO) is that HBase would need some form of schema description.

See also HBASE-7221. Assuming we had something like that, it could be a building block for a simple built-in secondary index solution.

In the end you're probably right and we cannot prescribe a single secondary index solution that will fit all use cases.

-- Lars

Re: HBase - Secondary Index (Andrew Purtell, 2012-12-06, 04:41)

We won't find a secondary indexing scheme for HBase that will fit all use cases.

However I am becoming convinced we should distribute as part of the distribution something good enough for some subset of simple/common use cases, so users have *something* to try, and can play around with it. Everyone asks for it. All the time.

I'd guess that would look like a coprocessor based solution, so there's no cost to those who don't want to use it, and minimal changes to core code.

Maybe it would suit them, or if not at least they will have enough experience having played around with it to decide why they might need to do something else and what that might look like for their use case.
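
Editor's sketch: the "no cost to those who don't want to use it" property of a hook-based design can be shown with a toy table that calls observers only if any were registered. This is plain Python loosely modelled on the observer idea behind coprocessors. It is not HBase's coprocessor API, and `Table`, `IndexObserver`, and `post_put` are hypothetical names.

```python
class Table:
    """Toy table whose write path calls optional observers after each put."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.observers = []  # empty unless the user opts in

    def put(self, row_key, value):
        self.rows[row_key] = value
        for observer in self.observers:  # no-op loop when nothing is registered
            observer.post_put(self, row_key, value)


class IndexObserver:
    """Maintains an index table as a side effect of puts on the user table."""

    def __init__(self, index_table):
        self.index_table = index_table

    def post_put(self, table, row_key, value):
        # Assumed index key layout: indexed value, then the user row key.
        self.index_table.rows[f"{value}|{row_key}"] = row_key


# A table without the observer behaves exactly as before.
plain = Table("plain")
plain.put("row1", "blue")

# A table that opts in gets its index maintained on every put.
index = Table("users_idx")
users = Table("users")
users.observers.append(IndexObserver(index))
users.put("row1", "blue")
```

The core `put` path needs only the observer loop; all indexing logic lives in the optional observer, which matches the "minimal changes to core code" goal stated above.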

RE: HBase - Secondary Index (Anoop Sam John, 2012-12-06, 04:57)

> I'd guess that would look like a coprocessor based solution, so there's no cost to those who don't want to use it, and minimal changes to core code.

Yes Andrew, this was the main consideration on our side when we were designing the secondary indexing solution here. Plus the minimum possible degradation for the Put, as our customer was very particular about it :)

> Everyone asks for it. All the time.

Yes, this is what we have also been hearing from our customers. :)

-Anoop-
-
Re: HBase - Secondary Indexramkrishna vasudevan 2012-12-06, 13:05
Yes, there is a one-to-one mapping, but even if there are 5 indices, all of the index data will still be in that same region. Maybe more perf testing with more regions is needed. Anoop, do you want to say something about the range scans?

Regards
Ram
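Ram's point above, that the index data for all of a table's indices stays with one region, can be pictured with a small sketch. It is not taken from the presentation: the key layout (each index row key prefixed with the start key of the user region), the separator byte, and the names `local_index_key` and `global_index_key` are assumptions made purely for illustration.

```python
SEP = b"\x00"  # hypothetical separator; assumes it never occurs in the fields


def local_index_key(region_start, index_name, value, user_row):
    """Index row key that sorts under the owning user region's start key."""
    return SEP.join([region_start, index_name, value, user_row])


def global_index_key(index_name, value, user_row):
    """Index row key ordered by index name and value across the whole table."""
    return SEP.join([index_name, value, user_row])


# One user row in the region starting at b"row-500", with five indexed columns.
region_start = b"row-500"
user_row = b"row-512"
indexed = {
    b"idx_city": b"Pune",
    b"idx_age": b"031",
    b"idx_dept": b"ops",
    b"idx_zip": b"411001",
    b"idx_tier": b"gold",
}

local_keys = [local_index_key(region_start, n, v, user_row) for n, v in indexed.items()]
global_keys = [global_index_key(n, v, user_row) for n, v in indexed.items()]

# Every local key shares the region prefix, so all five sort into one key range.
same_range = all(k.startswith(region_start + SEP) for k in local_keys)

# Global keys lead with the index name, so the five entries start in five places.
distinct_global_prefixes = len({k.split(SEP)[0] for k in global_keys})
```

Under these assumptions, one put with five indices still produces five index entries, but they all sort into the same key range, so they can be written alongside the user row instead of being spread across the cluster.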
-
Re: HBase - Secondary Index
Doug Meil 2012-12-05, 21:58
re: "It seems everybody wants secondary indexes in HBase. The problem is that most folks don't agree what that actually means."

Agreed. And oftentimes when people say "secondary indexing" they may be talking about an access path on something that isn't represented in the key. So you'd need not just a "row key schema" (which is what we were talking about in HBASE-7221), but a schema for the columns too.
-
Re: HBase - Secondary Index
Anoop John 2012-12-06, 01:17
Hi All
Thanks for taking a look, and for your concerns and comments.

@Jon: Yes, consistency is maintained now. Since the puts to both tables go to the same RS, this was a comparatively easy job. I can share more granular details on how this is done in later mails.

Yes, as Jon says, when the cluster is really big, with hundreds of RSs, the scan needs to visit every region on those RSs. But some regions may hold no data matching the condition, and the scan on those will end immediately. We were also thinking about ways to add custom blooms, so that a special bloom on the index region can tell whether a column value is present in that region or not. We have not tried this yet or done the perf comparison. We are now in the process of profiling and fine-tuning. Our feeling is that per-region indexing would still be better than a full table scan.

Also, one main reason for not taking the client-based approach, where there would be a globally ordered index, is put performance. In the past, when we did a fully client-based solution (something like Lily), the put performance did not meet our goal, as one put was potentially turning into many puts. In our customer's case, some tables have up to 5 indices.

@Ted: It has yet to be used at production level. The apps that use the indexing are under way now.

Good to hear from everyone, so that we can also get more scenarios, concerns and opinions from the experts. :)

-Anoop-
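The scan fan-out and the custom-bloom idea discussed in this message can be sketched with a toy model. This is only an illustration under stated assumptions, not the Huawei implementation: `Region`, `Table`, `might_contain` and `scan_by_value` are hypothetical names, a single column is indexed, and an exact membership check stands in for the bloom (a real bloom filter could also return false positives).

```python
from bisect import bisect_right


class Region:
    """Toy region holding user rows and a co-located index on one column."""

    def __init__(self):
        self.rows = {}    # row key -> column value
        self.index = {}   # column value -> set of row keys

    def put(self, key, value):
        old = self.rows.get(key)
        if old is not None:
            # Drop the stale index entry before writing the new one.
            self.index[old].discard(key)
            if not self.index[old]:
                del self.index[old]
        self.rows[key] = value
        self.index.setdefault(value, set()).add(key)

    def might_contain(self, value):
        # Stand-in for the proposed bloom; exact, so no false positives here.
        return value in self.index


class Table:
    def __init__(self, split_keys):
        self.split_keys = sorted(split_keys)
        self.regions = [Region() for _ in range(len(self.split_keys) + 1)]

    def put(self, key, value):
        # The row and its index entry are written to the same region.
        self.regions[bisect_right(self.split_keys, key)].put(key, value)

    def scan_by_value(self, value, use_bloom=False):
        """Return (matching row keys, number of region indexes read)."""
        hits, index_reads = [], 0
        for region in self.regions:
            if use_bloom and not region.might_contain(value):
                continue  # skip the index read for this region
            index_reads += 1
            hits.extend(region.index.get(value, ()))
        return sorted(hits), index_reads


table = Table(split_keys=["g", "n", "t"])  # four regions
for key, city in [("alice", "Pune"), ("bob", "Oslo"), ("henry", "Pune"),
                  ("olga", "Lima"), ("zed", "Oslo")]:
    table.put(key, city)

without_bloom = table.scan_by_value("Pune")
with_bloom = table.scan_by_value("Pune", use_bloom=True)
```

In this toy run both scans return the same rows, but the filtered scan reads the index in 2 of the 4 regions instead of all 4. Every region is still consulted for the membership check, which is the fan-out Jon raised; the check only makes the visit to a non-matching region cheaper.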
-
Re: HBase - Secondary Index
Nick Dimiduk 2012-12-18, 17:48
Hi Anoop,
Your presentation has garnered quite a bit of community interest. Have you considered providing your implementation to the community, perhaps in an HBase-contrib module?

Thanks,
Nick
-
Re: HBase - Secondary Index
Andrew Purtell 2012-12-19, 00:51
Hi Anoop,
What Nick asked. I've also heard people wonder this out loud in a few places.

--
Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: HBase - Secondary Index
ramkrishna vasudevan 2012-12-19, 04:24
Hi Anoop
It's great to see people accepting this design. I hope it makes it into contrib. Very happy to see the positive comments.

Regards
Ram