Thank you so much for the information. It really helps me out.
As for secondary indexes, even without transactions, I would think one could
still build a secondary index on another key, especially since we have
row-level locking. Correct me if I am wrong.
Also, I have read about the clustered B-tree that InnoDB uses to implement
secondary indexes, but as I understand it the B-tree is the primary limitation
when it comes to scalability and the main reason NoSQL systems have discarded
it. Still, it would be super nice to be able to build a secondary index
without using another table in HBase.
I am not complaining; I would love to see HBase continue to be the top
NoSQL solution out there :D
Way to go, HBase!
On Fri, Mar 25, 2011 at 10:39 AM, Buttler, David <[EMAIL PROTECTED]> wrote:
> Do you know what it means to make secondary indexing a feature? There are
> two reasonable outcomes:
> 1) adding ACID semantics (and thus killing scalability)
> 2) allowing the secondary index to be out of date (leading to every naïve
> user claiming that there is a serious bug that must be fixed).
> Secondary indexes are basically another way of storing (part of) the data.
> E.g. another table, sorted on the field(s) that you want to search on. In
> order to ensure consistency between the primary table and the secondary
> table (index), you have to guarantee that whenever you mutate the primary
> table, the secondary table is mutated in the same atomic transaction. Since
> HBase only has row-level locks, this can't be guaranteed across tables.
> The situation is not hopeless, because in many cases you don't need to have
> perfectly consistent data and can afford to wait for cleanup tasks. For
> some applications, you can ensure that the index is updated close enough to
> the table update (using external transactions, or something similar) that
> users would never notice. One way to implement an eventually consistent
> secondary index would be to mimic the way cluster replication is done.
> However, what I have described is difficult to do generically -- and there
> are engineering tradeoffs that need to be made. If you absolutely need a
> transactional and consistent secondary index, I would suggest using Oracle,
> MySQL, or another relational database, where this was designed in as a
> primary feature. Just don't complain that they are too slow or don't scale
> as well as HBase.
> Sorry for the rant. If you want to have a secondary index here is what you
> need to do:
> Modify your application so that every time you write to the primary table,
> you also write to a secondary table, keyed off of the values you want to
> search on. If you can't guarantee that the values form a secondary key
> (i.e. are unique across your entire table), you can make your key a compound
> key (see, for example, how "tsuna" designed OpenTSDB) with your primary key
> as a component.
> Then, when you need to query, you can do range queries over the secondary
> table to retrieve the keys in the primary table to return the full data row.
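
To make the recipe above concrete, here is a minimal sketch in Python. It is
not HBase client code: SortedTable is a hypothetical in-memory stand-in for a
table kept sorted by row key, and the names index_key, write, lookup and the
"\x00" separator are assumptions made purely for illustration. It also assumes
indexed values never contain the separator character.

```python
import bisect

class SortedTable:
    """Toy stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> stored value

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)

    def scan(self, start, stop):
        """Return (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

SEP = "\x00"  # assumed separator; must not occur inside indexed values

def index_key(value, primary_key):
    """Compound key: indexed value first, primary key appended for uniqueness."""
    return value + SEP + primary_key

def write(primary, index, primary_key, row, field):
    """Write the row, then its index entry. The two puts are NOT atomic."""
    primary.put(primary_key, row)
    index.put(index_key(row[field], primary_key), primary_key)

def lookup(primary, index, value):
    """Range-scan the index for one value, then fetch the full rows."""
    start = value + SEP
    stop = value + chr(ord(SEP) + 1)  # first key past this value's prefix
    return [primary.get(pk) for _, pk in index.scan(start, stop)]

primary, index = SortedTable(), SortedTable()
write(primary, index, "row1", {"kwd": "hbase", "n": 1}, "kwd")
write(primary, index, "row2", {"kwd": "hbase", "n": 2}, "kwd")
write(primary, index, "row3", {"kwd": "hadoop", "n": 3}, "kwd")
matches = lookup(primary, index, "hbase")
```

Because the two puts in write are separate operations, a failure between them
leaves the index out of date, which is exactly the consistency gap described
above. The sketch also does not remove the old index entry when an indexed
value changes; a real application would need a cleanup step for that.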
> -----Original Message-----
> From: Wei Shung Chung [mailto:[EMAIL PROTECTED]]
> Sent: Friday, March 25, 2011 12:04 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Stargate+hbase
> I need to use secondary indexing too, hopefully this important feature
> will be made available soon :)
> On Mar 25, 2011, at 12:48 AM, Stack <[EMAIL PROTECTED]> wrote:
> > There is no native support for secondary indices in HBase (currently).
> > You will have to manage it yourself.
> > St.Ack
> > On Thu, Mar 24, 2011 at 10:47 PM, sreejith P. K. <[EMAIL PROTECTED]> wrote:
> >> I have tried secondary indexing, but it seems I am missing some points.
> >> Could you please explain how it is possible using secondary indexing?
> >> I have tried something like:
> >> Columnamilty1:kwd1
> >> Columnamilty1:kwd2
> >> row1 Columnamilty1:kwd3
> >> Columnamilty1:kwd2