
HBase user mailing list: Stargate+hbase

sreejith P. K. 2011-03-24, 13:18
Jean-Daniel Cryans 2011-03-24, 16:27
sreejith P. K. 2011-03-25, 05:47
Stack 2011-03-25, 05:48
Wei Shung Chung 2011-03-25, 07:03
Buttler, David 2011-03-25, 15:39
Re: Stargate+hbase
Thank you so much for the informative reply. It really helps me out.

For secondary indexes, even without transactions, I would think one could still
build a secondary index on another key, especially since we have row-level
locking. Correct me if I am wrong.

Also, I have read about how InnoDB implements secondary indexes on top of its
clustered B-Tree, but I know that the B-Tree is the primary limitation when it
comes to scalability and the main reason why NoSQL stores have discarded it.
Still, it would be super nice to be able to build a secondary index in HBase
without using a separate secondary table.

I am not complaining, but I would love to see HBase continue to be the top
NoSQL solution out there :D
Way to go, HBase!

On Fri, Mar 25, 2011 at 10:39 AM, Buttler, David <[EMAIL PROTECTED]> wrote:

> Do you know what it means to make secondary indexing a feature?  There are
> two reasonable outcomes:
> 1) adding ACID semantics (and thus killing scalability)
> 2) allowing the secondary index to be out of date (leading to every naïve
> user claiming that there is a serious bug that must be fixed).
> Secondary indexes are basically another way of storing (part of) the data.
> E.g. another table, sorted on the field(s) that you want to search on.  In
> order to ensure consistency between the primary table and the secondary
> table (index), you have to guarantee that when you mutate the primary table,
> the secondary table is mutated in the same atomic transaction.  Since
> HBase only has row-level locks, this can't be guaranteed across tables.
> The situation is not hopeless, because in many cases you don't need to have
> perfectly consistent data and can afford to wait for cleanup tasks.  For
> some applications, you can ensure that the index is updated close enough to
> the table update (using external transactions, or something similar) that
> users would never notice.  One way to implement an eventually consistent
> secondary index would be to mimic the way cluster replication is done.
> However, what I have described is difficult to do generically -- and there
> are engineering tradeoffs that need to be made.  If you absolutely need a
> transactional and consistent secondary index, I would suggest using Oracle,
> MySQL, or another relational database, where this was designed in as a
> primary feature.  Just don't complain that they are too slow or don't scale
> as well as HBase.
> </rant>
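Dave's point that many applications can tolerate an index that lags the primary table and gets repaired by cleanup tasks can be made concrete. The following is a minimal Python sketch of one such repair pass, not a description of any HBase feature: plain dicts stand in for the two tables, and the index row-key layout (indexed value, a `\x00` separator, then the primary key) as well as the helper names `index_key`, `cleanup_index` and `add_missing_entries` are hypothetical. In a real deployment the same logic would run as a periodic full scan over both tables, and this sketch ignores writers that run concurrently with the pass.

```python
SEP = "\x00"  # assumed separator; indexed values must not contain it


def index_key(value, primary_key):
    # Hypothetical layout: indexed value, separator, primary key.
    return value + SEP + primary_key


def cleanup_index(primary, index, indexed_column):
    """Remove index entries whose primary row is gone or now holds another value."""
    stale = []
    for key, primary_key in index.items():
        value = key.split(SEP, 1)[0]  # recover the indexed value from the key
        row = primary.get(primary_key)
        if row is None or row.get(indexed_column) != value:
            stale.append(key)
    for key in stale:  # delete after iterating; a dict can't change size mid-loop
        del index[key]
    return stale


def add_missing_entries(primary, index, indexed_column):
    """Re-create index entries lost to a failed or not-yet-applied index write."""
    added = []
    for primary_key, row in primary.items():
        value = row.get(indexed_column)
        if value is None:
            continue  # row has nothing to index
        key = index_key(value, primary_key)
        if key not in index:
            index[key] = primary_key
            added.append(key)
    return added


if __name__ == "__main__":
    primary = {"u1": {"city": "Oslo"}, "u2": {"city": "Lima"}}
    index = {
        index_key("Oslo", "u1"): "u1",  # correct entry
        index_key("Rome", "u2"): "u2",  # stale: u2 now lives in Lima
        index_key("Oslo", "u9"): "u9",  # dangling: u9 no longer exists
    }
    print(cleanup_index(primary, index, "city"))
    print(add_missing_entries(primary, index, "city"))
```

Running both passes converges the index toward the primary table, but only eventually: between passes, readers can still see the stale or missing entries Dave warns about.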
> Sorry for the rant.  If you want to have a secondary index here is what you
> need to do:
> Modify your application so that every time you write to the primary table,
> you also write to a secondary table, keyed off of the values you want to
> search on.  If you can't guarantee that the values form a secondary key
> (i.e. are unique across your entire table), you can make your key a compound
> key (see, for example, how "tsuna" designed OpenTSDB) with your primary key
> as a component.
> Then, when you need to query, you can do range queries over the secondary
> table to retrieve the keys in the primary table to return the full data row.
> Dave
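Dave's recipe (write to the primary table, then to an index table keyed on a compound key of indexed value plus primary key, and answer queries with a range scan over the index) can be sketched as follows. This is a hedged, in-memory Python illustration rather than HBase client code: the `SortedTable` class is a hypothetical stand-in for an HBase table whose rows are kept sorted by row key, its `scan(start, stop)` mimics a scan bounded by start and stop rows, and the `\x00` separator assumes indexed values never contain that character. The helper names are likewise illustrative.

```python
import bisect

SEP = "\x00"  # assumed separator; indexed values must not contain it


class SortedTable:
    """Hypothetical in-memory stand-in for an HBase table (rows sorted by key)."""

    def __init__(self):
        self._keys = []  # row keys in sorted order
        self._rows = {}  # row key -> {column: value}

    def put(self, row_key, columns):
        # Like an HBase Put: create the row if needed, then merge columns in.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def get(self, row_key):
        return self._rows.get(row_key)

    def delete(self, row_key):
        if row_key in self._rows:
            del self._rows[row_key]
            self._keys.pop(bisect.bisect_left(self._keys, row_key))

    def scan(self, start, stop):
        # Yield rows with start <= key < stop, in key order.
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < stop:
            key = self._keys[i]
            yield key, self._rows[key]
            i += 1


def index_key(value, primary_key):
    # Indexed value first so one prefix scan finds every match;
    # primary key last so duplicate values still get distinct rows.
    return value + SEP + primary_key


def put_with_index(primary, index, primary_key, columns, indexed_column):
    """Write the primary row, then the index row: two separate, NON-atomic writes."""
    old_value = (primary.get(primary_key) or {}).get(indexed_column)
    primary.put(primary_key, columns)  # write 1: primary table
    new_value = columns.get(indexed_column)
    if new_value is None:
        return  # indexed column not touched by this write
    if old_value is not None and old_value != new_value:
        index.delete(index_key(old_value, primary_key))  # drop the outdated entry
    index.put(index_key(new_value, primary_key), {"ref": primary_key})  # write 2


def find_by_value(primary, index, value):
    """Range-scan the index for one value, then fetch full rows by primary key."""
    start, stop = value + SEP, value + chr(ord(SEP) + 1)
    results = []
    for _, entry in index.scan(start, stop):
        row = primary.get(entry["ref"])
        if row is not None:  # skip entries whose primary row has vanished
            results.append((entry["ref"], row))
    return results


if __name__ == "__main__":
    users, by_city = SortedTable(), SortedTable()
    put_with_index(users, by_city, "u1", {"name": "Ann", "city": "Oslo"}, "city")
    put_with_index(users, by_city, "u2", {"name": "Bob", "city": "Lima"}, "city")
    put_with_index(users, by_city, "u3", {"name": "Cy", "city": "Oslo"}, "city")
    print([pk for pk, _ in find_by_value(users, by_city, "Oslo")])  # ['u1', 'u3']
```

Because the two writes are separate, a crash between them leaves the index missing an entry or still pointing at an old value; that is exactly the inconsistency Dave describes, and the sketch does not try to hide it. Putting the primary key last in the index key keeps duplicate values distinct while still letting a single prefix scan return all of them.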
> -----Original Message-----
> From: Wei Shung Chung [mailto:[EMAIL PROTECTED]]
> Sent: Friday, March 25, 2011 12:04 AM
> Subject: Re: Stargate+hbase
> I need to use secondary indexing too; hopefully this important feature
> will be made available soon :)
> On Mar 25, 2011, at 12:48 AM, Stack <[EMAIL PROTECTED]> wrote:
> > There is no native support for secondary indices in HBase (currently).
> > You will have to manage it yourself.
> > St.Ack
> >
> > On Thu, Mar 24, 2011 at 10:47 PM, sreejith P. K. <[EMAIL PROTECTED]> wrote:
> >> I have tried secondary indexing, but it seems I am missing something.
> >> Could you please explain how this is possible using secondary indexing?
> >>
> >> I have tried something like this:
> >>
> >>                ColumnFamily1:kwd1
> >>                ColumnFamily1:kwd2
> >> row1         ColumnFamily1:kwd3
> >>                ColumnFamily1:kwd2
Buttler, David 2011-03-25, 17:18
Weishung Chung 2011-03-25, 17:38
Stack 2011-03-25, 17:09
Stack 2011-03-25, 17:10