HBase >> mail # user >> Stargate+hbase


sreejith P. K. 2011-03-24, 13:18
Jean-Daniel Cryans 2011-03-24, 16:27
sreejith P. K. 2011-03-25, 05:47
Stack 2011-03-25, 05:48
Wei Shung Chung 2011-03-25, 07:03
Buttler, David 2011-03-25, 15:39
Re: Stargate+hbase
Thank you so much for the detailed explanation. It really helps me out.

As for secondary indexes, even without transactions I would think one could
still build a secondary index on another key, especially since we have
row-level locking. Correct me if I am wrong.

Also, I have read about the clustered B-tree that InnoDB uses to implement
secondary indexes, but I know that the B-tree is the primary limitation when
it comes to scalability and the main reason why NoSQL systems have discarded
it. Still, it would be super nice to be able to build a secondary index
without maintaining a separate index table in HBase.

I am not complaining; I would love to see HBase continue to be the top
NoSQL solution out there :D
Way to go, HBase!
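
For reference, the application-managed approach Dave describes in the quoted
message below (write every row to a secondary table keyed on the indexed
value plus the primary key, then range-scan that table) could be sketched
roughly as follows. This is only a toy in-memory model in Python, not HBase
client code: SortedTable, put_row, find_by_keyword and the SEP separator are
all hypothetical names made up for illustration, and the read-time check for
stale index entries is an added assumption about how one might cope with the
two writes not being atomic.

```python
import bisect

# Assumed separator between the indexed value and the primary key.
# It must never occur inside an indexed value.
SEP = "\x00"


class SortedTable:
    """Toy stand-in for a table whose rows are kept sorted by key."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> value

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)

    def delete(self, key):
        if key in self._rows:
            del self._rows[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def scan(self, start, stop):
        """Yield (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


primary = SortedTable()  # row key -> full row
index = SortedTable()    # compound key (keyword + SEP + row key) -> row key


def put_row(row_key, keyword, data):
    """Write to the primary table, then to the index table.

    These are separate writes, so they are NOT atomic across tables;
    a failure in between can leave the index out of date.
    """
    old = primary.get(row_key)
    if old is not None and old["keyword"] != keyword:
        index.delete(old["keyword"] + SEP + row_key)
    primary.put(row_key, {"keyword": keyword, "data": data})
    index.put(keyword + SEP + row_key, row_key)


def find_by_keyword(keyword):
    """Range-scan the index for one keyword, then fetch the primary rows."""
    start = keyword + SEP
    stop = keyword + chr(ord(SEP) + 1)  # first key past this keyword's range
    results = []
    for _, row_key in index.scan(start, stop):
        row = primary.get(row_key)
        # Skip index entries that no longer match the primary table.
        if row is not None and row["keyword"] == keyword:
            results.append((row_key, row["data"]))
    return results
```

Putting the primary key inside the index key is what lets several rows share
the same keyword, which is the compound-key idea Dave mentions. Against a
real cluster the two puts would go to two HBase tables and the lookup would
be a scan bounded by start and stop rows.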

On Fri, Mar 25, 2011 at 10:39 AM, Buttler, David <[EMAIL PROTECTED]> wrote:

> Do you know what it means to make secondary indexing a feature?  There are
> two reasonable outcomes:
> 1) adding ACID semantics (and thus killing scalability)
> 2) allowing the secondary index to be out of date (leading to every naïve
> user claiming that there is a serious bug that must be fixed).
>
> Secondary indexes are basically another way of storing (part of) the data.
>  E.g. another table, sorted on the field(s) that you want to search on.  In
> order to ensure consistency between the primary table and the secondary
> table (index), you have to guarantee that when you mutate the primary table
> that the secondary table is mutated in the same atomic transaction.  Since
> HBase only has row-level locks, this can't be guaranteed across tables.
>
> The situation is not hopeless, because in many cases you don't need to have
> perfectly consistent data and can afford to wait for cleanup tasks.  For
> some applications, you can ensure that the index is updated close enough to
> the table update (using external transactions, or something similar) that
> users would never notice.  One way to implement an eventually consistent
> secondary index would be to mimic the way cluster replication is done.
>
> However, what I have described is difficult to do generically -- and there
> are engineering tradeoffs that need to be made.  If you absolutely need a
> transactional and consistent secondary index, I would suggest using Oracle,
> MySQL, or another relational database, where this was designed in as a
> primary feature.  Just don't complain that they are too slow or don't scale
> as well as HBase.
>
> </rant>
>
> Sorry for the rant.  If you want to have a secondary index here is what you
> need to do:
> Modify your application so that every time you write to the primary table,
> you also write to a secondary table, keyed off of the values you want to
> search on.  If you can't guarantee that the values form a secondary key
> (i.e. are unique across your entire table), you can make your key a compound
> key (see, for example, how "tsuna" designed OpenTSDB) with your primary key
> as a component.
>
> Then, when you need to query, you can do range queries over the secondary
> table to retrieve the keys in the primary table to return the full data row.
>
> Dave
>
> -----Original Message-----
> From: Wei Shung Chung [mailto:[EMAIL PROTECTED]]
> Sent: Friday, March 25, 2011 12:04 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Stargate+hbase
>
> I need to use secondary indexing too, hopefully this important feature
> will be made available soon :)
>
> Sent from my iPhone
>
> On Mar 25, 2011, at 12:48 AM, Stack <[EMAIL PROTECTED]> wrote:
>
> > There is no native support for secondary indices in HBase (currently).
> > You will have to manage it yourself.
> > St.Ack
> >
> > On Thu, Mar 24, 2011 at 10:47 PM, sreejith P. K. <[EMAIL PROTECTED]> wrote:
> >> I have tried secondary indexing. It seems I miss some points. Could you
> >> please explain how it is possible using secondary indexing?
> >>
> >>
> >> I have tried like,
> >>
> >>
> >>                ColumnFamily1:kwd1
> >>                ColumnFamily1:kwd2
> >> row1         ColumnFamily1:kwd3
> >>                ColumnFamily1:kwd2
Buttler, David 2011-03-25, 17:18
Weishung Chung 2011-03-25, 17:38
Stack 2011-03-25, 17:09
Stack 2011-03-25, 17:10