HBase >> mail # user >> Stargate+hbase


Re: Stargate+hbase
+1 Thank you David for the great explanation. It's complicated.
I am pretty new to this BigData space, find it really interesting, and
always want to learn more about it.  I will definitely look into OpenTSDB as
suggested. Thanks again :D

On Fri, Mar 25, 2011 at 12:18 PM, Buttler, David <[EMAIL PROTECTED]> wrote:

> Hmmm.... maybe my mental model is deficient.  How do you propose building a
> secondary index without a transaction?
>
> The reason indexes work is that they store the data in a different way than
> the primary table.  That implies a second, independent data storage.
>  Without a transaction you can't be guaranteed that the second data
> structure is always updated in sync with the primary table.
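
A minimal sketch of the point above, assuming plain Python dicts as hypothetical stand-ins for the primary table and the index (these are not HBase client calls). It shows how a failure between two independent writes, with no transaction to roll back the first one, leaves the index out of sync:

```python
# Hypothetical in-memory stand-ins for a primary table and a separate
# index structure; plain dicts, not HBase client calls.
primary = {}          # row key -> row (dict of column -> value)
index_by_city = {}    # indexed value -> set of row keys


class SimulatedCrash(Exception):
    """Stands in for a client or server failure between two writes."""


def put_with_index(row_key, row, crash_between_writes=False):
    # Write 1: the primary table.
    primary[row_key] = dict(row)
    if crash_between_writes:
        # Nothing rolls back write 1, because there is no transaction.
        raise SimulatedCrash(row_key)
    # Write 2: the second, independent data structure.
    index_by_city.setdefault(row["city"], set()).add(row_key)


put_with_index("user1", {"name": "Ann", "city": "Austin"})
try:
    put_with_index("user2", {"name": "Bob", "city": "Boston"},
                   crash_between_writes=True)
except SimulatedCrash:
    pass

# The primary table has user2, but the index never learned about it.
in_primary = "user2" in primary
in_index = "user2" in index_by_city.get("Boston", set())
print(in_primary, in_index)
```
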
>
> I suppose you could roll the multiple data writes into the initial data
> write -- that would work if you have write-once data.  But if you partially
> update the data then you have the issue that you may not have enough
> information in the update to correctly write the key for the secondary data
> stores.  This would mean (in general) that you would have to read an entire
> row before you update any part of it so that you can maintain the secondary
> structures.  Do you see the performance problem here? (or that you are
> introducing a limited transactional / eventually consistent state into the
> data store)
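
To make the partial-update problem concrete, here is a rough sketch under the same assumption of dicts standing in for tables (all names are hypothetical). An update that carries only the changed column cannot know the old index key, so the stale entry survives unless the old row is read first:

```python
def fresh_tables():
    # Hypothetical starting state: one row, indexed by its "city" column.
    primary = {"user1": {"name": "Ann", "city": "Austin"}}
    index = {"Austin": {"user1"}}
    return primary, index


def blind_partial_update(primary, index, row_key, changes):
    """Apply the change without looking at the old row."""
    primary[row_key].update(changes)  # models a write-only put of the changed columns
    if "city" in changes:
        # Only the new value is known, so the old index entry cannot be removed.
        index.setdefault(changes["city"], set()).add(row_key)


def read_then_update(primary, index, row_key, changes):
    """Read the old row first so the old index key can be cleaned up."""
    old_city = primary[row_key].get("city")   # the extra read
    primary[row_key].update(changes)
    if "city" in changes and changes["city"] != old_city:
        if old_city in index:
            index[old_city].discard(row_key)
            if not index[old_city]:
                del index[old_city]
        index.setdefault(changes["city"], set()).add(row_key)


p1, stale_index = fresh_tables()
blind_partial_update(p1, stale_index, "user1", {"city": "Boston"})

p2, clean_index = fresh_tables()
read_then_update(p2, clean_index, "user1", {"city": "Boston"})

print(stale_index)   # user1 is listed under both Austin and Boston
print(clean_index)   # user1 is listed under Boston only
```

The second function is correct, but it pays for a read on every write that touches an indexed column, which is the performance cost described above.
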
>
> There may be optimizations where you could skip that part of the code if
> there were no indexes.  But now you are talking about greatly increasing the
> complexity of the codebase for a use case that is somewhat specialized.
>  Hence, you see that people who really care about secondary indexes /
> transactional hbase have separate packages.  They probably don't do the job as
> well as is ideally possible by rolling the code into hbase proper, but on
> the other hand, neither do they increase the complexity of the main code
> branch (hence they don't slow down the core development work).
>
> I will stand by my point that there are engineering trade-offs to be made.
>  Take the Unix philosophy: small components, loosely coupled. If you need
> indexes, build them on top of HBase, not inside of HBase.  Using things like
> co-processors allows you to extend the capabilities of HBase in a way that
> does not impact the core product and hurt all of the other users. An example
> of this is OpenTSDB.  It is a time-series database that uses hbase under the
> covers, but it doesn't ask that hbase support its needs in some special way.
>  It is very instructive to see how it was constructed.
>
> Dave
>
>
> -----Original Message-----
> From: Weishung Chung [mailto:[EMAIL PROTECTED]]
> Sent: Friday, March 25, 2011 9:27 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Stargate+hbase
>
> Thank you so much for the information. It really helps me out.
>
> For a secondary index, even without transactions, I would think one could
> still build a secondary index on another key, especially if we have row-level
> locking. Correct me if I am wrong.
>
> Also, I have read about the clustered B-Tree used in InnoDB to implement
> secondary indexes, but I understand that the B-Tree is the primary limitation
> when it comes to scalability and the main reason why NoSQL stores have
> discarded it. But it would be super nice to be able to build the secondary
> index without using another secondary table in HBase.
>
> I am not complaining but I would love to see HBase continue to be the top
> NoSQL solution out there :D
> Way to go HBase !
>
> On Fri, Mar 25, 2011 at 10:39 AM, Buttler, David <[EMAIL PROTECTED]>
> wrote:
>
> > Do you know what it means to make secondary indexing a feature?  There are
> > two reasonable outcomes:
> > 1) adding ACID semantics (and thus killing scalability)
> > 2) allowing the secondary index to be out of date (leading to every naïve
> > user claiming that there is a serious bug that must be fixed).
> >
> > Secondary indexes are basically another way of storing (part of) the data.
> >  E.g. another table, sorted on the field(s) that you want to search on.
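
As a rough illustration of "another table, sorted on the field(s) that you want to search on", the sketch below keeps a sorted list of composite keys (indexed value, primary row key) next to the primary data. The table contents and the `lookup_by_city` helper are hypothetical, and the sorted Python list only approximates how a sorted key space would be scanned:

```python
import bisect

# Hypothetical primary table: row key -> row.
primary = {
    "user1": {"name": "Ann", "city": "Austin"},
    "user2": {"name": "Bob", "city": "Boston"},
    "user3": {"name": "Cy", "city": "Austin"},
}

# "Index table": composite keys (indexed value, primary row key), kept sorted
# so that all entries for one value are adjacent.
index_keys = sorted((row["city"], key) for key, row in primary.items())


def lookup_by_city(city):
    """Seek to the first index key for `city`, then scan while it matches."""
    start = bisect.bisect_left(index_keys, (city, ""))
    matches = []
    for value, row_key in index_keys[start:]:
        if value != city:
            break
        matches.append(row_key)
    return matches


austin_rows = lookup_by_city("Austin")
print(austin_rows)
```

The lookup never touches the primary table; it only returns row keys, which the caller would then fetch from the primary table in a second step.
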