Weishung Chung 2011-03-25, 17:38
+1 Thank you David for the great explanation. It's complicated.
I am pretty new to this BigData space and find it really interesting, and I
always want to learn more about it. I will definitely look into OpenTSDB as
suggested. Thanks again :D
On Fri, Mar 25, 2011 at 12:18 PM, Buttler, David <[EMAIL PROTECTED]> wrote:
> Hmmm.... maybe my mental model is deficient. How do you propose building a
> secondary index without a transaction?
> The reason indexes work is that they store the data in a different way than
> the primary table. That implies a second, independent data storage.
> Without a transaction you can't be guaranteed that the second data
> structure is always updated in sync with the primary table.
> I suppose you could roll the multiple data writes into the initial data
> write -- that would work if you have write-once data. But if you partially
> update the data then you have the issue that you may not have enough
> information in the update to correctly write the key for the secondary data
> stores. This would mean (in general) that you would have to read an entire
> row before you update any part of it so that you can maintain the secondary
> structures. Do you see the performance problem here? (or that you are
> introducing a limited transactional / eventually consistent state into the
> data store)
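
To make the read-before-write point above concrete, here is a minimal sketch in Python. It is not HBase code: the two dicts are in-memory stand-ins for a primary table and an index table, and every name in it (primary, index, INDEXED_COLUMN, put, lookup) is hypothetical. It only illustrates that a partial update cannot compute the old index key without first reading the existing row, and that the table and the index are written in two separate steps.

```python
# In-memory stand-ins for two tables; all names here are hypothetical, not HBase APIs.
primary = {}  # row key -> {column: value}
index = {}    # (indexed value, row key) -> None, i.e. a table keyed on the indexed field

INDEXED_COLUMN = "city"


def put(row_key, columns):
    """Apply a partial update and keep the index table in step with the primary table."""
    # Read-before-write: the update alone does not carry the old indexed value,
    # so the stale index key cannot be computed without fetching the existing row.
    old_row = primary.get(row_key, {})
    if INDEXED_COLUMN in columns:
        old_value = old_row.get(INDEXED_COLUMN)
        if old_value is not None:
            index.pop((old_value, row_key), None)  # drop the stale index entry
        index[(columns[INDEXED_COLUMN], row_key)] = None
    # Second, separate write: without a transaction, a failure between the
    # index write above and this write leaves the two structures out of sync.
    primary[row_key] = {**old_row, **columns}


def lookup(value):
    """Return the row keys whose indexed column currently equals `value`."""
    return sorted(key for (indexed_value, key) in index if indexed_value == value)


put("user1", {"name": "Ann", "city": "Oslo"})
put("user1", {"city": "Rome"})  # partial update: only the indexed column changes
```

After the second put, lookup("Oslo") finds nothing and lookup("Rome") finds user1, but only because put read the old row first. Skipping that read would leave the ("Oslo", "user1") entry behind in the index.
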
> There may be optimizations where you could skip that part of the code if
> there were no indexes. But now you are talking about greatly increasing the
> complexity of the codebase for a use case that is somewhat specialized.
> Hence, you see that people who really care about secondary indexes /
> transactional HBase have separate packages. They probably don't do the job as
> well as is ideally possible by rolling the code into hbase proper, but on
> the other hand, neither do they increase the complexity of the main code
> branch (hence they don't slow down the core development work).
> I will stand by my point that there are engineering trade-offs to be made.
> Take the unix philosophy: small components, loosely coupled. If you need
> indexes, build it on top of HBase, not inside of HBase. Using things like
> co-processors allows you to extend the capabilities of HBase in a way that
> does not impact the core product and hurt all of the other users. An example
> of this is OpenTSDB. It is a time-series database that uses hbase under the
> covers, but it doesn't ask that hbase support its needs in some special way.
> It is very instructive to see how it was constructed.
> -----Original Message-----
> From: Weishung Chung [mailto:[EMAIL PROTECTED]]
> Sent: Friday, March 25, 2011 9:27 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Stargate+hbase
> Thank you so much for the informative info. It really helps me out.
> For secondary index, even without transaction, I would think one could
> build a secondary index on another key especially if we have row level
> locking. Correct me if I am wrong.
> Also, I have read about the clustered B-Tree used in InnoDB to implement
> secondary indexes, but I know that the B-Tree is the primary limitation when it
> comes to scalability and the main reason why NoSQL systems have discarded it. But it
> would be super nice to be able to build the secondary index without using
> another secondary table in HBase.
> I am not complaining but I would love to see HBase continue to be the top
> NoSQL solution out there :D
> Way to go HBase !
> On Fri, Mar 25, 2011 at 10:39 AM, Buttler, David <[EMAIL PROTECTED]> wrote:
> > Do you know what it means to make secondary indexing a feature? There are
> > two reasonable outcomes:
> > 1) adding ACID semantics (and thus killing scalability)
> > 2) allowing the secondary index to be out of date (leading to every naïve
> > user claiming that there is a serious bug that must be fixed).
> > Secondary indexes are basically another way of storing (part of) the data,
> > e.g. another table, sorted on the field(s) that you want to search on.
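
As a rough illustration of the "another table, sorted on the field(s)" idea and of outcome 2 above, here is a small Python sketch. It is not HBase code: the sorted list stands in for an index table whose keys are (field value, primary row key), and all names and sample rows (primary, index_by_age, rows_with_age) are made up for the example.

```python
import bisect

# Hypothetical primary table: row key -> columns.
primary = {
    "row1": {"name": "Ann", "age": 31},
    "row2": {"name": "Bob", "age": 25},
    "row3": {"name": "Cid", "age": 31},
}

# The "index table": the same data keyed on the searched field first and the
# primary row key second, kept in sorted order like rows in a sorted key space.
index_by_age = sorted((cols["age"], key) for key, cols in primary.items())


def rows_with_age(age):
    """Range scan over the sorted index: all entries whose first key part equals `age`."""
    start = bisect.bisect_left(index_by_age, (age, ""))
    stop = bisect.bisect_left(index_by_age, (age + 1, ""))
    return [key for _, key in index_by_age[start:stop]]


matches = rows_with_age(31)

# Outcome 2: the primary table is updated but the index has not caught up yet,
# so a lookup through the index still returns the row under its old value.
primary["row2"]["age"] = 26
stale_matches = rows_with_age(25)
```

Here matches holds row1 and row3, while stale_matches still holds row2 even though its age in the primary table is now 26. That gap is the "out of date" behaviour a user would likely report as a bug.
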