HBase, mail # user - HBase Vs CitrusLeaf?

Re: HBase Vs CitrusLeaf?
Something Something 2011-09-08, 06:10
This is GREAT information folks.  This is why I like open source communities
-:)  I will present this to management, but in the meantime, the management
has thrown another *monkey* wrench.  They want me to check the possibility
of replacing Netezza with *something*.  Of course, I want to propose
replacing Netezza with HBase.  Anyway, it's best if I start another email
thread.  Thanks again.

On Wed, Sep 7, 2011 at 10:27 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:

> > While generalizations are dangerous, the one place where C++ code could
> > shine over Java (the JVM, really) is that one does not have to fight the GC.
> Yes.
> > That being said, the folks working on hbase
> > have been actively addressing this problem to the extent possible
> > in pure java by using unmanaged heap memory. Search for "mslab hbase" to
> > learn more about it.
> And Cloudera's Li Pi has been working on using off-heap memory as a
> secondary cache in HBASE-4027 and related JIRAs:
> https://issues.apache.org/jira/browse/HBASE-4027 . I believe this is
> important work. This gets us a lot closer to behaving like a C++-ish "large
> memory" process than we can under a JVM GC regime, until perhaps G1 is
> stable in what people run in production.
> Best regards,
>    - Andy
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
> >________________________________
> >From: Arvind Jayaprakash <[EMAIL PROTECTED]>
> >Sent: Thursday, September 8, 2011 2:49 AM
> >Subject: Re: HBase Vs CitrusLeaf?
> >
> >On Sep 06, Something Something wrote:
> >>Anyway, before I spend a lot of time on it, I thought I should check if
> >>anyone has compared HBase against CitrusLeaf.  If you have, I would greatly
> >>appreciate it if you would share your experiences.
> >
> >Disclaimer: I was an early evaluator/tester of citrusleaf about a year
> >ago when it was in its infancy. Though I am not affiliated with them in
> >any manner, I might be more benevolent to them than most readers of this
> >mailing list.
> >
> >The short answer is that hbase & citrusleaf (called CL in remainder of
> >the mail) are very different products.
> >
> >CL cares a lot more about predictable latencies than hbase does. This is
> >manifested in two aspects of the design:
> >
> >* It is heavily optimized for large RAM + SSD usage. While hbase does
> >a fair job of using RAM, I can say for sure that both the throughput and
> >latency trends are much better with CL in cases where spinning disks are
> >not used directly in the read/write path.
> >
> >* Multiple machines can concurrently/actively handle requests for the
> >same key, so the loss of one server does not mean that a range of keys
> >is temporarily unavailable. An hbase cluster does have a partial,
> >temporary outage when a region server dies. Things don't get back to
> >normal immediately even when a new server takes over since not all
> >region data may now be local disk reads. Even if they are, it won't be
> >readily waiting for you in fast memory.
> >
> >* A third aspect that is more of a side-effect is that HDFS still has a
> >SPOF in the form of the namenode, which continues to be a cause for
> >concern wrt overall uptime guarantees.
> >
> >
> >Here is where hbase would do much better:
> >
> >* It is designed for much larger data to the point where it is natural
> >for the entire dataset to be much larger than the total available RAM and
> >the usage of hard disks as the primary storage medium is natural.
> >
> >* A bigtable implementation is also designed for both ranged scans and
> >full table scans. Last I recall, CL was more of a DHT and so ranged
> >scans are infeasible and doing full scans would qualify as much more than
> >shooting oneself in the foot.
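
The ranged-scan point above can be illustrated with a toy sketch. This is not HBase or Citrusleaf code: SortedStore and HashedStore are hypothetical in-memory stand-ins, and the CRC32-based partitioning is only an assumption about how a DHT-style store might place keys. It shows why a key-ordered layout answers a range query by touching only the matching rows, while a hash-partitioned layout has to examine every row on every partition.

```python
import bisect
import zlib


class SortedStore:
    """Hypothetical stand-in for a key-ordered (Bigtable-style) layout."""

    def __init__(self, items):
        self.keys = sorted(items)
        self.values = [items[k] for k in self.keys]
        self.rows_examined = 0

    def scan(self, start, stop):
        # Binary-search both ends of the half-open range [start, stop).
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, stop)
        self.rows_examined += hi - lo
        return list(zip(self.keys[lo:hi], self.values[lo:hi]))


class HashedStore:
    """Hypothetical stand-in for a hash-partitioned (DHT-style) layout."""

    def __init__(self, items, partitions=4):
        self.partitions = [{} for _ in range(partitions)]
        for key, value in items.items():
            # Assumed placement rule: CRC32 of the key modulo partition count.
            index = zlib.crc32(key.encode()) % partitions
            self.partitions[index][key] = value
        self.rows_examined = 0

    def scan(self, start, stop):
        # Adjacent keys land on unrelated partitions, so every row is checked.
        hits = []
        for partition in self.partitions:
            for key, value in partition.items():
                self.rows_examined += 1
                if start <= key < stop:
                    hits.append((key, value))
        return sorted(hits)


# Zero-padded keys so that lexicographic order matches numeric order.
data = {f"user{i:04d}": i for i in range(1000)}
sorted_store = SortedStore(data)
hashed_store = HashedStore(data)

sorted_hits = sorted_store.scan("user0100", "user0110")
hashed_hits = hashed_store.scan("user0100", "user0110")

print(len(sorted_hits), sorted_store.rows_examined)
print(len(hashed_hits), hashed_store.rows_examined)
```

Both stores return the same rows; the difference is in rows_examined, which grows with the size of the range for the sorted layout and with the size of the whole dataset for the hashed one.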
> >
> >
> >And here is where hbase has advantages in principle:
> >
> >* As others mentioned, there are "textbook" advantages of using an open
> >source solution.
> >
> >* hbase definitely has run both longer and on larger clusters than CL