HBase >> mail # user >> HBase Vs CitrusLeaf?


Re: HBase Vs CitrusLeaf?
This is GREAT information, folks.  This is why I like open source communities
:-)  I will present this to management, but in the meantime, management
has thrown another *monkey* wrench.  They want me to check the possibility
of replacing Netezza with *something*.  Of course, I want to propose
replacing Netezza with HBase.  Anyway, it's best if I start another email
thread.  Thanks again.

On Wed, Sep 7, 2011 at 10:27 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:

> > While generalizations are dangerous, the one place where C++ code could
> > shine over java (the JVM, really) is that one does not have to fight the GC.
>
> Yes.
>
> > That being said, the folks working on hbase
> > have been actively addressing this problem to the extent possible
> > in pure java by using unmanaged heap memory. Search for "mslab hbase" to
> > learn more about it.
>
> And Cloudera's Li Pi has been working on using off heap memory as a
> secondary cache in HBASE-4027 and related jiras:
> https://issues.apache.org/jira/browse/HBASE-4027 . I believe this is
> important work. This gets us a lot closer to behaving like a C++-ish "large
> memory" process than we can under a JVM GC regime, until perhaps G1 is
> stable in what people run in production.
>
>
> Best regards,
>
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
>
> >________________________________
> >From: Arvind Jayaprakash <[EMAIL PROTECTED]>
> >To: [EMAIL PROTECTED]
> >Sent: Thursday, September 8, 2011 2:49 AM
> >Subject: Re: HBase Vs CitrusLeaf?
> >
> >On Sep 06, Something Something wrote:
> >>Anyway, before I spend a lot of time on it, I thought I should check if
> >>anyone has compared HBase against CitrusLeaf.  If you have, I would greatly
> >>appreciate it if you would share your experiences.
> >
> >Disclaimer: I was an early evaluator/tester of citrusleaf about a year
> >ago when it was in its infancy. Though I am not affiliated with them in
> >any manner, I might be more benevolent to them than most readers of this
> >mailing list.
> >
> >The short answer is that hbase & citrusleaf (called CL in the remainder
> >of this mail) are very different products.
> >
> >CL cares a lot more about predictable latencies than hbase does. This is
> >manifested in two aspects of the design:
> >
> >* It is heavily optimized for large RAM + SSD usage. While hbase does
> >a fair job of using RAM, I can say for sure that both the throughput and
> >latency trends are much better with CL in cases where spinning disks are
> >not used directly in the read/write path.
> >
> >* Multiple machines can concurrently/actively handle requests for the
> >same key, so the loss of one server does not mean that a range of keys
> >is temporarily unavailable. An hbase cluster does have a partial,
> >temporary outage when a region server dies. Things don't get back to
> >normal immediately even when a new server takes over, since not all
> >region data may be readable from local disk. Even if it is, it won't be
> >readily waiting for you in fast memory.
> >
> >* A third aspect, more of a side effect, is that HDFS still has a
> >SPOF in the form of the namenode, which continues to be a cause for
> >concern wrt overall uptime guarantees.
> >
> >
> >Here is where hbase would do much better:
> >
> >* It is designed for much larger data, to the point where it is natural
> >for the entire dataset to be much larger than the total available RAM and
> >for hard disks to be the primary storage medium.
> >
> >* A bigtable implementation is also designed for both ranged scans and
> >full table scans. Last I recall, CL was more of a DHT, so ranged
> >scans are infeasible and doing full scans would qualify as much more than
> >shooting oneself in the foot.
> >
> >
> >And here is where hbase has advantages in principle:
> >
> >* As others mentioned, there are "textbook" advantages of using an open
> >source solution.
> >
> >* hbase definitely has run both longer and on larger clusters than CL
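
The point above about ranged scans on a sorted (bigtable-style) store versus a DHT can be illustrated with a toy sketch. This is not HBase or CitrusLeaf code: `SortedStore`, `HashedStore`, the `keys_examined` counter, and the CRC32 placement are all hypothetical names and choices made up for illustration, and real systems are far more involved. It only shows why a range query touches a contiguous slice when keys are kept in order, but has to look at every key when placement is by hash.

```python
import bisect
import zlib


class SortedStore:
    """Toy store that keeps keys in sorted order (bigtable-style layout)."""

    def __init__(self):
        self._keys = []        # sorted list of row keys
        self._values = {}      # key -> value
        self.keys_examined = 0

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start, stop):
        # Binary-search both ends, then read only the contiguous slice.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        self.keys_examined += hi - lo
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


class HashedStore:
    """Toy DHT-like store: each key lives on the node its hash selects."""

    def __init__(self, num_nodes=4):
        self._nodes = [dict() for _ in range(num_nodes)]
        self.keys_examined = 0

    def _node_for(self, key):
        # Hypothetical placement rule: CRC32 of the key modulo node count.
        return self._nodes[zlib.crc32(key.encode()) % len(self._nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def scan(self, start, stop):
        # Hashing destroys key order, so every key on every node is checked.
        hits = []
        for node in self._nodes:
            for k, v in node.items():
                self.keys_examined += 1
                if start <= k < stop:
                    hits.append((k, v))
        return sorted(hits)


sorted_store = SortedStore()
hashed_store = HashedStore()
for i in range(1000):
    key = f"row{i:04d}"
    sorted_store.put(key, i)
    hashed_store.put(key, i)

sorted_hits = sorted_store.scan("row0100", "row0110")
hashed_hits = hashed_store.scan("row0100", "row0110")
print(len(sorted_hits), sorted_store.keys_examined)
print(len(hashed_hits), hashed_store.keys_examined)
```

Both stores return the same ten rows, but the sorted store examines only the rows in the range while the hashed one walks the whole dataset, which is the cost the mail alludes to.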