|
Something Something
2011-09-07, 05:24
sriram
2011-09-07, 08:51
Stack
2011-09-07, 15:00
Ian Varley
2011-09-07, 15:23
Arvind Jayaprakash
2011-09-07, 18:49
lars hofhansl
2011-09-07, 19:42
Arvind Jayaprakash
2011-09-07, 20:04
Andrew Purtell
2011-09-08, 05:27
Something Something
2011-09-08, 06:10
Jean-Daniel Cryans
2011-09-08, 17:26
Stack
2011-09-08, 17:56
|
-
HBase Vs CitrusLeaf?Something Something 2011-09-07, 05:24
I am a HUGE fan of HBase, but our management team wants us to evaluate
CitrusLeaf (http://citrusleaf.net/index.php). I have NO idea why! Our management claims that CitrusLeaf is (got to be) faster because it's written in C++. Trying to find out if there's any truth to that.

Anyway, before I spend a lot of time on it, I thought I should check whether anyone has compared HBase against CitrusLeaf. If you have, I would greatly appreciate it if you would share your experiences.

Please help. Thanks.
-
Re: HBase Vs CitrusLeaf?sriram 2011-09-07, 08:51
Something Something <mailinglists19@...> writes:
> I am a HUGE fan of HBase, but our management team wants us to evaluate
> CitrusLeaf (http://citrusleaf.net/index.php). I have NO idea why! Our
> management claims that CitrusLeaf is (got to be) faster because it's written
> in C++. Trying to find if there's any truth to that.
>
> Anyway, before I spent a lot of time on it, I thought I should check if
> anyone has compared HBase against CitrusLeaf. If you've, I would greatly
> appreciate it if you would share your experiences.
>
> Please help. Thanks.

Try Hypertable too; its performance is commendable compared to HBase. http://www.hypertable.com/pub/perfeval/test1/ It's written in C++.
-
Re: HBase Vs CitrusLeaf?Stack 2011-09-07, 15:00
On Tue, Sep 6, 2011 at 10:24 PM, Something Something <[EMAIL PROTECTED]> wrote:
> I am a HUGE fan of HBase, but our management team wants us to evaluate
> CitrusLeaf (http://citrusleaf.net/index.php). I have NO idea why!

Their website features Lucille Ball!

> Our management claims that CitrusLeaf is (got to be) faster because it's written
> in C++. Trying to find if there's any truth to that.

If there were no managers, we'd have no work.

St.Ack
-
Re: HBase Vs CitrusLeaf?Ian Varley 2011-09-07, 15:23
Well said, Stack. :) Maybe HBase needs more celebrity endorsements? ;)

Another important point you should mention to your manager is that (as far as I can see) CitrusLeaf is a closed-source, proprietary product. While there's no harm in this, it does introduce a dependency on CitrusLeaf to fix issues. By contrast, with a fully open-source product like HBase, you have complete control over your destiny: you can fix bugs, branch and change the software, and get community help (for free) if things don't seem to be working correctly (and there are services companies like Cloudera and HortonWorks who provide paid support as well). That, and the fact that HBase is already being proven in production environments at scale (Facebook, for example: http://highscalability.com/blog/2011/3/22/facebooks-new-realtime-analytics-system-hbase-to-process-20.html), should make for good arguments; I don't see much on the CitrusLeaf website about other companies using them in big production deployments.

On the speed issue, note that while it's traditionally been possible to attain more raw speed in a C++ application, the biggest speed gains usually come from algorithmic advances, not low-level optimization. As such, the fact that HBase is written in Java means that it's easier to refactor and bring new approaches to the same problems than it would be in a C++ application (and, as a bonus, the pool of capable developers to contribute is also much bigger). For most applications, squeezing the last ounces of performance out of the code is less important than being able to refactor and improve rapidly over time.

If your management still insists that C++ must be better, press them to come up with real throughput and latency requirements, and see if HBase (or CitrusLeaf, or Hypertable, or any other product) can meet them. The Yahoo Cloud Serving Benchmark tool is a good way to run benchmarks like this.

Ian
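The suggestion to pin down real throughput and latency requirements can be sketched in a few lines. This is a minimal, hypothetical harness and not part of YCSB or of any product's client library: `measure_latencies`, `summarize`, and `meets_requirements` are illustrative names, and the thresholds are placeholders that management would have to supply.

```python
import statistics
import time


def measure_latencies(operation, iterations=1000):
    """Call operation() repeatedly; return per-call latencies in milliseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples_ms, wall_seconds):
    """Reduce raw samples to throughput plus median and 99th-percentile latency."""
    cuts = statistics.quantiles(samples_ms, n=100)  # 99 percentile cut points
    return {
        "ops_per_sec": len(samples_ms) / wall_seconds,
        "p50_ms": cuts[49],
        "p99_ms": cuts[98],
    }


def meets_requirements(summary, min_ops_per_sec, max_p99_ms):
    """True only if both the throughput and the tail-latency targets are met."""
    return (summary["ops_per_sec"] >= min_ops_per_sec
            and summary["p99_ms"] <= max_p99_ms)
```

In practice `operation` would wrap a single read or write against whichever store is under test, and the same requirement check would be applied to every candidate so the comparison is about stated targets rather than implementation language.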
-
Re: HBase Vs CitrusLeaf?Arvind Jayaprakash 2011-09-07, 18:49
On Sep 06, Something Something wrote:
>Anyway, before I spent a lot of time on it, I thought I should check if
>anyone has compared HBase against CitrusLeaf. If you've, I would greatly
>appreciate it if you would share your experiences.

Disclaimer: I was an early evaluator/tester of citrusleaf about a year ago when it was in its infancy. Though I am not affiliated with them in any manner, I might be more benevolent to them than most readers of this mailing list.

The short answer is that hbase & citrusleaf (called CL in the remainder of the mail) are very different products.

CL cares a lot more about predictable latencies than hbase does. This is manifested in two aspects of the design:

* It is heavily optimized for large RAM + SSD usage. While hbase does a fair job of using RAM, I can say for sure that both the throughput and latency trends are much better with CL in cases where spinning disks are not used directly in the read/write path.

* Multiple machines can concurrently/actively handle requests for the same key, so the loss of one server does not mean that a range of keys is temporarily unavailable. A hbase cluster does have a partial, temporary outage when a region server dies. Things don't get back to normal immediately even when a new server takes over, since not all region data may now be local disk reads. Even if they are, the data won't be readily waiting for you in fast memory.

* A third aspect, more of a side effect, is that HDFS still has a SPOF in the form of the namenode, which continues to be a cause for concern wrt overall uptime guarantees.

Here is where hbase would do much better:

* It is designed for much larger data, to the point where it is natural for the entire dataset to be much larger than the total available RAM and for hard disks to be the primary storage medium.

* A bigtable implementation is also designed for both ranged scans and full table scans. Last I recall, CL was more of a DHT, so ranged scans are infeasible and doing full scans would qualify as much more than shooting oneself in the foot.

And here is where hbase has advantages in principle:

* As others mentioned, there are "textbook" advantages of using an open source solution.

* hbase definitely has run both longer and on larger clusters than CL possibly has.

While generalizations are dangerous, the one place where C++ code could shine over java (JVM really) is that one does not have to fight the GC. I'd personally be more comfortable with handing off, say, 48GB of memory to good C/C++ code than to the JVM. That being said, the folks working on hbase have been actively addressing this problem to the extent possible in pure java by using unmanaged heap memory. Search for "mslab hbase" to learn more about it.

My conclusion is that the two products address different problem spaces. So I'd urge you to spend time understanding your access patterns and see which one they map to more closely.

Feel free to contact me off list if you feel the need to ask anything that is not appropriate for the mailing list but is relevant to this discussion.
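The point about ranged scans on a DHT versus a bigtable-style store can be illustrated with a toy placement model. This is only a sketch under stated assumptions: the four-node cluster, the split points, and the function names are all hypothetical, and neither product's real partitioning scheme is reproduced here.

```python
import bisect
import hashlib

NODES = 4  # hypothetical cluster size


def hash_node(key):
    """DHT-style placement: the owning node is chosen by a hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NODES


# Range-partitioned placement: each node owns one contiguous, sorted key range.
# SPLITS holds the (hypothetical) first key owned by nodes 1, 2 and 3.
SPLITS = ["g", "n", "t"]


def range_node(key):
    """Bigtable-style placement: the owning node is found by key order."""
    return bisect.bisect_right(SPLITS, key)


def nodes_for_scan(start, stop, keys, placement):
    """Return the set of nodes holding at least one key in [start, stop)."""
    return {placement(k) for k in keys if start <= k < stop}


keys = [f"user{i:04d}" for i in range(1000)]
hashed = nodes_for_scan("user0100", "user0200", keys, hash_node)
ranged = nodes_for_scan("user0100", "user0200", keys, range_node)
print("hash placement touches nodes:", sorted(hashed))
print("range placement touches nodes:", sorted(ranged))
```

With sorted placement, adjacent keys live on the same node, so a ranged scan is a sequential read on one or a few servers. With hashed placement, adjacent keys are scattered, so the same scan has to ask many nodes, which is why access patterns matter more than implementation language when choosing between the two designs.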
-
Re: HBase Vs CitrusLeaf?lars hofhansl 2011-09-07, 19:42
Hi Arvind,

This is interesting:

> * Multiple machines can concurrently/actively handle requests for the
> same key, so the loss of one server does not mean that a range of keys
> is temporarily unavailable. A hbase cluster does have a partial,
> temporary outage when a region server dies. Things don't get back to
> normal immediately even when a new server takes over since not all
> region data may now be local disk reads. Even if they are, it won't be
> readily waiting for you in fast memory.

How does it deal with the write path?

If multiple machines can serve reads for the same set of values you either need
(1) to have them synchronized (some 2pc/Paxos-like consensus) or
(2) read from multiple machines to get consensus or
(3) synchronously write to multiple machines or
(4) accept temporary inconsistencies or
(5) something else?

-- Lars
-
Re: HBase Vs CitrusLeaf?Arvind Jayaprakash 2011-09-07, 20:04
On Sep 07, lars hofhansl wrote:
>Hi Arvind,
>
>This is interesting:
>
>> * Multiple machines can concurrently/actively handle requests for the
>> same key, so the loss of one server does not mean that a range of keys
>> is temporarily unavailable. A hbase cluster does have a partial,
>> temporary outage when a region server dies. Things don't get back to
>> normal immediately even when a new server takes over since not all
>> region data may now be local disk reads. Even if they are, it won't be
>> readily waiting for you in fast memory.
>
>How does it deal with the write path?
>
>If multiple machines can serve reads for the same set of values you either need
>(1) to have them synchronized (some 2pc/Paxos-like consensus) or
>(2) read from multiple machines to get consensus or
>(3) synchronously write to multiple machines or

A write call does not return the status until all replicas (configurable replication factor) have processed the write. It also looks like there is a write master of sorts involved for a given key/range. More on this here:

http://www.citrusleaf.net/_docs/Architecture_Overview.pdf

This is about as much detail as I can go into, since I don't have an accurate recollection of more details, and even if I did, I might not be in a position to talk about what is not in the public domain.

>(4) accept temporary inconsistencies
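The behaviour described above, where a write does not return until every replica has processed it, can be sketched with an in-memory model. This is an illustration of synchronous replication in general and not CitrusLeaf's actual protocol: the `Replica` and `ReplicaSet` classes are hypothetical, and a real system would also need to repair or roll back a write that reached only some replicas.

```python
class Replica:
    """One in-memory copy of the data; `up` simulates reachability."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True

    def apply(self, key, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        self.data[key] = value


class ReplicaSet:
    """Acknowledges a write only after every replica has applied it."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def write(self, key, value):
        for replica in self.replicas:
            try:
                replica.apply(key, value)
            except ConnectionError:
                # No acknowledgement: the caller must retry or handle the error.
                return False
        return True

    def read(self, key, replica_index=0):
        """Any replica can serve a read for an acknowledged write."""
        return self.replicas[replica_index].data.get(key)
```

Because success is reported only once all copies hold the value, any replica can answer a later read for that key, which is what allows several machines to serve the same key. The cost is that write latency is bounded by the slowest replica.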
-
Re: HBase Vs CitrusLeaf?Andrew Purtell 2011-09-08, 05:27
> While generalizations are dangerous, the one place when C++ code could
> shine over java (JVM really) is one does not have to fight the GC.

Yes.

> That being said, the folks working on hbase
> have been actively been addressing this problem to the extent possible
> in pure java by using unmanaged heap memory. Search for "mslab hbase" to
> learn more about it.

And Cloudera's Li Pi has been working on using off-heap memory as a secondary cache in HBASE-4027 and related JIRAs: https://issues.apache.org/jira/browse/HBASE-4027 . I believe this is important work. This gets us a lot closer to behaving like a C++-ish "large memory" process than we can under a JVM GC regime, until perhaps G1 is stable in what people run in production.

Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: HBase Vs CitrusLeaf?Something Something 2011-09-08, 06:10
This is GREAT information, folks. This is why I like open source communities :-)

I will present this to management, but in the meantime, management has thrown another *monkey* wrench. They want me to check the possibility of replacing Netezza with *something*. Of course, I want to propose replacing Netezza with HBase. Anyway, it's best if I start another email thread. Thanks again.
-
Re: HBase Vs CitrusLeaf?Jean-Daniel Cryans 2011-09-08, 17:26
Your company sounds lovely.
J-D
-
Re: HBase Vs CitrusLeaf?Stack 2011-09-08, 17:56
On Wed, Sep 7, 2011 at 11:10 PM, Something Something <[EMAIL PROTECTED]> wrote:
> They want me to check the possibility of replacing Netezza with *something*.

They want to replace Netezza with you? Given your name, you could replace two Netezza instances.

St.Ack