HBase, mail # user - RE: write throughput in cassandra, understanding hbase


Re: write throughput in cassandra, understanding hbase
S Ahmed 2013-01-22, 19:12
Thanks, I think Lars's comment hints at what might be one reason.

I don't have a cluster set up to test on; I'm really just an enthusiast (I'm
currently going through the codebase and trying to get a low-level feel for
what's going on) and want to know what the possible technical reason is
(Cassandra and HBase are designed differently, so I was curious what
could be at the root of the issue).

I'm not here to start a flame war or anything so please don't take it that
way.

>>Where do you see that HBase is doing only 2-3k writes/s?
I must have misread it, or that was from another benchmark.

What I was thinking is that designs have tradeoffs, and possibly
Cassandra's design was built so that write throughput was more important, at
the cost of x, while HBase's design is better suited to y (which might be
range scans?).
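
To make the tradeoff concrete, here is a minimal sketch of the two partitioning styles Lars describes in the quoted reply below. It is a toy simulation, not Cassandra or HBase code: the server count, split points, key format, and function names are all hypothetical, and MD5 merely stands in for "some hash function".

```python
import bisect
import hashlib
from collections import Counter

NUM_SERVERS = 4  # hypothetical cluster size

# Range partitioning: each server owns one contiguous, sorted slice of the
# key space. These split points are illustrative only.
SPLIT_POINTS = ["key-02500", "key-05000", "key-07500"]

def range_server(key):
    """Server index under key-range partitioning."""
    return bisect.bisect_right(SPLIT_POINTS, key)

def hash_server(key):
    """Server index under hash (random-style) partitioning."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SERVERS

def servers_for_range_scan(start_key, stop_key):
    """Servers a scan over [start_key, stop_key] touches under range partitioning."""
    return list(range(range_server(start_key), range_server(stop_key) + 1))

# A burst of monotonically increasing keys, as a sequence- or
# timestamp-based row key would produce.
burst = ["key-%05d" % i for i in range(9000, 9500)]

range_load = Counter(range_server(k) for k in burst)  # writes per server
hash_load = Counter(hash_server(k) for k in burst)    # writes per server

print("range partitioning:", dict(range_load))
print("hash partitioning: ", dict(hash_load))
print("range scan touches:", servers_for_range_scan(burst[0], burst[-1]))
```

Under range partitioning the whole burst lands on the one server that owns the top of the key space, but a scan over those keys only has to visit that server. Under hash partitioning the writes spread out, but adjacent keys are scattered, so an ordered scan would have to consult every server.
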
On Tue, Jan 22, 2013 at 2:06 PM, Kevin O'Dell <[EMAIL PROTECTED]> wrote:

> Hi S Ahmed,
>
>   How are you today?  I wanted to echo what Lars said: most of these tests
> have an agenda.  With that being said, have you done any of your own
> internal testing?  If so, do you have configs, row keys, or results that you
> can share with us so that we can help you tune your cluster for success?
>
> On Tue, Jan 22, 2013 at 2:03 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > Where do you see that HBase is doing only 2-3k writes/s?
> > How was the data distributed? Was the table split?
> > Cassandra uses a random partitioner by default, which will nicely
> > distribute the data over the cluster but won't allow you to perform range
> > scans over your data.
> > HBase always partitions by key ranges, so that the keys can be range
> > scanned. If that is not done correctly and you create monotonically
> > increasing keys, you'll hotspot a single region server.
> >
> > Even then, you can do more than this on a single RegionServer.
> >
> > Also note that many of the benchmarks have agendas and cherry-pick the
> > results.
> > They probably "forgot" to disable Nagle's algorithm and to distribute the
> > table correctly.
> >
> > -- Lars
> >
> >
> >
> > ________________________________
> >  From: S Ahmed <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Sent: Tuesday, January 22, 2013 10:38 AM
> > Subject: RE: write throughput in cassandra, understanding hbase
> >
> > I've read articles online where I see cassandra doing like 20K writes per
> > second, and hbase around 2-3K.
> >
> > I understand both systems have their strengths, but I am curious as to what
> > is holding hbase back from reaching similar results?
> >
> > Is it HDFS that is the issue?  Or does hbase do certain things (to its
> > advantage) that slow the write path down?
> >
>
>
>
> --
> Kevin O'Dell
> Customer Operations Engineer, Cloudera
>
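
For readers unfamiliar with the Nagle remark in Lars's reply: Nagle's algorithm holds back small TCP writes so they can be coalesced, which can add latency to small request/response RPCs. The sketch below shows only the generic socket-level switch (TCP_NODELAY) using Python's standard library. It is not HBase's own configuration, which the thread does not spell out, and the function name is hypothetical.

```python
import socket

def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 sends small writes immediately instead of buffering them.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock

sock = make_low_latency_socket()
# Read the option back to confirm it took effect; nonzero means enabled.
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()

print("TCP_NODELAY enabled:", nodelay_enabled)
```

A benchmark that leaves this off on one system and on for the other is measuring socket settings as much as storage design, which seems to be the concern being raised.
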