Accumulo, mail # dev - a model for accumulo write scaling performance


Aaron Cordova 2012-02-24, 19:35
Clint Green 2012-02-24, 20:03
Keith Turner 2012-02-24, 20:27
RE: a model for accumulo write scaling performance
Dave Marion 2012-02-25, 00:41


 

That may be a good metric for your workload on EC2 virtualized hardware at
different scales; could be useful for regression testing different versions
of Hadoop + Accumulo. Certainly, workload and hardware differences could
lead to a different model.

 

From: Aaron Cordova [mailto:[EMAIL PROTECTED]]
Sent: Friday, February 24, 2012 2:36 PM
To: [EMAIL PROTECTED]
Subject: a model for accumulo write scaling performance

 

In my experience with Accumulo on EC2, I've seen about an 85% increase in
aggregate write rate each time the size of the cluster is doubled. I've
tried to capture that behavior in a model to help myself understand it.

 

The model I came up with is the following:

 

where

            w: aggregate write rate (writes per second)

            m: number of machines

            k: standalone single server performance (in my experience about
30k writes per second on average)

 

the units of k and w are writes per second

 

for those of you without the ability to see graphics in email, the model is:

            

            w = m * pow(0.85, log(m, 2)) * k
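A minimal sketch of the formula as written, for anyone who wants to plug in numbers. The function name `aggregate_write_rate` and the parameter name `efficiency` are illustrative and not from the original message; `k` defaults to the roughly 30k writes per second quoted above, and `math.log2(m)` stands in for `log(m, 2)`.

```python
import math


def aggregate_write_rate(m, k=30_000.0, efficiency=0.85):
    """Return w = m * efficiency**log2(m) * k, in writes per second.

    m          -- number of machines (must be >= 1)
    k          -- standalone single-server rate (assumed default: 30k writes/s)
    efficiency -- per-doubling factor from the formula (0.85 in the message)
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    return m * math.pow(efficiency, math.log2(m)) * k


if __name__ == "__main__":
    # Tabulate the modeled aggregate rate for a few cluster sizes.
    for machines in (1, 2, 4, 8, 16):
        rate = aggregate_write_rate(machines)
        print(f"{machines:3d} machines: {rate:10.0f} writes/s")
```

These are model outputs only, not measurements; different values of `k` or `efficiency` would be needed for other hardware or workloads.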

 

First of all, my algebra may be rusty, so it may be possible to simplify the
model ... second, does the model make sense?
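On the simplification question, one possible reduction uses the identity a^(log_b c) = c^(log_b a). This is a sketch worked from the formula as written, not something stated in the thread:

```latex
\begin{align*}
w &= m \cdot 0.85^{\log_2 m} \cdot k \\
  &= m \cdot m^{\log_2 0.85} \cdot k \\
  &= k \, m^{1 + \log_2 0.85} \\
  &\approx k \, m^{0.766}
\end{align*}
```

In this form the model is a plain power law in m. Doubling m multiplies w by 2 * 0.85 = 1.7, so the formula as written corresponds to a 70% gain per doubling. If the intended behavior is an 85% gain per doubling, the base would presumably need to be 0.925 (since 2 * 0.925 = 1.85), giving an exponent of about 0.888.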