


a model for accumulo write scaling performance
Aaron Cordova 20120224, 19:35
In my experience with Accumulo on EC2, I've seen about an 85% increase in aggregate write rate each time the size of the cluster is doubled. I've tried to capture that behavior in a model to help myself understand it.
The model I came up with is the following:
where
w: aggregate write rate (writes per second)
m: number of machines
k: standalone single server performance (in my experience about 30k writes per second on average)
the units of k and w are writes per second
for those of you without the ability to see graphics in email, the model is:
w = m * pow(0.85, log(m, 2)) * k
First of all, my algebra may be rusty, so it may be possible to simplify the model ... second, does the model make sense?
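On the simplification question, a minimal Python sketch may help. It assumes the formula exactly as written in the message, with k defaulting to the quoted 30k writes per second; the function names are illustrative and not from the thread. Because pow(0.85, log(m, 2)) equals pow(m, log(0.85, 2)), the model collapses to a single power law, w = k * pow(m, 1 + log(0.85, 2)), which is roughly k * pow(m, 0.766). As written, the formula multiplies the rate by 2 * 0.85 = 1.7 for each doubling of m, so each doubling delivers 85% of ideal linear scaling.

```python
import math

K_DEFAULT = 30_000  # single-server writes per second, the figure quoted in the message


def write_rate(m, k=K_DEFAULT):
    """Model as stated: w = m * pow(0.85, log(m, 2)) * k."""
    return m * math.pow(0.85, math.log(m, 2)) * k


def write_rate_simplified(m, k=K_DEFAULT):
    """Equivalent power-law form: w = k * m ** (1 + log2(0.85))."""
    exponent = 1 + math.log2(0.85)  # about 0.766
    return k * m ** exponent


if __name__ == "__main__":
    # Print both forms side by side for a few cluster sizes.
    for m in (1, 2, 4, 8, 16, 32):
        print(m, round(write_rate(m)), round(write_rate_simplified(m)))
```

Both functions should agree to within floating-point error for any m >= 1. Keeping the power-law form makes it easy to fit the exponent to measured rates instead of fixing it at 0.85 per doubling.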

Re: a model for accumulo write scaling performance
Clint Green 20120224, 20:03
What are the instances you are using for this?
Are you seeing bottlenecks in the network on this scaleout?
How many nodes have you used to demonstrate this behavior?

Re: a model for accumulo write scaling performance
Keith Turner 20120224, 20:27
What are the characteristics of the data you are writing? Does each client generate data that spreads across the cluster?
What version of Accumulo are you using? 1.5 has two walog improvements that should help as a cluster grows. It has group commit and writes to logs in parallel. In 1.4 when a batch of data comes in from a client, the walog is locked and then that data is written to the two logs serially.

RE: a model for accumulo write scaling performance
Dave Marion 20120225, 00:41
That may be a good metric for your workload on EC2 virtualized hardware at different scales; could be useful for regression testing different versions of Hadoop + Accumulo. Certainly workload and hardware differences could end up with a different model.

