I was wondering if anyone had a benchmark test for Accumulo? I could write a MapReduce job that creates a bunch of tables, pushes some data, and then drops them, but does anyone have something better?
It depends on what kind of performance you are trying to test. We have a suite called the "continuous ingest" test, which basically pushes data into a table until you tell it to stop. We have used this in the past to compare write throughput between versions.
The numbers will vary wildly based on the size/hardware of your cluster, so we generally do not publish the numbers because they are meaningless without broader context.
I'm not sure that we have a great "read" benchmark, but spinning up a MapReduce job is certainly an easy way to get started.
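To make the write-throughput idea above concrete, here is a minimal sketch of a timed ingest loop. It is not the continuous ingest suite and does not use the Accumulo client API: `measure_ingest`, its parameters, and the `write_batch` callback are hypothetical names chosen for illustration, and the in-memory list in the usage lines stands in for a real table writer.

```python
import random
import time


def measure_ingest(write_batch, batch_size=1000, duration_s=1.0,
                   clock=time.perf_counter):
    """Push random entries through write_batch until duration_s has elapsed.

    write_batch is a hypothetical callback taking a list of
    (row, column_family, column_qualifier, value) tuples; in a real test it
    would wrap whatever client writer you use.
    Returns (entries_written, elapsed_seconds, entries_per_second).
    """
    written = 0
    start = clock()
    while True:
        # Random row IDs spread the writes across the key space.
        batch = [
            (f"row_{random.getrandbits(63):016x}", "cf", "cq", b"value")
            for _ in range(batch_size)
        ]
        write_batch(batch)
        written += batch_size
        elapsed = clock() - start
        if elapsed >= duration_s:
            break
    return written, elapsed, written / elapsed


if __name__ == "__main__":
    # Stand-in sink: an in-memory list instead of a real table.
    sink = []
    entries, seconds, rate = measure_ingest(sink.extend, duration_s=0.2)
    print(f"{entries} entries in {seconds:.3f}s ({rate:.0f} entries/s)")
```

Note that the measured time includes generating the batches, so the rate reflects the whole client loop rather than the server alone. As the reply says, any number from a loop like this only means something alongside the cluster size and hardware it ran on.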
The benchmark in the D4M paper is very helpful but perhaps you could clarify a few things:
1. Does the 4 million entries per second pertain to the main table only, or to the main table, transpose, and degree tables as well?
2. Can you share your accumulo-site.xml settings for the test? In particular, the memory map size and compaction ratio settings.
On Thu, Mar 6, 2014 at 3:07 PM, Jeremy Kepner <[EMAIL PROTECTED]> wrote:
What is the goal of your benchmarking? To some extent, benchmarking Accumulo can't provide any true answers because it won't be using your real-world data. A lot depends on the schema that you use. The D4M benchmark would only be applicable to you if you plan to use their schema.
On Sun, Mar 9, 2014 at 2:23 PM, Kepner, Jeremy - 0553 - MITLL <[EMAIL PROTECTED]> wrote:
Definitely a good idea, Jeremy. Performance numbers always benefit the community -- I'd love to make sure they get published prominently on the Accumulo site.
While the value of a benchmark is really only in the workload it performs, a good benchmark can be decomposed into a base set of operations which should be generally applicable. I don't agree that benchmarking Accumulo with D4M is only valid if you then use D4M.
As long as you state your performance requirements in a way that's comparable to your benchmark, that's all that really matters.
On 3/9/14, 4:35 PM, Jeremy Kepner wrote: