Accumulo >> mail # dev >> Accumulo Continuous Testing - Verification MR Job Performance
Re: Accumulo Continuous Testing - Verification MR Job Performance
The verification job is very sensitive to the number of rounds it
takes to shuffle/sort the results.  How many reducers have you used,
and how much memory have you given them?  More is better.
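
To illustrate why the number of shuffle/sort rounds matters, here is a rough sketch in Python. It is a simplified model of a k-way merge, not Hadoop's actual merge logic, and the names (merge_passes, num_segments, merge_factor) are made up for illustration. The idea is that fewer sorted segments per reducer (more reducers, or more memory so the segments are larger) means fewer merge rounds.

```python
import math


def merge_passes(num_segments: int, merge_factor: int) -> int:
    """Estimate the rounds a k-way merge needs to reduce
    num_segments sorted runs down to a single run.

    Simplified model: every round merges up to merge_factor runs
    into one. Hadoop's real merge scheduling is more nuanced.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    passes = 0
    remaining = num_segments
    while remaining > 1:
        remaining = math.ceil(remaining / merge_factor)
        passes += 1
    return passes


# Hypothetical numbers: the same reducer with many small segments
# versus a few large ones.
print(merge_passes(1000, 10))  # 3
print(merge_passes(10, 10))    # 1
```

Each extra round rereads and rewrites the intermediate data, which is where a small-memory configuration can lose a lot of time.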

I think we've clocked the verification job for 24 hours of ingest in
under 2 hours.  This is from memory, so I could be wrong.  But with a
bad configuration (only a few small reducers), it can take a very
long time.

Go with as many as 100 reducers per node and let the reducers have a
lot of memory.  You want each reducer to run long enough that the
process creation overhead is small by comparison, so each one should
run for a few minutes.
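
As a back-of-the-envelope way to pick a reducer count that gives each reducer a few minutes of work, something like the following could be used. This is only a sketch: the function name and the throughput and data-size figures are placeholders I made up, so measure your own cluster before relying on the result.

```python
import math


def reducers_for_target_runtime(total_bytes: int,
                                bytes_per_second_per_reducer: int,
                                target_seconds: int) -> int:
    """Return a reducer count such that each reducer handles roughly
    target_seconds worth of input at the assumed throughput."""
    per_reducer_bytes = bytes_per_second_per_reducer * target_seconds
    return max(1, math.ceil(total_bytes / per_reducer_bytes))


# Placeholder figures: 1 TB of data to verify, an assumed 20 MB/s per
# reducer, and a target of about 3 minutes per reducer.
print(reducers_for_target_runtime(10**12, 20 * 10**6, 180))  # 278
```

Divide the result by the number of nodes to see how many reducers per node that implies.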

Please post back with any improvements!

We are about to enter a testing cycle, so I'll update the example
configuration files with some better instructions.

I'm curious, how many key/value entries did you ingest in 24 hours?

-Eric

On Tue, Oct 22, 2013 at 4:56 PM, Billie Rinaldi
<[EMAIL PROTECTED]> wrote:
> I believe it does take a long time to verify.  Shorter than, but a similar
> order of magnitude as, the amount of time it took to write the data.
> Others may be able to give you more quantitative information.
>
>
> On Tue, Oct 22, 2013 at 12:56 PM, Ryan Fishel <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>>
>> I am currently running through the test suites included with the Accumulo
>> package ($ACCUMULO_HOME/test/system) and am running into some rather long
>> verification times with the Continuous Test.
>>
>> I am running the continuous test for a 24 hour period on a 7 node cluster
>> with walkers, batch walkers, and the stats service turned on.  All jobs
>> appear to run fine during the whole period. Since the test docs don't give
>> any indication, I was wondering if someone could provide typical run times
>> for the verification job? I'd like to appropriately set my expectations
>> before I start looking for a misconfiguration in the underlying cluster.
>>
>> Thank you!
>> Ryan Fishel
>>