Accumulo, mail # dev - Accumulo Continuous Testing - Verification MR Job Performance


Re: Accumulo Continuous Testing - Verification MR Job Performance
Eric Newton 2013-10-23, 00:16
The verification job is very sensitive to the number of rounds it
takes to shuffle/sort the results.  How many reducers have you used,
and how much memory have you given them?  More is better.
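To illustrate why the number of sort rounds matters, here is a minimal sketch that assumes a simplified k-way merge model: each pass merges up to `factor` sorted segments into one. The function name and the model are assumptions for illustration, not Hadoop's actual merge scheduling; `factor` plays a role similar to Hadoop's `io.sort.factor` setting, and more reducer memory generally means fewer segments spilled to disk in the first place.

```python
import math


def merge_passes(segments: int, factor: int) -> int:
    """Estimate passes needed to merge `segments` sorted runs into one.

    Simplified model (an assumption, not Hadoop's exact behavior):
    every pass merges up to `factor` runs at a time.
    """
    if segments < 1 or factor < 2:
        raise ValueError("need segments >= 1 and factor >= 2")
    passes = 0
    while segments > 1:
        # Each pass shrinks the number of runs by a factor of `factor`.
        segments = math.ceil(segments / factor)
        passes += 1
    return passes


# Fewer, larger segments (more memory) need fewer passes over the data.
for runs in (10, 100, 101):
    print(runs, merge_passes(runs, factor=10))
```

Under this model every extra pass rereads and rewrites the reducer's input, which is why a configuration that spills many small segments can be much slower.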

I think we've clocked the verification job for 24 hours of ingest in
under 2 hours.  This is from memory, so I could be wrong.  But with a
bad configuration (one using only a few small reducers), it can take a very
long time.

Go with as many as 100 reducers per node and let the reducers have a
lot of memory.  You want each reducer to run long enough to make the
process creation overhead small, so they should run for a few
minutes each.
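The arithmetic behind this advice can be sketched as follows. The helper names and the 5-second startup cost are hypothetical placeholders, not measured values; the 7-node cluster size comes from the original question further down the thread.

```python
def total_reducers(nodes: int, reducers_per_node: int) -> int:
    """Total reducer count when every node runs the same number."""
    return nodes * reducers_per_node


def startup_overhead_fraction(startup_seconds: float, work_seconds: float) -> float:
    """Fraction of a reducer's wall time spent on process creation."""
    if startup_seconds < 0 or work_seconds <= 0:
        raise ValueError("startup must be >= 0 and work must be > 0")
    return startup_seconds / (startup_seconds + work_seconds)


# Upper bound suggested above, applied to a 7-node cluster.
print(total_reducers(nodes=7, reducers_per_node=100))  # 700

# Hypothetical 5 s startup cost: a 10 s reducer wastes far more of its
# time on startup than one that runs for three minutes.
print(startup_overhead_fraction(5, 10))
print(startup_overhead_fraction(5, 180))
```

If reducers finish in only a few seconds, lowering the reducer count until each one runs for a few minutes keeps the startup share small.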

Please post back with any improvements!

We are about to enter a testing cycle, so I'll update the example
configuration files with some better instructions.

I'm curious, how many key/value entries did you ingest in 24 hours?

-Eric

On Tue, Oct 22, 2013 at 4:56 PM, Billie Rinaldi
<[EMAIL PROTECTED]> wrote:
> I believe it does take a long time to verify.  Shorter than, but a similar
> order of magnitude as, the amount of time it took to write the data.
> Others may be able to give you more quantitative information.
>
>
> On Tue, Oct 22, 2013 at 12:56 PM, Ryan Fishel <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>>
>> I am currently running through the test suites included with the Accumulo
>> package ($ACCUMULO_HOME/test/system) and am running into some rather long
>> verification times with the Continuous Test.
>>
>> I am running the continuous test for a 24 hour period on a 7 node cluster
>> with walkers, batch walkers, and the stats service turned on.  All jobs
>> appear to run fine during the whole period. Since the test docs don't give
>> any indication, I was wondering if someone could provide typical run times
>> for the verification job? I'd like to appropriately set my expectations
>> before I start looking for a misconfiguration in the underlying cluster.
>>
>> Thank you!
>> Ryan Fishel
>>