Re: Improving ingest performance [SEC=UNCLASSIFIED]
Jeremy Kepner 2013-07-24, 14:35
(5,000,000,000 records) x (~10 entries/record) /
((12 nodes) x (70 minutes) x (60 seconds/minute))
= ~1,000,000 entries/sec/node (~100,000 records/sec/node)
This is consistent with other published results.
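
As a quick check of the estimate above, here is a minimal Python sketch of the same arithmetic. The function name per_node_rate is hypothetical, and the figure of 10 entries per record is the rough assumption used in the estimate, not a measured value. It reports the average per-node rate both in records and in entries.

```python
def per_node_rate(items, nodes, minutes):
    """Average items per second per node over the whole load."""
    return items / (nodes * minutes * 60)


records = 5_000_000_000   # records loaded, from the original question
entries_per_record = 10   # rough assumption taken from the estimate
nodes = 12                # cluster size
minutes = 70              # wall-clock time of the load

records_rate = per_node_rate(records, nodes, minutes)
entries_rate = per_node_rate(records * entries_per_record, nodes, minutes)

print(f"{records_rate:,.0f} records/sec/node")   # 99,206 records/sec/node
print(f"{entries_rate:,.0f} entries/sec/node")   # 992,063 entries/sec/node
```

With these inputs the load averages roughly 99,000 records/sec/node, or roughly 992,000 entries/sec/node if each record really yields about 10 entries.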
On Wed, Jul 24, 2013 at 02:26:18AM -0400, Dickson, Matt MR wrote:
> I'm trying to improve ingest performance on a 12 node test cluster.
> Currently I'm loading 5 billion records in approximately 70 minutes which
> seems excessive. Monitoring the job, there are 2600 map tasks (there is no
> reduce stage, just the mapper) with 288 running at any one time. The
> performance seems slowest in the early stages of the job, prior to minor
> or major compactions occurring. Each server has 48 GB memory and currently
> the Accumulo settings are based on the 3GB settings in the example config
> directory, i.e. tserver.memory.maps.max = 1GB, tserver.cache.data.size=50M
> and tserver.cache.index.size=512M. All other settings on the table are
> default.
> 1. What is Accumulo doing in the initial stage of a load and which
> configurations should I focus on to improve this?
> 2. At what ingest rate should I consider using the bulk ingest process
> with rfiles?