Accumulo >> mail # user >> Improving ingest performance [SEC=UNCLASSIFIED]

Re: Improving ingest performance [SEC=UNCLASSIFIED]
(5,000,000,000 records) x (~10 entries/record) /
((12 nodes) x (70 minutes) x (60 seconds/minute))

= ~1,000,000 entries/sec/node

This is consistent with other published results.
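Working that arithmetic through (a quick check using only the figures quoted above):

```python
# Back-of-envelope ingest rate from the numbers in the thread.
records = 5_000_000_000      # records loaded
entries_per_record = 10      # ~10 entries per record
nodes = 12                   # test cluster size
minutes = 70                 # wall-clock load time

rate = (records * entries_per_record) / (nodes * minutes * 60)
print(round(rate))  # ~992,000, i.e. on the order of 1,000,000 entries/sec/node
```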

On Wed, Jul 24, 2013 at 02:26:18AM -0400, Dickson, Matt MR wrote:
>    Hi,
>    I'm trying to improve ingest performance on a 12 node test cluster.
>    Currently I'm loading 5 billion records in approximately 70 minutes, which
>    seems excessive.  Monitoring the job, there are 2,600 map tasks (there is
>    no reduce stage, just the mapper), with 288 running at any one time.  The
>    performance seems slowest in the early stages of the job, prior to minor
>    or major compactions occurring.  Each server has 48 GB of memory, and
>    currently the Accumulo settings are based on the 3GB settings in the
>    example config directory, i.e. tserver.memory.maps.max=1G,
>    tserver.cache.data.size=50M, and tserver.cache.index.size=512M.  All
>    other settings on the table are default.
>    Questions:
>    1. What is Accumulo doing in the initial stage of a load and which
>    configurations should I focus on to improve this?
>    2. At what ingest rate should I consider using the bulk ingest process
>    with rfiles?
>    Thanks
>    Matt
>    IMPORTANT: This email remains the property of the Department of Defence
>    and is subject to the jurisdiction of section 70 of the Crimes Act 1914.
>    If you have received this email in error, you are requested to contact the
>    sender and delete the email.
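For reference, the settings named in the question look like the tablet-server properties from Accumulo's 3GB example configuration. Assuming that mapping (the question's `tserver.cache.index.site` appears twice and is presumably a typo for the `data`/`index` cache-size pair), the relevant accumulo-site.xml fragment would look roughly like:

```
<!-- Hypothetical fragment based on Accumulo's 3GB example config;
     values are those stated in the question. -->
<property>
  <name>tserver.memory.maps.max</name>
  <value>1G</value>
</property>
<property>
  <name>tserver.cache.data.size</name>
  <value>50M</value>
</property>
<property>
  <name>tserver.cache.index.size</name>
  <value>512M</value>
</property>
```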