Accumulo >> mail # user >> Improving ingest performance [SEC=UNCLASSIFIED]


Dickson, Matt MR 2013-07-24, 06:26
Eric Newton 2013-07-24, 12:35
Christopher 2013-07-25, 02:16
Re: Improving ingest performance [SEC=UNCLASSIFIED]
(5,000,000,000 records) x (~10 entries/record) /
((12 nodes) x (70 minutes) x (60 seconds/minute))

= ~100,000 entries/sec/node

This is consistent with other published results.
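
As a quick sanity check on the estimate above, the same arithmetic can be written out in a few lines of Python. This is only a sketch: the function name and parameter names are illustrative and are not part of Accumulo. Note that with the stated inputs (~10 entries/record) the expression evaluates to roughly 990,000 entries/sec/node, while a result near 100,000 corresponds to about 1 entry/record, so one of the two figures in the estimate appears to be off by a factor of ten.

```python
def entries_per_sec_per_node(records, entries_per_record, nodes, minutes):
    """Average ingest rate per node, in entries per second.

    Hypothetical helper for illustration; all inputs are the
    back-of-the-envelope figures quoted in this thread.
    """
    total_entries = records * entries_per_record
    node_seconds = nodes * minutes * 60  # 60 seconds per minute
    return total_entries / node_seconds


# Figures from the thread: 5 billion records, 12 nodes, 70 minutes.
rate_10 = entries_per_sec_per_node(5_000_000_000, 10, 12, 70)
rate_1 = entries_per_sec_per_node(5_000_000_000, 1, 12, 70)

print(f"~10 entries/record: {rate_10:,.0f} entries/sec/node")
print(f"  1 entry/record:   {rate_1:,.0f} entries/sec/node")
```

Which reading is correct depends on how many key/value entries each record actually produces, which only the ingest job's author can confirm.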

On Wed, Jul 24, 2013 at 02:26:18AM -0400, Dickson, Matt MR wrote:
>    UNCLASSIFIED
>
>    Hi,
>
>    I'm trying to improve ingest performance on a 12 node test cluster.
>    Currently I'm loading 5 billion records in approximately 70 minutes which
>    seems excessive.  Monitoring the job there are 2600 map jobs (there is no
>    reduce stage, just the mapper) with 288 running at any one time.  The
>    performance seems slowest in the early stages of the job, prior to minor
>    or major compactions occurring.  Each server has 48 GB memory and currently
>    the accumulo settings are based on the 3GB settings in the example config
>    directory, i.e. tserver.memory.maps.max = 1GB, tserver.cache.data.size=50M
>    and tserver.cache.index.size=512M.  All other settings on the table are
>    default.
>
>    Questions.
>
>    1. What is Accumulo doing in the initial stage of a load and which
>    configurations should I focus on to improve this?
>    2. At what ingest rate should I consider using the bulk ingest process
>    with rfiles?
>
>    Thanks
>    Matt
>
>    IMPORTANT: This email remains the property of the Department of Defence
>    and is subject to the jurisdiction of section 70 of the Crimes Act 1914.
>    If you have received this email in error, you are requested to contact the
>    sender and delete the email.
William Slacum 2013-07-24, 15:02