Accumulo >> mail # user >> Increasing Ingest Rate

Re: Increasing Ingest Rate
Hopefully you are using Accumulo 1.4.*3*.

A performance issue (ACCUMULO-1062) was found in 1.4.2 when a large number
of clients attempted to update a tablet concurrently.

On Thu, Apr 4, 2013 at 3:26 PM, Jimmy Lin <[EMAIL PROTECTED]> wrote:

> On Thu, Apr 4, 2013 at 2:25 PM, Eric Newton <[EMAIL PROTECTED]> wrote:
>> Have you pre-split your tablet to spread the load out to all the
>> machines?
> Yes.  We are using splits from loading the whole dataset previously.
>> Does the data distribution match your splits?
> Yes.  See above.
>> Is the ingest data already sorted (that is, it always writes to the last
>> tablet)?
> No.  The data writes to multiple tablets concurrently.  We set up a queue
> parameter and divide the data into multiple queues.
>> How much memory and how many threads are you using in your batchwriters?
> I believe we have 16GB of memory for the Java writer with 18 threads
> running per server.
>> Check the ingest rates on tablet server monitor page and look for hot
>> spots.
> There are certain servers that have higher ingest rates, and the server
> that is busiest changes over time, but the overall ingestion rate will not
> go up.
>> On Thu, Apr 4, 2013 at 2:01 PM, Jimmy Lin <[EMAIL PROTECTED]> wrote:
>>> Hello,
>>> I am fairly new to Accumulo and am trying to figure out what is
>>> preventing my system from ingesting data at a faster rate. We have 15 nodes
>>> running a simple Java program that reads and writes to Accumulo and then
>>> indexes some data into Solr. The rate of ingest is not scaling linearly
>>> with the number of nodes that we start up. I have tried increasing several
>>> parameters including:
>>> - limit of file descriptors in Linux
>>> - max ZooKeeper connections
>>> - tserver.memory.maps.max
>>> - tserver_opts memory size
>>> - tserver.mutation_queue.max
>>> - tserver.scan.files.open.max
>>> - tserver.walog.max.size
>>> - tserver.cache.data.size
>>> - tserver.cache.index.size
>>> - HDFS setting for xceivers
>>> No matter what changes we make, we cannot get the ingest rate to go over
>>> 100k entries/s and about 6 Mb/s. I know Accumulo should be able to ingest
>>> faster than this.
>>> Thanks in advance,
>>> Jimmy Lin
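
One way to double-check the question quoted above ("Does the data distribution match your splits?") is to bucket a sample of row IDs by split point offline and look at the counts. The following is only a rough sketch under stated assumptions: the class name SplitLoadCheck, its methods, and the sample splits and rows are all hypothetical, it does not call any Accumulo API, and it compares rows as Java strings, which matches Accumulo's byte-wise ordering only for plain ASCII rows. It assumes each split point is the inclusive end row of its tablet.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of Accumulo: counts sampled rows per tablet.
public class SplitLoadCheck {

    // Index of the tablet a row falls into. Tablet i covers rows in
    // (split[i-1], split[i]]; the last tablet covers everything after
    // the final split point.
    public static int tabletIndex(TreeSet<String> splits, String row) {
        // Number of split points strictly less than the row.
        return splits.headSet(row, false).size();
    }

    // Counts how many of the sampled rows land in each tablet.
    public static long[] countPerTablet(TreeSet<String> splits, List<String> rows) {
        long[] counts = new long[splits.size() + 1];
        for (String row : rows) {
            counts[tabletIndex(splits, row)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up split points and sample rows, for illustration only.
        TreeSet<String> splits = new TreeSet<>(Arrays.asList("g", "n", "t"));
        List<String> sampleRows =
                Arrays.asList("apple", "banana", "grape", "kiwi", "n", "zebra");
        // Prints [2, 3, 0, 1]: the third tablet receives nothing in this sample.
        System.out.println(Arrays.toString(countPerTablet(splits, sampleRows)));
    }
}
```

A strongly uneven result on a representative sample of the incoming data would suggest that the splits carried over from the earlier load no longer match what is being written now.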