Re: Increasing Ingest Rate
Hopefully you are using Accumulo 1.4.*3*.

A performance issue (ACCUMULO-1062) was found in 1.4.2 when a large number
of clients attempted to update a tablet concurrently.
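As a quick sanity check, here is a minimal sketch (assuming the 1.4-era client jars) of printing the version the client code was built against; `bin/accumulo version`, or the monitor page, reports what the servers themselves are running.

import org.apache.accumulo.core.Constants;

public class ShowVersion {
  public static void main(String[] args) {
    // Version of the Accumulo client jars on the classpath, e.g. "1.4.3".
    System.out.println(Constants.VERSION);
  }
}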

-Eric
On Thu, Apr 4, 2013 at 3:26 PM, Jimmy Lin <[EMAIL PROTECTED]> wrote:

>
>
> On Thu, Apr 4, 2013 at 2:25 PM, Eric Newton <[EMAIL PROTECTED]> wrote:
>
>> Have you pre-split your table to spread the load out to all the
>> machines?
> Yes.  We are using splits from loading the whole dataset previously.
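For readers following along, a minimal sketch of pre-splitting a table with the 1.4-era Java client API; the instance name, ZooKeeper quorum, credentials, table name, and split points below are illustrative placeholders, not values from this thread.

import java.util.SortedSet;
import java.util.TreeSet;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.hadoop.io.Text;

public class PreSplit {
  public static void main(String[] args) throws Exception {
    // Placeholder instance name, ZooKeeper quorum, and credentials.
    Connector conn = new ZooKeeperInstance("instance", "zk1:2181")
        .getConnector("root", "secret".getBytes());

    // One tablet per leading hex digit, for example; in practice the split
    // points should match the key distribution of the data being loaded.
    SortedSet<Text> splits = new TreeSet<Text>();
    for (char c : "0123456789abcdef".toCharArray()) {
      splits.add(new Text(String.valueOf(c)));
    }
    conn.tableOperations().addSplits("mytable", splits);
  }
}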
>> Does the data distribution match your splits?
> Yes.  See above.
>> Is the ingest data already sorted (that is, it always writes to the last
>> tablet)?
> No.  The data writes to multiple tablets concurrently.  We set up a queue
> parameter and divide the data into multiple queues.
>> How much memory and how many threads are you using in your batchwriters?
> I believe we have 16GB of memory for the Java writer with 18 threads
> running per server.
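A minimal sketch of where those knobs live in the 1.4-era client API; the buffer size, latency, and thread count here are illustrative placeholders to tune, not the thread's actual settings. Note that the batch writer's maxMemory is a client-side mutation buffer, separate from the tablet server's in-memory map.

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class WriterSetup {
  static void ingest(Connector conn) throws Exception {
    // createBatchWriter(table, maxMemory, maxLatencyMs, maxWriteThreads)
    BatchWriter writer = conn.createBatchWriter(
        "mytable",           // placeholder table name
        256L * 1024 * 1024,  // client-side mutation buffer, in bytes
        10000L,              // flush buffered mutations at least every 10 s
        18);                 // concurrent send threads to tablet servers

    Mutation m = new Mutation(new Text("row1"));
    m.put(new Text("cf"), new Text("cq"), new Value("value".getBytes()));
    writer.addMutation(m);
    writer.close(); // flushes anything still buffered
  }
}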
>>
>> Check the ingest rates on the tablet server monitor page and look for hot
>> spots.
> There are certain servers that have higher ingest rates, and the server
> that is busiest changes over time, but the overall ingestion rate will not
> go up.
>>
>> On Thu, Apr 4, 2013 at 2:01 PM, Jimmy Lin <[EMAIL PROTECTED]> wrote:
>>
>>> Hello,
>>> I am fairly new to Accumulo and am trying to figure out what is
>>> preventing my system from ingesting data at a faster rate. We have 15 nodes
>>> running a simple Java program that reads and writes to Accumulo and then
>>> indexes some data into Solr. The rate of ingest is not scaling linearly
>>> with the number of nodes that we start up. I have tried increasing several
>>> parameters including:
>>> - limit of file descriptors in Linux
>>> - max zookeeper connections
>>> - tserver.memory.maps.max
>>> - tserver_opts memory size
>>> - tserver.mutation_queue.max
>>> - tserver.scan.files.open.max
>>> - tserver.walog.max.size
>>> - tserver.cache.data.size
>>> - tserver.cache.index.size
>>> - hdfs setting for xceivers
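As an aside, a hedged sketch of how system-wide tserver properties like these can be changed through the Java API; the Accumulo shell's "config -s <name>=<value>" command does the same thing. The property values below are placeholders, and some settings (the in-memory map size, for one) only take effect after the tablet servers restart.

import org.apache.accumulo.core.client.Connector;

public class TservTuning {
  static void tune(Connector conn) throws Exception {
    // Larger write-ahead logs mean fewer compactions forced by log rollover.
    conn.instanceOperations().setProperty("tserver.walog.max.size", "1G");
    // Per-session buffer for mutations arriving at each tablet server.
    conn.instanceOperations().setProperty("tserver.mutation.queue.max", "4M");
  }
}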
>>> No matter what changes we make, we cannot get the ingest rate to go over
>>> 100k entries/s and about 6 Mb/s. I know Accumulo should be able to ingest
>>> faster than this.
>>>  Thanks in advance,
>>>
>>> Jimmy Lin
>>>