Accumulo, mail # user - Increasing Ingest Rate


Jimmy Lin 2013-04-04, 18:01
Eric Newton 2013-04-04, 18:25
Jimmy Lin 2013-04-04, 19:26

Re: Increasing Ingest Rate
Aaron Cordova 2013-04-04, 21:22
How many clients are you using to write?

The BatchWriter parameters might have an effect too; typically people use values like the following:

BatchWriter writer = connector.createBatchWriter(tableName, 1000000, 1000, 10);

Those numbers are:

1000000 : max bytes to buffer on the client before flushing a batch
1000 : max latency in milliseconds before buffered mutations are sent
10 : number of threads used to write to tablet servers
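
For reference, a minimal, self-contained sketch of that call, assuming the pre-1.5 client API in use at the time of this thread; the instance name, ZooKeeper host, credentials, and table name are placeholders:

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class BatchWriterSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder instance name, ZooKeeper host, and credentials.
    Connector connector = new ZooKeeperInstance("myInstance", "zkhost:2181")
        .getConnector("user", "secret".getBytes());

    // 1000000 = max bytes buffered client-side, 1000 = max latency in ms,
    // 10 = threads writing to tablet servers.
    BatchWriter writer = connector.createBatchWriter("myTable", 1000000L, 1000L, 10);

    // One example mutation; real ingest would loop over the input data.
    Mutation m = new Mutation(new Text("row1"));
    m.put(new Text("cf"), new Text("cq"), new Value("value".getBytes()));
    writer.addMutation(m);

    writer.close(); // flushes anything still buffered
  }
}

In Accumulo 1.5 and later these three settings moved to a BatchWriterConfig object, but this thread predates that API.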

What's the max ingest rate of a single server?
On Apr 4, 2013, at 3:26 PM, Jimmy Lin <[EMAIL PROTECTED]> wrote:

>
>
> On Thu, Apr 4, 2013 at 2:25 PM, Eric Newton <[EMAIL PROTECTED]> wrote:
> Have you pre-split your table to spread the load out to all the machines?
> Yes.  We are using splits obtained from a previous load of the whole dataset.
> Does the data distribution match your splits?
> Yes.  See above.
> Is the ingest data already sorted (that is, it always writes to the last tablet)?
> No.  The data writes to multiple tablets concurrently.  We set up a queue parameter and divide the data into multiple queues.
> How much memory and how many threads are you using in your batchwriters?
> I believe we have 16GB of memory for the Java writer with 18 threads running per server.
>
> Check the ingest rates on tablet server monitor page and look for hot spots.
> Certain servers have higher ingest rates, and which server is busiest changes over time, but the overall ingest rate does not go up.
>  
>  
>
>
> On Thu, Apr 4, 2013 at 2:01 PM, Jimmy Lin <[EMAIL PROTECTED]> wrote:
> Hello,
> I am fairly new to Accumulo and am trying to figure out what is preventing my system from ingesting data at a faster rate. We have 15 nodes running a simple Java program that reads and writes to Accumulo and then indexes some data into Solr. The rate of ingest is not scaling linearly with the number of nodes that we start up. I have tried increasing several parameters including:
> - Linux open file descriptor limit (ulimit -n)
> - max ZooKeeper connections
> - tserver.memory.maps.max
> - tserver JVM heap size (ACCUMULO_TSERVER_OPTS)
> - tserver.mutation.queue.max
> - tserver.scan.files.open.max
> - tserver.walog.max.size
> - tserver.cache.data.size
> - tserver.cache.index.size
> - HDFS dfs.datanode.max.xcievers setting
> No matter what changes we make, we cannot get the ingest rate to go over 100k entries/s and about 6 Mb/s. I know Accumulo should be able to ingest faster than this.
> Thanks in advance,
>  
> Jimmy Lin
>  
>
>
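
Regarding Eric's first question in the quoted exchange above, a minimal sketch of pre-splitting a table through the TableOperations API. The table name and split points are placeholders; the points below assume row keys with a leading hex digit and must be chosen to match your actual key distribution:

import java.util.SortedSet;
import java.util.TreeSet;

import org.apache.accumulo.core.client.Connector;
import org.apache.hadoop.io.Text;

public class PreSplitSketch {
  // Pre-split "myTable" so ingest spreads across tablet servers from the start.
  static void preSplit(Connector connector) throws Exception {
    SortedSet<Text> splits = new TreeSet<Text>();
    for (char c : "123456789abcdef".toCharArray()) {
      splits.add(new Text(String.valueOf(c)));
    }
    connector.tableOperations().addSplits("myTable", splits);
  }
}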

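Several of the tserver.* properties in Jimmy's list can also be changed at runtime through the instance operations API; a sketch, with illustrative values only, not recommendations:

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.admin.InstanceOperations;

public class TuningSketch {
  // Set a few of the tserver.* properties mentioned in the thread. The same
  // properties can also be set in accumulo-site.xml or from the Accumulo
  // shell's 'config' command.
  static void tune(Connector connector) throws Exception {
    InstanceOperations ops = connector.instanceOperations();
    ops.setProperty("tserver.memory.maps.max", "2G");
    ops.setProperty("tserver.mutation.queue.max", "4M");
    ops.setProperty("tserver.walog.max.size", "1G");
  }
}
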
Eric Newton 2013-04-08, 13:31