Accumulo >> mail # user >> BatchWriter performance on 1.4


Re: BatchWriter performance on 1.4
If you don't want it to wait a long time before writing, then set the
maxLatency lower. That is the entire reason for that setting.
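
To make the trade-off concrete, here is a toy model of a write buffer that sends when the oldest buffered item has waited a maximum latency, or when flush() is called explicitly. It is only a sketch under stated assumptions: LatencyBoundedBuffer, simulate() and the simulated clock are hypothetical names invented for illustration, and none of this is Accumulo's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a buffer bounded by size and by maximum latency. Not Accumulo code. */
class LatencyBoundedBuffer {
    private final int maxBuffered;        // hypothetical stand-in for the memory limit
    private final long maxLatencyMillis;  // hypothetical stand-in for maxLatency
    private final List<String> pending = new ArrayList<>();
    private long oldestPendingAt = -1;    // simulated time the oldest unsent item arrived
    private int batchesSent = 0;

    LatencyBoundedBuffer(int maxBuffered, long maxLatencyMillis) {
        this.maxBuffered = maxBuffered;
        this.maxLatencyMillis = maxLatencyMillis;
    }

    /** Adds one item at simulated time nowMillis and sends if a limit is reached. */
    void add(String item, long nowMillis) {
        tick(nowMillis); // send anything that is already overdue
        if (pending.isEmpty()) {
            oldestPendingAt = nowMillis;
        }
        pending.add(item);
        if (pending.size() >= maxBuffered) {
            send();
        }
    }

    /** Sends the pending batch once its oldest item has waited maxLatencyMillis. */
    void tick(long nowMillis) {
        if (!pending.isEmpty() && nowMillis - oldestPendingAt >= maxLatencyMillis) {
            send();
        }
    }

    /** Explicit flush: sends whatever is pending, however small the batch. */
    void flush() {
        if (!pending.isEmpty()) {
            send();
        }
    }

    private void send() {
        batchesSent++; // each send stands for one round trip to a server
        pending.clear();
        oldestPendingAt = -1;
    }

    int batchesSent() { return batchesSent; }

    int pendingCount() { return pending.size(); }

    /** Feeds one item per simulated millisecond, then advances the clock once more. */
    static LatencyBoundedBuffer simulate(int items, long maxLatencyMillis, boolean flushEveryAdd) {
        LatencyBoundedBuffer buf = new LatencyBoundedBuffer(Integer.MAX_VALUE, maxLatencyMillis);
        for (int t = 0; t < items; t++) {
            buf.add("m" + t, t);
            if (flushEveryAdd) {
                buf.flush();
            }
        }
        buf.tick(items);
        return buf;
    }

    public static void main(String[] args) {
        System.out.println("100 ms bound, no flush:   " + simulate(1000, 100, false).batchesSent() + " batches");
        System.out.println("100 ms bound, flush each: " + simulate(1000, 100, true).batchesSent() + " batches");
        System.out.println("120 s bound, no flush:    " + simulate(1000, 120000, false).pendingCount() + " still pending");
    }
}
```

In this model, 1000 items arriving 1 ms apart go out in 10 batches with a 100 ms bound, in 1000 batches when flush() follows every add, and not at all within the first second when the bound is 120 s. That is the shape of the advice above: lower the latency bound rather than flushing by hand.
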
On Fri, Sep 20, 2013 at 12:47 PM, Slater, David M. <[EMAIL PROTECTED]> wrote:

> I was using flush() after sending a bunch of mutations to the batch writers
> to limit their latency. I thought it would normally flush the buffer to
> ensure that the maxLatency is not violated. If the maxLatency is quite
> large, how do I ensure that it doesn’t wait a long time before writing?
>
> If the returned batch writers are all thread safe, then I’m still going to
> have the bottleneck of their synchronized addMutations method, correct?
>
> I’m looking for “org.apache.accumulo.client.impl” in log4j.properties,
> generic_logger.xml, and the other config files, but can’t locate it. Do I
> need to create a new entry for it there?
>
> Thanks,
> David
>
> From: Keith Turner [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, September 19, 2013 7:01 PM
> To: [EMAIL PROTECTED]
> Subject: Re: BatchWriter performance on 1.4
>
> On Thu, Sep 19, 2013 at 5:08 PM, Slater, David M. <[EMAIL PROTECTED]>
> wrote:
>
> Thanks Keith, I’m looking at it now. It appears to be what I want. As
> for the proper usage…
>
> Would I create one using the Connector,
> then call .getBatchWriter() for each of the tables I’m interested in,
> add data to each of the BatchWriters returned,
>
> yes.
>
> and then hit flush() when I want all of that to get written?
>
> Why are you calling flush()? Doing this frequently will increase RPC
> overhead and lower throughput.
>
> Would the individual batch writers spawned by the MultiTableBatchWriter
> still have synchronized addMutations() methods, so that I would still have
> to worry about blocking, or would that all happen at the flush() method?
>
> The returned batch writers are thread safe. They all add to the same
> queue/buffer in a synchronized manner. Calling flush() on any of the
> batch writers returned from getBatchWriter() will block the others.
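
The shared-buffer behaviour described above can be sketched with a toy model: per-table handles that all funnel into one object guarded by a single lock. This is modelled on the description in this message, not on Accumulo's source; SharedWriteBuffer, TableHandle and handleFor() are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (not Accumulo code): one locked buffer shared by all per-table handles. */
class SharedWriteBuffer {
    private final List<String[]> queue = new ArrayList<>(); // {table, mutation} pairs
    private final Map<String, Integer> sentPerTable = new HashMap<>();

    /** Every handle ends up here, so all adds contend for the same lock. */
    synchronized void add(String table, String mutation) {
        queue.add(new String[] {table, mutation});
    }

    /** Drains the queue under the same lock, so adds from any handle wait until it returns. */
    synchronized void flush() {
        for (String[] entry : queue) {
            sentPerTable.merge(entry[0], 1, Integer::sum);
        }
        queue.clear();
    }

    synchronized int pending() { return queue.size(); }

    synchronized int sent(String table) { return sentPerTable.getOrDefault(table, 0); }

    /** Returns a per-table handle, loosely analogous to getBatchWriter(table). */
    TableHandle handleFor(String table) { return new TableHandle(this, table); }
}

/** Thin per-table view; it holds no buffer of its own. */
class TableHandle {
    private final SharedWriteBuffer shared;
    private final String table;

    TableHandle(SharedWriteBuffer shared, String table) {
        this.shared = shared;
        this.table = table;
    }

    void addMutation(String mutation) { shared.add(table, mutation); }

    void flush() { shared.flush(); }
}

class SharedBufferDemo {
    public static void main(String[] args) {
        SharedWriteBuffer shared = new SharedWriteBuffer();
        TableHandle a = shared.handleFor("tableA");
        TableHandle b = shared.handleFor("tableB");
        a.addMutation("row1");
        b.addMutation("row2");
        b.addMutation("row3");
        System.out.println("pending before flush: " + shared.pending());
        a.flush(); // flushing through one handle drains the data of every table
        System.out.println("pending after flush:  " + shared.pending());
        System.out.println("sent for tableB:      " + shared.sent("tableB"));
    }
}
```

Because the handles keep no state of their own, a flush through one of them holds the only lock and drains every table's data. In this model, that is why writers on other tables stall while it runs.
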
>
> If you set the log4j log level to TRACE for
> org.apache.accumulo.client.impl, you can see output like the following.
> Binning is the process of taking each mutation and deciding which tablet
> and tablet server it goes to.
>
>   2013-09-19 18:43:37,261 [impl.ThriftTransportPool] TRACE: Using existing connection to 127.0.0.1:9997
>   2013-09-19 18:43:37,393 [impl.TabletLocatorImpl] TRACE: tid=12 oid=13  Binning 80909 mutations for table 3
>   2013-09-19 18:43:37,402 [impl.TabletLocatorImpl] TRACE: tid=12 oid=13  Binned 80909 mutations for table 3 to 1 tservers in 0.009 secs
>   2013-09-19 18:43:37,402 [impl.TabletServerBatchWriter] TRACE: Started sending 80,909 mutations to 1 tablet servers
>   2013-09-19 18:43:37,656 [impl.ThriftTransportPool] TRACE: Returned connection 127.0.0.1:9997 (120000) ioCount : 1459116
>   2013-09-19 18:43:37,657 [impl.TabletServerBatchWriter] TRACE: sent 80,909 mutations to 127.0.0.1:9997 in 0.40 secs (204,832.91 mutations/sec) with 0 failures
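
For anyone who wants to track these numbers over a run, the "sent ..." entry is regular enough to parse. The sketch below assumes the entry sits on a single line, as it would in the log file itself. The regular expression is inferred from the one sample above, so other Accumulo versions may format the line differently. SentStats and SentLineParser are hypothetical helper names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Values pulled out of one "sent ... mutations" TRACE entry. */
final class SentStats {
    final long mutations;
    final String server;
    final double loggedSecs;
    final double loggedRate;
    final int failures;

    SentStats(long mutations, String server, double loggedSecs, double loggedRate, int failures) {
        this.mutations = mutations;
        this.server = server;
        this.loggedSecs = loggedSecs;
        this.loggedRate = loggedRate;
        this.failures = failures;
    }

    /** Elapsed time implied by the logged rate, finer than the two-decimal secs field. */
    double impliedSecs() {
        return mutations / loggedRate;
    }
}

final class SentLineParser {
    // Pattern inferred from the single sample line in this thread (an assumption).
    private static final Pattern SENT = Pattern.compile(
        "sent ([\\d,]+) mutations to (\\S+) in ([\\d.]+) secs "
            + "\\(([\\d,.]+) mutations/sec\\) with (\\d+) failures");

    /** Parses one log line; throws if the line is not a "sent" entry. */
    static SentStats parse(String line) {
        Matcher m = SENT.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a 'sent' entry: " + line);
        }
        return new SentStats(
            Long.parseLong(m.group(1).replace(",", "")),
            m.group(2),
            Double.parseDouble(m.group(3)),
            Double.parseDouble(m.group(4).replace(",", "")),
            Integer.parseInt(m.group(5)));
    }

    public static void main(String[] args) {
        String line = "2013-09-19 18:43:37,657 [impl.TabletServerBatchWriter] TRACE: "
            + "sent 80,909 mutations to 127.0.0.1:9997 in 0.40 secs "
            + "(204,832.91 mutations/sec) with 0 failures";
        SentStats s = parse(line);
        System.out.printf("%d mutations to %s, implied %.3f s, %d failures%n",
            s.mutations, s.server, s.impliedSecs(), s.failures);
    }
}
```

Dividing 80,909 by the logged rate gives about 0.395 s rather than 0.40 s, which suggests the rate was computed from an unrounded duration. When comparing runs, the rate field is probably the more precise of the two.
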
>
> When you close the batch writer, it will log some summary stats like the
> following.
>
>   2013-09-19 18:43:39,149 [impl.TabletServerBatchWriter] TRACE:
>   2013-09-19 18:43:39,149 [impl.TabletServerBatchWriter] TRACE: TABLET SERVER BATCH WRITER STATISTICS
>   2013-09-19 18:43:39,149 [impl.TabletServerBatchWriter] TRACE: Added               :  1,000,000 mutations
>   2013-09-19 18:43:39,149 [impl.TabletServerBatchWriter] TRACE: Sent                :  1,000,000 mutations
>   2013-09-19 18:43:39,149 [impl.TabletServerBatchWriter] TRACE: Resent percentage   :       0.00%
>   2013-09-19 18:43:39,150 [impl.TabletServerBatchWriter] TRACE: Overall