Accumulo, mail # user - BatchWriter performance on 1.4


Re: BatchWriter performance on 1.4
John Vines 2013-09-20, 18:50
If you don't want it to wait a long time before writing, then set the
maxLatency lower. That is the entire reason for that setting.
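Here is a minimal sketch of why lowering maxLatency bounds the wait. It uses a simplified model, not Accumulo's client code. The hypothetical `LatencyBoundedBuffer` below drains its contents when one of two things happens: the buffered bytes reach a memory bound, or the oldest buffered mutation has waited at least `maxLatencyMillis`. Time is passed in explicitly to keep the example deterministic. The real batch writer tracks time on its own, and all class and method names here are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of a client-side write buffer (not Accumulo code).
 * It flushes when the buffered byte count reaches maxMemory or when the oldest
 * buffered mutation has waited at least maxLatencyMillis. Callers pass the current
 * time in explicitly so the behaviour is deterministic.
 */
class LatencyBoundedBuffer {
    private final long maxMemory;
    private final long maxLatencyMillis;
    private final List<String> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private long oldestAddMillis = -1; // -1 while the buffer is empty
    private int flushCount = 0;

    LatencyBoundedBuffer(long maxMemory, long maxLatencyMillis) {
        this.maxMemory = maxMemory;
        this.maxLatencyMillis = maxLatencyMillis;
    }

    /** Buffers one mutation (identified here only by its row) and flushes if a bound is hit. */
    synchronized void add(String row, long sizeBytes, long nowMillis) {
        if (pending.isEmpty()) {
            oldestAddMillis = nowMillis;
        }
        pending.add(row);
        pendingBytes += sizeBytes;
        maybeFlush(nowMillis);
    }

    /** Periodic check (think: a timer) so an idle buffer still honours maxLatency. */
    synchronized void tick(long nowMillis) {
        maybeFlush(nowMillis);
    }

    private void maybeFlush(long nowMillis) {
        boolean tooBig = pendingBytes >= maxMemory;
        boolean tooOld = !pending.isEmpty() && nowMillis - oldestAddMillis >= maxLatencyMillis;
        if (tooBig || tooOld) {
            // This is where a real client would send the batch to tablet servers.
            pending.clear();
            pendingBytes = 0;
            oldestAddMillis = -1;
            flushCount++;
        }
    }

    synchronized int flushCount() { return flushCount; }

    synchronized int pendingCount() { return pending.size(); }
}
```

Take a 1,000-byte bound and a 500 ms latency. A single small mutation added at t=0 is still buffered at t=499 and goes out on the periodic check at t=500, with no explicit flush() call. A smaller maxLatency simply shortens that window.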
On Fri, Sep 20, 2013 at 12:47 PM, Slater, David M.
<[EMAIL PROTECTED]> wrote:

> I was using flush() after sending a bunch of mutations to the batch writers
> to limit their latency. I thought it would normally flush the buffer to
> ensure that the maxLatency is not violated. If the maxLatency is quite
> large, how do I ensure that it doesn’t wait a long time before writing?
>
> If the returned batch writers are all thread safe, then I’m still going to
> have the bottleneck of their synchronized addMutations method, correct?
>
> I’m looking for “org.apache.accumulo.core.client.impl” in the
> log4j.properties, generic_logger.xml and the other config files, but can’t
> locate it. Do I need to create a new entry for it there?
>
> Thanks,
> David
>
> From: Keith Turner [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, September 19, 2013 7:01 PM
> To: [EMAIL PROTECTED]
> Subject: Re: BatchWriter performance on 1.4
>
> On Thu, Sep 19, 2013 at 5:08 PM, Slater, David M. <[EMAIL PROTECTED]>
> wrote:
>
> Thanks Keith, I’m looking at it now. It looks like what I want. As
> for the proper usage…
>
> Would I create a MultiTableBatchWriter using the Connector,
>
> then call getBatchWriter() for each of the tables I’m interested in,
>
> add data to each of the BatchWriters returned,
>
> Yes.
>
> and then call flush() when I want all of that to get written?
>
> Why are you calling flush()? Doing this frequently will increase RPC
> overhead and lower throughput.
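Here is a back-of-the-envelope sketch of the RPC cost mentioned above. It rests on a deliberately crude assumption: each batch costs one round trip to each tablet server it touches, whether an explicit flush() or a full buffer triggers it. `FlushCost.rpcCount` is a hypothetical helper, not an Accumulo API.

```java
/** Hypothetical arithmetic helper: how many batch RPCs a write pattern produces. */
final class FlushCost {
    private FlushCost() {}

    /**
     * @param totalMutations    mutations written over the whole job
     * @param mutationsPerBatch mutations sent per batch (explicit flush() or full buffer)
     * @param tabletServers     tablet servers each batch is assumed to touch
     */
    static long rpcCount(long totalMutations, long mutationsPerBatch, int tabletServers) {
        if (mutationsPerBatch <= 0 || tabletServers <= 0) {
            throw new IllegalArgumentException("batch size and server count must be positive");
        }
        // Ceiling division: a final partial batch still costs a round of RPCs.
        long batches = (totalMutations + mutationsPerBatch - 1) / mutationsPerBatch;
        return batches * tabletServers;
    }
}
```

Apply this to the 1,000,000 mutations in the statistics below, sent to the single tablet server that appears in the TRACE output. Flushing every 100 mutations gives rpcCount(1_000_000, 100, 1) = 10,000 round trips. Batches of about 80,909 mutations, as in that log, give rpcCount(1_000_000, 80_909, 1) = 13.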
>
> Would the individual batch writers spawned by the MultiTableBatchWriter
> still have synchronized addMutations() methods, so that I would still have
> to worry about blocking, or would that all happen at the flush() method?
>
> The returned batch writers are thread safe. They all add to the same
> queue/buffer in a synchronized manner. Calling flush() on any of the
> batch writers returned from getBatchWriter() will block the others.
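The arrangement described here, where several per-table writers feed one synchronized buffer, can be modeled in plain Java. This is a hypothetical sketch. `SharedBuffer` and `TableWriter` are illustrative names, not Accumulo classes. Every writer's add and flush take the same lock, so a flush issued through one writer drains everyone's pending mutations. Adds from other threads wait until that flush returns.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single buffer shared by all per-table writers. */
class SharedBuffer {
    private final List<String> pending = new ArrayList<>();

    synchronized void add(String table, String row) {
        pending.add(table + "/" + row);
    }

    /**
     * Drains every table's pending mutations. Because this runs under the same lock
     * as add(), a writer adding concurrently waits until the flush returns.
     */
    synchronized List<String> flush() {
        List<String> sent = new ArrayList<>(pending);
        pending.clear();
        return sent;
    }
}

/** Hypothetical per-table view, loosely analogous to what getBatchWriter(table) returns. */
class TableWriter {
    private final SharedBuffer shared;
    private final String table;

    TableWriter(SharedBuffer shared, String table) {
        this.shared = shared;
        this.table = table;
    }

    void addMutation(String row) { shared.add(table, row); }

    List<String> flush() { return shared.flush(); }
}
```

In this model, flushing through the writer for `tableA` also sends the row that was added through the writer for `tableB`. Both writers share one buffer and one lock.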
>
> If you set the log4j log level to TRACE for
> org.apache.accumulo.core.client.impl, you can see output like the following.
> Binning is the process of taking each mutation and deciding which tablet
> and tablet server it goes to.
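Binning, as defined here, amounts to a sorted lookup. The sketch below is a simplified, hypothetical model rather than TabletLocatorImpl, and it rests on three assumptions:

- Rows are plain strings.
- Each tablet is identified by its end row and covers the rows after the previous tablet's end row, up to and including its own.
- A final tablet with no end row catches everything past the last split.

`TabletBinner` and the server addresses used with it are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified binner: maps each row to the tablet server hosting its tablet. */
class TabletBinner {
    // Tablet end row -> tablet server. A tablet covers (previous end row, end row].
    private final TreeMap<String, String> endRowToServer = new TreeMap<>();
    // Server hosting the last tablet, whose end row is unbounded.
    private final String lastTabletServer;

    TabletBinner(Map<String, String> endRowToServer, String lastTabletServer) {
        this.endRowToServer.putAll(endRowToServer);
        this.lastTabletServer = lastTabletServer;
    }

    /** The owning tablet is the first one whose end row is >= the row. */
    String serverFor(String row) {
        Map.Entry<String, String> e = endRowToServer.ceilingEntry(row);
        return e == null ? lastTabletServer : e.getValue();
    }

    /** Groups rows by destination server, the step the "Binned ..." log line reports. */
    Map<String, List<String>> bin(List<String> rows) {
        Map<String, List<String>> bins = new HashMap<>();
        for (String row : rows) {
            bins.computeIfAbsent(serverFor(row), k -> new ArrayList<>()).add(row);
        }
        return bins;
    }
}
```

With splits at "g" and "p", row "g" lands in the first tablet because end rows are inclusive. Row "zebra" falls past the last split and goes to the final tablet.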
>
>   2013-09-19 18:43:37,261 [impl.ThriftTransportPool] TRACE: Using existing connection to 127.0.0.1:9997
>   2013-09-19 18:43:37,393 [impl.TabletLocatorImpl] TRACE: tid=12 oid=13  Binning 80909 mutations for table 3
>   2013-09-19 18:43:37,402 [impl.TabletLocatorImpl] TRACE: tid=12 oid=13  Binned 80909 mutations for table 3 to 1 tservers in 0.009 secs
>   2013-09-19 18:43:37,402 [impl.TabletServerBatchWriter] TRACE: Started sending 80,909 mutations to 1 tablet servers
>   2013-09-19 18:43:37,656 [impl.ThriftTransportPool] TRACE: Returned connection 127.0.0.1:9997 (120000) ioCount : 1459116
>   2013-09-19 18:43:37,657 [impl.TabletServerBatchWriter] TRACE: sent 80,909 mutations to 127.0.0.1:9997 in 0.40 secs (204,832.91 mutations/sec) with 0 failures
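When scanning TRACE output like this, it can help to pull the numbers out programmatically. This hypothetical helper (`SentLineParser` is not part of Accumulo) assumes the exact wording of the "sent" line shown above, with commas as thousands separators. Other Accumulo versions may word the line differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the "sent ... mutations" TRACE line (assumes the wording above). */
final class SentLineParser {
    private static final Pattern SENT = Pattern.compile(
        "sent ([\\d,]+) mutations to (\\S+) in ([\\d.]+) secs \\(([\\d,.]+) mutations/sec\\) with (\\d+) failures");

    /** Parsed fields; numeric values have their thousands separators removed. */
    static final class Sent {
        final long mutations;
        final String server;
        final double seconds;
        final double mutationsPerSec;
        final int failures;

        Sent(long mutations, String server, double seconds, double mutationsPerSec, int failures) {
            this.mutations = mutations;
            this.server = server;
            this.seconds = seconds;
            this.mutationsPerSec = mutationsPerSec;
            this.failures = failures;
        }
    }

    /** Returns the parsed fields, or null if the line is not a "sent" line. */
    static Sent parse(String line) {
        Matcher m = SENT.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new Sent(
            Long.parseLong(m.group(1).replace(",", "")),
            m.group(2),
            Double.parseDouble(m.group(3)),
            Double.parseDouble(m.group(4).replace(",", "")),
            Integer.parseInt(m.group(5)));
    }
}
```

Note that 80,909 / 0.40 is about 202,000, slightly below the logged rate. The elapsed time is presumably rounded to two decimals in the message, so reading `mutationsPerSec` directly is more precise than recomputing it from `seconds`.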
>
> When you close the batch writer, it will log some summary stats like the
> following.
>
>   2013-09-19 18:43:39,149 [impl.TabletServerBatchWriter] TRACE:
>   2013-09-19 18:43:39,149 [impl.TabletServerBatchWriter] TRACE: TABLET SERVER BATCH WRITER STATISTICS
>   2013-09-19 18:43:39,149 [impl.TabletServerBatchWriter] TRACE: Added                :  1,000,000 mutations
>   2013-09-19 18:43:39,149 [impl.TabletServerBatchWriter] TRACE: Sent                 :  1,000,000 mutations
>   2013-09-19 18:43:39,149 [impl.TabletServerBatchWriter] TRACE: Resent percentage    :       0.00%
>   2013-09-19 18:43:39,150 [impl.TabletServerBatchWriter] TRACE: Overall