Accumulo >> mail # user >> BatchWriter performance on 1.4


Slater, David M. 2013-09-18, 21:07
David Medinets 2013-09-19, 02:41
Slater, David M. 2013-09-19, 14:53
John Vines 2013-09-18, 21:22
Adam Fuchs 2013-09-19, 08:07
Keith Turner 2013-09-19, 16:39
Slater, David M. 2013-09-19, 21:08
Keith Turner 2013-09-19, 23:01
Slater, David M. 2013-09-20, 16:47
John Vines 2013-09-20, 18:50
Re: BatchWriter performance on 1.4
On Fri, Sep 20, 2013 at 12:47 PM, Slater, David M.
<[EMAIL PROTECTED]> wrote:

> I was using flush() after sending a bunch of mutations to the batchwriters
> to limit their latency. I thought it would normally flush the buffer to
> ensure that the maxLatency is not violated. If the maxLatency is quite
> large, how do I ensure that it doesn’t wait a long time before writing?
>

If you are constantly writing to a batch writer, then it will be continually
flushing. The example debug output I posted was from running
org.apache.accumulo.test.TestIngest (it may be in another package before
1.6). I ran the following command to write a million random mutations.

accumulo org.apache.accumulo.test.TestIngest --debug -u root -p secret --timestamp 1 --size 50 --random 56 --rows 1000000 --start 0 --cols 1

I think it defaults to 50M of memory for the batch writer. It was
continually sending batches of 80K mutations every 0.45 seconds, so in
that case the latency of a mutation is probably less than two seconds. But
this was with just one tablet server; the behavior would be different with
multiple tablet servers.
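To put those numbers in perspective, here is a rough Java sketch of the arithmetic. It assumes a single tablet server, treats each send as at least one round trip, and compares against a hypothetical client that calls flush() every 1,000 mutations; the class and method names are made up for illustration.

```java
// Rough batching arithmetic using the figures quoted above. The "flush every
// 1,000 mutations" case is a hypothetical client pattern for comparison.
public class BatchArithmetic {

    // Mutations per second implied by a batch size and a send interval.
    static double mutationsPerSecond(long batchSize, double intervalSeconds) {
        return batchSize / intervalSeconds;
    }

    // Number of sends needed for a total, rounding the last partial batch up.
    static long batchesNeeded(long totalMutations, long mutationsPerBatch) {
        return (totalMutations + mutationsPerBatch - 1) / mutationsPerBatch;
    }

    public static void main(String[] args) {
        long total = 1_000_000L; // the TestIngest run wrote a million mutations
        System.out.printf("about %.0f mutations/s%n", mutationsPerSecond(80_000L, 0.45));
        // Assumes one tablet server, so each send is at least one round trip.
        System.out.println("sends with 80K-mutation batches: " + batchesNeeded(total, 80_000L));
        System.out.println("sends with flush() every 1,000 mutations: " + batchesNeeded(total, 1_000L));
    }
}
```

Under those assumptions the whole run needs about 13 buffer-driven sends versus 1,000 flush-driven ones, which is the kind of RPC overhead that frequent flush() calls add.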

In this example, if I set the max latency on the batch writer to 30 seconds,
it would never kick in and force a flush.
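Why a 30-second max latency never fires here can be shown with a deliberately simplified model. It assumes the writer sends whenever a fixed number of mutations has accumulated (the real batch writer tracks bytes of memory, not mutation counts) or when the oldest buffered mutation reaches maxLatency; FlushTriggerModel and its methods are hypothetical.

```java
// Deliberately simplified, hypothetical model of the two reasons a buffered
// writer sends: enough mutations have accumulated, or the oldest buffered
// mutation has waited maxLatency. Not Accumulo's implementation; it counts
// mutations where the real writer tracks bytes of buffer memory.
public class FlushTriggerModel {

    enum Trigger { SIZE, LATENCY }

    // Under a steady write rate, the buffer reaches its send threshold every
    // thresholdMutations / writeRatePerSec seconds. The latency trigger only
    // fires first if that takes at least maxLatencySec.
    static Trigger firstTrigger(long thresholdMutations, double writeRatePerSec, double maxLatencySec) {
        double secondsToFill = thresholdMutations / writeRatePerSec;
        return secondsToFill < maxLatencySec ? Trigger.SIZE : Trigger.LATENCY;
    }

    public static void main(String[] args) {
        // Continuous ingest: 80K mutations every 0.45 s, with a 30 s max latency.
        System.out.println(firstTrigger(80_000L, 80_000 / 0.45, 30.0)); // SIZE
        // Hypothetical slow writer: 100 mutations/s against the same threshold.
        System.out.println(firstTrigger(80_000L, 100.0, 30.0));         // LATENCY
    }
}
```

With continuous ingest the size trigger always wins; in this model maxLatency only matters for a slow writer, where it bounds how long data can sit in the buffer without an explicit flush().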

> If the returned batch writers are all thread safe, then I’m still going to
> have the bottleneck of their synchronized addMutations method, correct?
>

In my experience, that's not a bottleneck, but you will need to confirm this
for your situation (hopefully the debug output can help you with this). If
the M threads adding mutations to a queue are going at a faster rate than
the N threads taking mutations and sending them, then the synchronization
around the queue is not the bottleneck. M threads could probably add to a
synchronized queue at a rate of millions of mutations per second, while N
threads can probably only serialize and send tens or hundreds of thousands
of mutations per second.
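The M-producer/N-sender arrangement above can be sketched with a plain synchronized queue. This is a hypothetical model, not the batch writer's code: producers hold the lock only long enough to append, while senders take a batch under the lock and would do the slow serialization and RPC work outside it. It does not reproduce the throughput figures quoted above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of M producer threads feeding N sender threads through one
// synchronized queue; it is not Accumulo's batch writer code.
public class SharedQueueSketch {

    private final ArrayDeque<String> queue = new ArrayDeque<>();
    private boolean producersDone = false;

    // Producer side: a short synchronized append, analogous to addMutation().
    synchronized void add(String mutation) {
        queue.add(mutation);
        notifyAll();
    }

    // Called once every producer has finished, so idle senders can exit.
    synchronized void finish() {
        producersDone = true;
        notifyAll();
    }

    // Sender side: wait for work, then take up to maxBatch items under the lock.
    // An empty result means the producers are done and the queue is drained.
    synchronized List<String> takeBatch(int maxBatch) throws InterruptedException {
        while (queue.isEmpty() && !producersDone) {
            wait();
        }
        List<String> batch = new ArrayList<>();
        while (batch.size() < maxBatch && !queue.isEmpty()) {
            batch.add(queue.poll());
        }
        return batch;
    }

    // Runs the model and returns how many items the senders handled.
    static long run(int producers, int senders, int perProducer) {
        SharedQueueSketch q = new SharedQueueSketch();
        AtomicLong sent = new AtomicLong();
        List<Thread> senderThreads = new ArrayList<>();
        for (int s = 0; s < senders; s++) {
            Thread t = new Thread(() -> {
                try {
                    List<String> batch;
                    while (!(batch = q.takeBatch(1_000)).isEmpty()) {
                        // Serializing and the RPC would happen here, outside the lock.
                        sent.addAndGet(batch.size());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            senderThreads.add(t);
        }
        List<Thread> producerThreads = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            final int id = p;
            Thread t = new Thread(() -> {
                for (int i = 0; i < perProducer; i++) {
                    q.add("row_" + id + "_" + i);
                }
            });
            t.start();
            producerThreads.add(t);
        }
        try {
            for (Thread t : producerThreads) t.join();
            q.finish();
            for (Thread t : senderThreads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sent.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 2, 10_000)); // 40000
    }
}
```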
> I’m looking for “org.apache.accumulo.client.impl” in the
> log4j.properties, generic_logger.xml, and other config files, but can’t
> locate it. Do I need to create a new entry for it there?
>

You can add an entry to a log4j.properties file that's on the classpath, or
you can try adding something like the following to your code. I had the
package wrong before; it's correct below.

// Logger and Level are the log4j 1.x classes org.apache.log4j.Logger and org.apache.log4j.Level
Logger.getLogger("org.apache.accumulo.core.client.impl").setLevel(Level.TRACE);

> Thanks,
> David
>
> *From:* Keith Turner [mailto:[EMAIL PROTECTED]]
> *Sent:* Thursday, September 19, 2013 7:01 PM
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: BatchWriter performance on 1.4
>
> On Thu, Sep 19, 2013 at 5:08 PM, Slater, David M. <[EMAIL PROTECTED]> wrote:
>
>> Thanks Keith, I’m looking at it now. It appears to be what I would want. As
>> for the proper usage…
>>
>> Would I create one using the Connector,
>> then call .getBatchWriter() for each of the tables I’m interested in,
>> add data to each of the BatchWriters returned,
>
> yes.
>
>> and then call flush() when I want all of that to get written?
>
> Why are you calling flush()? Doing this frequently will increase RPC
> overhead and lower throughput.
>
>> Would the individual batch writers spawned by the MultiTableBatchWriter
>> still have synchronized addMutations() methods, so that I would still have
>> to worry about blocking, or would that all happen at the flush() method?
>
> The returned batch writers are thread safe. They all add to the same
> queue/buffer in a synchronized manner. Calling flush() on any of the
> batch writers returned from getBatchWriter() will block the others.
>
> If you set the log4j log level to TRACE for
> org.apache.accumulo.client.impl, you can see output like the following.
> Binning is the process of taking each mutation and deciding which tablet
> and tablet server it goes to.
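The binning step described in the quoted message can be sketched as a sorted lookup on tablet end rows. This is a simplified model that assumes each tablet covers the rows after the previous tablet's end row, up to and including its own, with the last tablet unbounded; the split points and server names are made up, and Java String ordering stands in for Accumulo's byte-wise row ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical binning model. Split points and server names are
// made up; the real client consults its cached tablet locations.
public class BinningSketch {

    // Tablet end row (inclusive) -> server hosting that tablet.
    private final TreeMap<String, String> serverByEndRow = new TreeMap<>();
    // Server hosting the last tablet, which has no end row.
    private final String lastTabletServer;

    BinningSketch(Map<String, String> endRowToServer, String lastTabletServer) {
        serverByEndRow.putAll(endRowToServer);
        this.lastTabletServer = lastTabletServer;
    }

    // A row belongs to the first tablet whose end row is >= the row;
    // rows past every end row fall into the last tablet.
    String serverFor(String row) {
        Map.Entry<String, String> e = serverByEndRow.ceilingEntry(row);
        return e != null ? e.getValue() : lastTabletServer;
    }

    // Group rows by destination server so each server's share can be sent together.
    Map<String, List<String>> bin(List<String> rows) {
        Map<String, List<String>> bins = new TreeMap<>();
        for (String row : rows) {
            bins.computeIfAbsent(serverFor(row), k -> new ArrayList<>()).add(row);
        }
        return bins;
    }

    public static void main(String[] args) {
        BinningSketch b = new BinningSketch(Map.of("g", "tserver1", "p", "tserver2"), "tserver3");
        System.out.println(b.bin(List.of("apple", "g", "kiwi", "zebra")));
        // {tserver1=[apple, g], tserver2=[kiwi], tserver3=[zebra]}
    }
}
```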