Accumulo >> mail # user >> BatchWriter performance on 1.4


Slater, David M. 2013-09-18, 21:07
David Medinets 2013-09-19, 02:41
Slater, David M. 2013-09-19, 14:53
Re: BatchWriter performance on 1.4
Currently the addMutation() code is synchronized, so that is a bottleneck.
A separate thread would get around this, but then you need to manage
the thread properly.
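
To make the "manage the thread properly" part concrete, here is a minimal
sketch, assuming one dedicated worker thread per writer. MutationSink is a
hypothetical stand-in for the single BatchWriter.addMutations(Iterable) call
from the original message, so the example runs without Accumulo on the
classpath. AsyncSink and its methods are illustrative names, not Accumulo API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PerWriterThreadSketch {

    /** Hypothetical stand-in for BatchWriter.addMutations(Iterable<Mutation>). */
    public interface MutationSink<M> {
        void addMutations(Iterable<M> mutations) throws Exception;
    }

    /** Owns one sink and one worker thread; batches are written in submission order. */
    public static final class AsyncSink<M> implements AutoCloseable {
        private final MutationSink<M> sink;
        private final ExecutorService worker = Executors.newSingleThreadExecutor();

        public AsyncSink(MutationSink<M> sink) {
            this.sink = sink;
        }

        /** Queues a batch and returns immediately; the Future reports any write failure. */
        public Future<?> submit(List<M> batch) {
            // Copy so the caller can reuse or clear its list right away.
            List<M> copy = new ArrayList<>(batch);
            return worker.submit(() -> {
                sink.addMutations(copy);
                return null;
            });
        }

        /** Stops accepting batches and waits for the queued ones to be written. */
        @Override
        public void close() throws InterruptedException {
            worker.shutdown();
            if (!worker.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("writer thread did not finish in time");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // In-memory lists play the role of two tables.
        List<String> table1 = new CopyOnWriteArrayList<>();
        List<String> table2 = new CopyOnWriteArrayList<>();

        try (AsyncSink<String> w1 = new AsyncSink<String>(ms -> ms.forEach(table1::add));
             AsyncSink<String> w2 = new AsyncSink<String>(ms -> ms.forEach(table2::add))) {
            for (int i = 0; i < 3; i++) {
                // The parsing loop only enqueues; the blocking call runs on the worker.
                w1.submit(Arrays.asList("t1-row" + i));
                w2.submit(Arrays.asList("t2-row" + i));
            }
        }
        System.out.println(table1.size() + " rows / " + table2.size() + " rows");
    }
}
```

With a real BatchWriter, the lambda passed to AsyncSink would presumably just
call bw.addMutations(ms). Two caveats with this sketch: the executor's queue is
unbounded, so a parser that outruns the writer will accumulate batches in
memory, and write failures only surface if the returned Future is checked.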
On Wed, Sep 18, 2013 at 5:07 PM, Slater, David M.
<[EMAIL PROTECTED]> wrote:

> Hi, I'm running a single-threaded ingestion program that takes data from
> an input source, parses it into mutations, and then writes those mutations
> (sequentially) to four different BatchWriters (all on different tables).
> Most of the time (95%) taken is on adding mutations, e.g.
> batchWriter.addMutations(mutations); I am wondering how to reduce the time
> taken by these methods.
>
> 1) For the method batchWriter.addMutations(Iterable<Mutation>), does it
> matter for performance whether the mutations returned by the iterator are
> sorted in lexicographic order?
>
> 2) If the Iterable<Mutation> that I pass to the BatchWriter is very large,
> will I need to wait for a number of Batches to be written and flushed
> before it will finish iterating, or does it transfer the elements of the
> Iterable to a different intermediate list?
>
> 3) If that is the case, would it then make sense to spawn off short
> threads for each time I make use of addMutations?
>
> At a high level, my code looks like this:
>
> BatchWriter bw1 = connector.createBatchWriter(…)
> BatchWriter bw2 = …
> …
> while (true) {
>     String[] data = input.getData();
>     List<Mutation> mutations1 = parseData1(data);
>     List<Mutation> mutations2 = parseData2(data);
>     …
>     bw1.addMutations(mutations1);
>     bw2.addMutations(mutations2);
>     …
> }
>
> Thanks,
> David
Adam Fuchs 2013-09-19, 08:07
Keith Turner 2013-09-19, 16:39
Slater, David M. 2013-09-19, 21:08
Keith Turner 2013-09-19, 23:01
Slater, David M. 2013-09-20, 16:47
John Vines 2013-09-20, 18:50
Keith Turner 2013-09-20, 18:43