Re: Document-Partitioned Indexing - Optimizing Mutation Size

It's possible that it could run faster. There are two things that could
enable this.

If you are using the native map, then it's structured as map<row, map<col,
val>>. For a mutation with multiple columns, it will look up the row once
to get map<col, val>. After that it will do inserts into the column map
directly. I am not sure this will help much in your case, since you
probably have a shallow row tree and deep column trees.

Second, the row is only sent once over the wire and only written once to
the write-ahead log (walog). You may see some benefit here.
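
To make both points concrete, here is a minimal sketch against the 1.6
client API (the table name, shard row, and term/docId layout are
placeholders, not your schema). Packing all the columns for a shard into
one Mutation means the row key crosses the wire and hits the walog once:

  import org.apache.accumulo.core.client.BatchWriter;
  import org.apache.accumulo.core.client.BatchWriterConfig;
  import org.apache.accumulo.core.client.Connector;
  import org.apache.accumulo.core.data.Mutation;
  import org.apache.accumulo.core.data.Value;
  import org.apache.hadoop.io.Text;

  // One mutation per shard row: the row key is transmitted and logged
  // once, no matter how many term columns the mutation carries.
  void writeShard(Connector conn, String table, String shardRow,
      Iterable<String> terms, String docId) throws Exception {
    BatchWriter bw = conn.createBatchWriter(table, new BatchWriterConfig());
    Mutation m = new Mutation(new Text(shardRow));
    for (String term : terms) {
      // placeholder layout: family = term, qualifier = document id
      m.put(new Text(term), new Text(docId), new Value(new byte[0]));
    }
    bw.addMutation(m);
    bw.close();
  }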

There is a simple test that ships with Accumulo in
test/system/test_ingest.sh. This test writes 5 million mutations with one
column each. I ran the test varying the number of rows and columns while
keeping rows * columns == 5M. I used 1.6.0 with
tserver.mutation.queue.max=4M. I ran these tests on my workstation, so the
walog was not written across a network.

5 million mutations, 1 column per mutation: ~26 secs
500,000 mutations, 10 columns per mutation: ~16 secs
500 mutations, 10,000 columns per mutation: ~13 secs
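
If you want to reproduce that comparison from client code rather than the
shipped script, a rough equivalent looks like the following (a hypothetical
helper, not test_ingest itself; the 50-byte value and key formats are
arbitrary). Vary numRows and colsPerRow while keeping
numRows * colsPerRow == 5,000,000, with tserver.mutation.queue.max raised
beforehand (e.g. config -s tserver.mutation.queue.max=4M in the shell):

  import org.apache.accumulo.core.client.BatchWriter;
  import org.apache.accumulo.core.client.BatchWriterConfig;
  import org.apache.accumulo.core.client.Connector;
  import org.apache.accumulo.core.data.Mutation;
  import org.apache.accumulo.core.data.Value;
  import org.apache.hadoop.io.Text;

  void ingest(Connector conn, String table, int numRows, int colsPerRow)
      throws Exception {
    BatchWriter bw = conn.createBatchWriter(table, new BatchWriterConfig());
    byte[] val = new byte[50]; // arbitrary payload
    for (int r = 0; r < numRows; r++) {
      Mutation m = new Mutation(new Text(String.format("row_%08d", r)));
      for (int c = 0; c < colsPerRow; c++) {
        m.put(new Text("cf"), new Text(String.format("col_%08d", c)),
            new Value(val));
      }
      bw.addMutation(m); // row key sent and logged once per mutation
    }
    bw.close();
  }

For example, ingest(conn, "test_ingest", 500000, 10) corresponds to the
middle line above.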

It might be worth experimenting with.

Keith

On Thu, May 15, 2014 at 10:53 AM, Slater, David M.
<[EMAIL PROTECTED]> wrote:
 