How to pre-split a table for UUID rowkeys
Greetings folks,
I have a somewhat non-typical use case: Accumulo serves as the backing
data store for a search index, to provide fault tolerance should the index
get corrupted.  The maximum number of documents stored in Accumulo will be
under 1 billion at full volume.

The search index is used to "find" the data a user is interested in, and
the document is then retrieved from Accumulo using the RowKey returned by
the search index.  The RowKey is a java.util.UUID string with the '-'
dashes stripped out.
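For concreteness, the rowkeys look roughly like the output of this sketch
(assuming UUID.randomUUID() as the source; the names here are illustrative,
not my exact code):

import java.util.UUID;

// Illustrative sketch: a rowkey is a random UUID rendered as a
// 32-character hex string with the '-' dashes removed,
// e.g. "3f2504e04f8941d39a0c0305e82c3301".
public class RowKeys {
    static String newRowKey() {
        return UUID.randomUUID().toString().replace("-", "");
    }
}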

I have a 3-node cluster and, as a quick test, ingested 5 million 1K
documents into it, yet they all went to a single TabletServer.  I was
somewhat surprised; I knew this would be the case for a monotonically
increasing row key, but I thought that with a UUID-style rowkey the
entries would have been spread across the TabletServers at least somewhat,
even without pre-splitting the table.

Clearly my understanding of how Accumulo spreads the data out is lacking.
 Can anyone shed more light on it?  And possibly recommend a table split
strategy for a 3-node cluster such as I have described?
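
For reference, here is roughly the kind of pre-split I have been considering,
as a sketch only: it assumes Accumulo's TableOperations.addSplits API, a
placeholder table name "docs", and placeholder connection settings, and it
splits on the 16 possible leading hex digits of the dash-stripped UUID.
Whether this granularity (or this approach at all) makes sense for a 3-node
cluster is exactly what I am unsure about:

import java.util.SortedSet;
import java.util.TreeSet;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.hadoop.io.Text;

public class PreSplitDocsTable {
    public static void main(String[] args) throws Exception {
        // Placeholder instance name, ZooKeepers, and credentials; substitute real values.
        ZooKeeperInstance instance = new ZooKeeperInstance("myInstance", "zk1,zk2,zk3");
        Connector connector = instance.getConnector("root", new PasswordToken("secret"));

        // One split point per leading hex digit of the dash-stripped UUID,
        // giving 16 roughly even tablets for uniformly random keys.
        SortedSet<Text> splits = new TreeSet<Text>();
        for (char c : "123456789abcdef".toCharArray()) {
            splits.add(new Text(String.valueOf(c)));
        }

        connector.tableOperations().addSplits("docs", splits);
    }
}

I believe the same effect can be had from the Accumulo shell with the
addsplits command, but I have not verified that I am using it correctly.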

Many thanks in advance,
Terry