
Accumulo, mail # user - How to pre-split a table for UUID rowkeys

Terry P. 2013-08-02, 21:41
Greetings folks,
I have a somewhat atypical Accumulo use case: Accumulo serves as a backend
data store for a search index, providing fault tolerance should the index
get corrupted. The maximum number of documents stored in Accumulo will be
under 1 billion at full

The search index is used to "find" the data a user is interested in; the
document is then retrieved from Accumulo using the RowKey obtained from the
search index. The RowKey is a java.util.UUID string with the '-' dashes
stripped out.
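For concreteness, here is a minimal sketch of a row key built the way described above. It assumes the UUIDs come from `UUID.randomUUID()` (random, version 4); the class and method names are hypothetical. `UUID.toString()` yields 36 characters of lowercase hex with four dashes, so the stripped key is 32 lowercase hex characters.

```java
import java.util.UUID;

public class RowKeys {
    // Hypothetical helper: a random java.util.UUID rendered as a string
    // with the '-' dashes removed, leaving 32 lowercase hex characters.
    static String newRowKey() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    public static void main(String[] args) {
        String key = newRowKey();
        System.out.println(key + " (" + key.length() + " chars)");
    }
}
```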

I have a 3-node cluster and, as a quick test, ingested 5 million 1K
documents into it, yet they all went to a single TabletServer. I was kind
of surprised: I knew this would be the case for a row key using a
monotonically increasing number, but I thought that with a UUID-type row
key the entries would have been spread across the TabletServers at least
somewhat, even without pre-splitting the table.
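The intuition about the keys themselves can be checked directly. The sketch below assumes random (version 4) UUIDs from `UUID.randomUUID()`, whose leading hex digit is random, and counts how many dash-stripped keys start with each of the 16 hex digits; each bucket should land near n/16. If the keys do spread evenly over the keyspace like this, the single-server result would point to how many tablets the table had at the time rather than to the key design. That explanation is an assumption this check does not prove. The class and method names are hypothetical.

```java
import java.util.UUID;

public class KeySpreadCheck {
    // Count how many dash-stripped random UUID keys begin with each hex digit.
    // With uniformly random keys every one of the 16 buckets gets roughly n/16.
    static int[] firstDigitHistogram(int n) {
        int[] counts = new int[16];
        for (int i = 0; i < n; i++) {
            String key = UUID.randomUUID().toString().replace("-", "");
            counts[Character.digit(key.charAt(0), 16)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = firstDigitHistogram(100_000);
        for (int d = 0; d < 16; d++) {
            System.out.printf("%x: %d%n", d, counts[d]);
        }
    }
}
```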

Clearly my understanding of how Accumulo spreads the data out is lacking.
Can anyone shed more light on it? And could you recommend a table split
strategy for a 3-node cluster such as the one I have described?
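One possible starting point, offered as a sketch rather than a recommendation: because the keys are lowercase hex, split points can be spaced evenly over the hex keyspace. Lowercase hex sorts in numeric order under byte-wise comparison ('0'-'9' sort before 'a'-'f'), so evenly spaced prefixes give roughly equal-sized ranges for uniformly random keys. The class, method, and parameter names are hypothetical. How many tablets per TabletServer to aim for is a tuning choice this sketch does not settle. To apply the result, the strings would be converted to `org.apache.hadoop.io.Text` and passed to `TableOperations.addSplits(tableName, SortedSet<Text>)`, or given to the shell's `addsplits` command.

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class HexSplits {
    // Compute (numTablets - 1) split points dividing the lowercase-hex keyspace
    // of dash-stripped UUID row keys into numTablets roughly equal ranges.
    // prefixWidth is how many leading hex digits each split point uses (1..8).
    static SortedSet<String> splitPoints(int numTablets, int prefixWidth) {
        if (prefixWidth < 1 || prefixWidth > 8) {
            throw new IllegalArgumentException("prefixWidth must be in 1..8");
        }
        long space = 1L << (4 * prefixWidth); // 16^prefixWidth possible prefixes
        if (numTablets < 1 || numTablets > space) {
            throw new IllegalArgumentException("numTablets must be in 1..16^prefixWidth");
        }
        SortedSet<String> splits = new TreeSet<>();
        for (int i = 1; i < numTablets; i++) {
            long boundary = i * space / numTablets;            // evenly spaced boundary
            splits.add(String.format("%0" + prefixWidth + "x", boundary)); // zero-padded hex
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(splitPoints(3, 2));  // one tablet per node
        System.out.println(splitPoints(16, 1)); // one tablet per leading hex digit
    }
}
```

For three tablets with two-digit prefixes this yields the splits `55` and `aa`. Sixteen tablets with one-digit prefixes yields `1` through `f`, which would give each of the 3 nodes several tablets to balance.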

Many thanks in advance,