I have a somewhat atypical Accumulo use case: I am using Accumulo as a backend
data store for a search index, to provide fault tolerance should the index
get corrupted. Max docs stored in Accumulo will be under 1 billion at full
The search index is used to "find" the data a user is interested in; the
document is then retrieved from Accumulo using its RowKey, which is obtained
from the search index. The RowKey is a java.util.UUID string with the '-'
dashes stripped out.
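To make the key format concrete, here is a minimal sketch of how such a RowKey can be built. The class and method names (RowKeys, toRowKey) are made up for illustration and are not from my actual code; only java.util.UUID is real.

```java
import java.util.UUID;

final class RowKeys {
    // Hypothetical helper: turns a UUID into a 32-character row key
    // by removing the four '-' separators from its string form.
    static String toRowKey(UUID id) {
        return id.toString().replace("-", "");
    }

    public static void main(String[] args) {
        // Random (version 4) UUID, as an example of a freshly generated key.
        System.out.println(toRowKey(UUID.randomUUID()));
    }
}
```

UUID.toString() emits lowercase hex digits, so every key is a fixed-width string over the characters 0-9 and a-f.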
I have a 3-node cluster and, as a quick test, have ingested 5 million 1K
documents into it, yet they all went to a single TabletServer. I was kind
of surprised -- I knew this would be the case for a row key using a
monotonically increasing number, but I thought that with a UUID-type RowKey
the entries would have been spread across the TabletServers at least
somewhat, even without pre-splitting the table.
Clearly my understanding of how Accumulo spreads the data out is lacking.
Can anyone shed more light on it? And possibly recommend a table split
strategy for a 3-node cluster such as I have described?
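For what it's worth, this is the kind of pre-split I had in mind: a rough sketch that computes evenly spaced split points over the first two hex characters of the RowKey. The HexSplits class and splitPoints method are hypothetical names. It assumes the keys are uniformly distributed lowercase hex, and that a table created without split points starts out as a single tablet. As I understand it, each string would then be wrapped in an org.apache.hadoop.io.Text and passed to TableOperations.addSplits, but that part is left out so the sketch runs without Accumulo on the classpath.

```java
import java.util.SortedSet;
import java.util.TreeSet;

final class HexSplits {
    // Returns (tablets - 1) two-character hex split points that divide the
    // 00..ff prefix space into roughly equal ranges.
    static SortedSet<String> splitPoints(int tablets) {
        if (tablets < 1 || tablets > 256) {
            throw new IllegalArgumentException("tablets must be in 1..256");
        }
        SortedSet<String> splits = new TreeSet<>();
        for (int i = 1; i < tablets; i++) {
            // Lowercase, zero-padded hex sorts lexicographically in numeric order.
            splits.add(String.format("%02x", (256 * i) / tablets));
        }
        return splits;
    }

    public static void main(String[] args) {
        // One tablet per node on a 3-node cluster.
        System.out.println(splitPoints(3));
    }
}
```

With tablets set to 3 this gives the split points 55 and aa, i.e. one tablet per node; a larger count (say several tablets per node) may well be the better choice, which is part of what I am asking.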
Many thanks in advance,