Re: How to pre-split a table for UUID rowkeys
Eric Newton 2013-08-02, 22:35
Apparently 5M 1K documents isn't enough to split the tablet. I'm guessing
that your documents are compressing well, or you are able to fit them all
in memory. You could try flushing the table and see if it splits.
shell > flush -t table -w
Or, you could just add splits if you know the UUIDs are uniformly distributed:
shell > addsplits -t table 1 2 3 4 5 6 7 8 9 a b c d e f
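If finer-grained splits are wanted later, the split list can be generated rather than typed by hand. The following is a minimal sketch, assuming the row keys are lowercase hex strings (which is what java.util.UUID.toString() produces before the dashes are stripped). The helper names hex_splits and addsplits_command are made up for illustration and are not part of Accumulo; the script only prints a line to paste into the shell.

```python
def hex_splits(prefix_len=1):
    """Return evenly spaced lowercase-hex split points of the given length.

    prefix_len=1 yields 15 split points (16 tablets),
    prefix_len=2 yields 255 split points (256 tablets).
    """
    total = 16 ** prefix_len
    # Start at 1: a split at "0" (or "00") would only create a first
    # tablet that no 32-character hex row key can ever fall into.
    return [format(i, "0{}x".format(prefix_len)) for i in range(1, total)]


def addsplits_command(table, splits):
    """Format the split points as an Accumulo shell addsplits line."""
    return "addsplits -t {} {}".format(table, " ".join(splits))


if __name__ == "__main__":
    # Reproduces the single-character split list shown above.
    print(addsplits_command("table", hex_splits(1)))
```

With a 3-node cluster, the 16 tablets from single-character splits are probably plenty to start with; the two-character variant is only there in case more tablets per server are needed.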
Or, if you just want Accumulo to split at a certain size under the 1G default:
shell > config -t table -s table.split.threshold=10M
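To illustrate why the row key distribution alone does not spread the data: a new table starts as a single tablet covering the whole key range, so every row lands there until a split exists. Below is a rough simulation, not Accumulo code. It assumes each tablet holds the rows greater than the previous split point and less than or equal to its own, and that random UUIDs stand in for the real row keys. The function names are hypothetical.

```python
import bisect
import uuid
from collections import Counter


def tablet_index(row, splits):
    """Index of the tablet that would hold `row`, given sorted split points.

    Tablet i covers (splits[i-1], splits[i]]; the last tablet is open-ended.
    With no splits at all there is exactly one tablet, index 0.
    """
    return bisect.bisect_left(splits, row)


def distribution(rows, splits):
    """Count how many rows fall into each tablet."""
    return Counter(tablet_index(r, splits) for r in rows)


# Random UUIDs with the dashes stripped, as in the original question.
rows = [uuid.uuid4().hex for _ in range(100000)]

no_splits = distribution(rows, [])
with_splits = distribution(rows, list("123456789abcdef"))

if __name__ == "__main__":
    print("tablets used without splits:", len(no_splits))
    print("tablets used with 15 splits:", len(with_splits))
    print("rows per tablet:", sorted(with_splits.items()))
```

Without splits every key maps to the one tablet regardless of how random it is; with the 15 split points the counts come out roughly even, which is what lets the tablets be balanced across the three servers.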
On Fri, Aug 2, 2013 at 5:41 PM, Terry P. <[EMAIL PROTECTED]> wrote:
> Greetings folks,
> Have a bit of a non-typical Accumulo use case using Accumulo as a backend
> data store for a search index to provide fault tolerance should the index
> get corrupted. Max docs stored in Accumulo will be under 1 billion at full
> The search index is used to "find" the data a user is interested in, and
> the search index then retrieves the document from Accumulo using its RowKey
> which was gotten from the search index. The RowKey is a java.util.UUID
> string that has had the '-' dashes stripped out.
> I have a 3 node cluster and as a quick test have ingested 5 million 1K
> documents into it, yet they all went to a single TabletServer. I was kind
> of surprised -- I knew this would be the case for a row key using a
> monotonically increasing number, but I thought with a UUID type rowkey the
> entries would have been spread across the TabletServers at least some, even
> without pre-splitting the table.
> Clearly my understanding of how Accumulo spreads the data out is lacking.
> Can anyone shed more light on it? And possibly recommend a table split
> strategy for a 3-node cluster such as I have described?
> Many thanks in advance,