Accumulo, mail # user - Max tablet size & Pre-splitting


RE: Max tablet size & Pre-splitting
Slater, David M. 2013-10-28, 18:50
Thanks John, that helps. I checked Eric's reply as well, and I think I'm good.

From: John Vines [mailto:[EMAIL PROTECTED]]
Sent: Monday, October 28, 2013 2:11 PM
To: [EMAIL PROTECTED]
Subject: Re: Max tablet size & Pre-splitting

There is no hardcoded maximum file size in Accumulo, so the split threshold is the only thing that provides some sort of definition for tablet size. Be aware that if you have giant rows, a tablet can exceed the split threshold as well, which is why I refer to it only loosely as the defining characteristic.
As for tablet size, one option is to get that information from the !METADATA table. Eric Newton wrote a reply on this mailing list within the past two weeks, I think, that explained the entries there.

On Mon, Oct 28, 2013 at 1:59 PM, Slater, David M. <[EMAIL PROTECTED]> wrote:
First, a quick question: for Accumulo 1.4.2, is there a maximum size that a tablet can have? In other words, if I were to set something like table.split.threshold=1000G, would that actually allow the tablet to grow to that size, or is there some static maximum, like 2G, that a tablet can have?

The reason I ask is that I'm doing time-based presplitting of tables: I add a set of split points when I get to a new time range (or when one of the tablets reaches a certain size), and then transfer all of my ingest to the newly created set of tablets. This keeps me from needing any table splits that involve data. Therefore, I would like to set the table split threshold arbitrarily high, so that my presplitting algorithm can do all the work.
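
As an illustration of the time-based presplitting described above, here is a minimal sketch that computes split rows for a new time range. Everything in it is an assumption for illustration only: the class name TimeSplits, the method splitsFor, and the row format (a zero-padded 13-digit millisecond timestamp prefix) are hypothetical and not taken from this thread. The resulting strings would still need to be converted to Text and passed to connector.tableOperations().addSplits(...).

```java
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper: computes split rows for a new time range. */
final class TimeSplits {

    /**
     * Divides [startMillis, endMillis] into `tablets` roughly equal ranges and
     * returns the end row of each range. Assumes rows start with a zero-padded
     * 13-digit millisecond timestamp, so lexicographic order matches time order.
     */
    static SortedSet<String> splitsFor(long startMillis, long endMillis, int tablets) {
        if (startMillis < 0 || endMillis <= startMillis || tablets < 1) {
            throw new IllegalArgumentException("invalid time range or tablet count");
        }
        long step = (endMillis - startMillis) / tablets;
        if (step == 0) {
            throw new IllegalArgumentException("time range too short for tablet count");
        }
        SortedSet<String> splits = new TreeSet<String>();
        for (int i = 1; i <= tablets; i++) {
            // The last boundary is pinned to endMillis so rounding never leaves a gap.
            long boundary = (i == tablets) ? endMillis : startMillis + i * step;
            splits.add(String.format(Locale.ROOT, "%013d", boundary));
        }
        return splits;
    }
}
```

The zero padding matters because Accumulo sorts rows lexicographically; without it, a row starting with "1000" would sort before one starting with "250".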

Second, is there a preferred way to estimate the tablet sizes from the Java API? My ingest application uses my split points and mutation.numBytes() to keep track of the number of bytes per tablet. Should I be using mutation.memory() instead? Or is there a more direct way, via connector.tableOperations() or some other mechanism, to determine the size of a tablet?
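
The client-side bookkeeping described above (split points plus a byte count per mutation) could be sketched roughly as follows. This is not an Accumulo API: TabletSizeTracker and its methods are hypothetical names, and the sketch assumes ASCII row keys so that Java String ordering matches Accumulo's byte ordering. The count passed to record would come from something like mutation.numBytes(). The totals are only an estimate of ingested bytes, and the on-disk size will generally differ once files are compressed.

```java
import java.util.Collection;
import java.util.TreeMap;

/** Hypothetical client-side estimate of bytes written per tablet. */
final class TabletSizeTracker {

    // Split point (tablet end row) -> estimated bytes written to that tablet.
    private final TreeMap<String, Long> bytesByEndRow = new TreeMap<String, Long>();

    // Bytes written past the last split point, i.e. to the final tablet with no end row.
    private long defaultTabletBytes = 0L;

    TabletSizeTracker(Collection<String> splitPoints) {
        for (String split : splitPoints) {
            bytesByEndRow.put(split, 0L);
        }
    }

    /** Returns the end row of the tablet holding `row`, or null for the last tablet. */
    String tabletFor(String row) {
        // A tablet covers (previous split, its own split], so take the smallest split >= row.
        return bytesByEndRow.ceilingKey(row);
    }

    /** Adds `numBytes` to the running total of the tablet that holds `row`. */
    void record(String row, long numBytes) {
        String endRow = tabletFor(row);
        if (endRow == null) {
            defaultTabletBytes += numBytes;
        } else {
            bytesByEndRow.put(endRow, bytesByEndRow.get(endRow) + numBytes);
        }
    }

    /** Returns the estimate for the tablet ending at `endRow` (null means the last tablet). */
    long estimatedBytes(String endRow) {
        if (endRow == null) {
            return defaultTabletBytes;
        }
        Long bytes = bytesByEndRow.get(endRow);
        return bytes == null ? 0L : bytes;
    }
}
```

For example, with split points "g" and "p", a row "h" is counted against the tablet ending at "p", and a row "z" is counted against the last tablet.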

Thanks,
David