Accumulo user mailing list - Best practices in sizing values?


Re: Best practices in sizing values?
Christopher 2013-06-09, 21:08
At the very least, I would keep values under the size of the compressed
data blocks in your RFiles (this may mean you should increase the value of
table.file.compress.blocksize to something larger than the default of 100K).
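
For example, bumping that setting from Java looks roughly like this
(untested sketch; "conn", "mytable", and the 1M size are placeholders,
assuming the standard TableOperations API):

    import org.apache.accumulo.core.client.Connector;

    // Raise the compressed block size so typical values fit in one block.
    // "mytable" and "1M" are placeholders; size this to your own data.
    void raiseBlockSize(Connector conn) throws Exception {
        conn.tableOperations().setProperty("mytable",
            "table.file.compress.blocksize", "1M");
    }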

You could also tune this to your application. Say, for example, you wanted
to incur the extra work of resolving the pointer and retrieving from HDFS
only 5% of the time: you could sample your data and choose a cutoff value
that keeps 95% of your values stored directly in the Accumulo table.
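
Once you have a sample of value sizes, picking that cutoff is only a few
lines; a rough sketch (the 95% figure is just the example above, not a
recommendation):

    import java.util.Collections;
    import java.util.List;

    // Return the 95th-percentile value size (in bytes) from a sample;
    // values larger than this would go to HDFS with a pointer in Accumulo.
    long pickCutoff(List<Long> sampledSizes) {
        Collections.sort(sampledSizes);
        int idx = (int) Math.ceil(0.95 * sampledSizes.size()) - 1;
        return sampledSizes.get(Math.max(idx, 0));
    }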

Personally, I like to keep things under 1MB in the value, and under 1K
in the key, as a crude rule of thumb, but it very much depends on the
application.
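
Applied to the question below, the threshold check ends up looking roughly
like this (untested sketch; the 1MB cutoff, column names, and HDFS path
layout are all made-up placeholders):

    import java.nio.charset.StandardCharsets;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;

    static final int THRESHOLD = 1 << 20; // ~1MB, per the rule of thumb above

    // Store small text inline; write large text to HDFS and keep only the path.
    Mutation buildMutation(String row, byte[] text, FileSystem fs) throws Exception {
        Mutation m = new Mutation(new Text(row));
        if (text.length <= THRESHOLD) {
            m.put(new Text("doc"), new Text("body"), new Value(text));
        } else {
            Path p = new Path("/bigvalues/" + row);  // hypothetical layout
            try (FSDataOutputStream out = fs.create(p)) {
                out.write(text);
            }
            m.put(new Text("doc"), new Text("bodyref"),
                new Value(p.toString().getBytes(StandardCharsets.UTF_8)));
        }
        return m;
    }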

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii
On Sun, Jun 9, 2013 at 4:37 PM, Frank Smith <[EMAIL PROTECTED]> wrote:
> I have an application that stores a block of unstructured text.  Normally
> that text is relatively small (<500k), but there are conditions where it can
> be up to GBs of text.
>
> I was considering using a threshold above which I switch from storing the
> text in the value of my mutation to just adding a reference to its HDFS
> location, but I wanted to get some advice on where that threshold should
> (best practice) or must (system limitation) be?
>
> Also, can I stream data into a value instead of passing a byte array,
> similar to how CLOBs and BLOBs are handled in an RDBMS?
>
> Thanks,
>
> Frank