Setting aside the actual size limit (I'm not entirely sure how that all
adds up; it would be affected by concurrent query load and many other
things), placing the large chunk in the Key will inflate the index
inside the RFile (the file construct that actually backs the data in
your table). That increases your access times just to find the offset in
the file for the Key you're looking for.

Putting a chunk number in the Key and the actual data in the Value will
probably give you much better results. Chunking into 128M pieces should
work with a 3G heap; however, I'd err on the side of caution and use
many smaller chunks rather than a few very large ones.

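To make the layout concrete, here is a rough sketch in plain Python. It
does not touch the Accumulo client API at all; the function names, the
zero-padded chunk-number qualifier, and the 1M default chunk size are
assumptions of mine for illustration, not anything Accumulo requires.
Each tuple it yields stands in for one entry: a small Key (row plus
chunk number) and the chunk bytes as the Value.

```python
def chunk_entries(row_id, data, chunk_size=1024 * 1024):
    """Yield (row, qualifier, value) tuples, one per chunk of data.

    Hypothetical helper: the row and qualifier stand in for the Key,
    the chunk bytes stand in for the Value.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for index, start in enumerate(range(0, len(data), chunk_size)):
        # Zero-pad the chunk number so qualifiers sort in numeric order
        # when compared as strings.
        qualifier = "%08d" % index
        yield row_id, qualifier, data[start:start + chunk_size]


def reassemble(entries):
    """Join chunk values back together in qualifier order."""
    ordered = sorted(entries, key=lambda entry: entry[1])
    return b"".join(value for _, _, value in ordered)
```

With this layout the Keys stay a few bytes long no matter how big the
blob is. A 400M blob split at 128M would be 4 entries; split at 1M it
would be 400 entries, each of which is far easier on the heap.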
On 4/1/13 10:33 AM, David Medinets wrote:
> I have a chunk of data (let's say 400M) that I want to store in
> Accumulo. I can store the chunk in the ColumnFamily or in the Value.
> Does it make any difference to Accumulo which is used?
> My tserver is setup to use -Xmx3g. What is the largest size that seems
> to work? I have much more that I can allocate.
> Or should I focus on breaking the data into smaller pieces ... say
> 128M each?