Accumulo >> mail # dev >> Many locality groups

Many locality groups
I have a use case in which I'm investigating setting a locality group on
every column family in a table which has very "dense" rows (many columns
appear within the same tablet).

When scanning over a single column, I see a slow-down, as one might expect
(the scan has to filter out the columns I don't care about). Putting each
column family into its own locality group speeds things up again for that
single-column query case.
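
For anyone following along, here is a minimal sketch of what "one locality group per column family" looks like as table configuration. It only builds the `table.group.<name>` and `table.groups.enabled` property strings; the `lg_` prefix, the class name, and the example family names are made up for illustration, and it assumes family names need no escaping. The same mapping could instead be passed to `TableOperations.setLocalityGroups` from a client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LocalityGroupProps {

    // Builds table properties that place every column family in its own
    // locality group. The "lg_" group-name prefix is a hypothetical choice.
    static Map<String, String> perFamilyGroups(List<String> families) {
        Map<String, String> props = new LinkedHashMap<>();
        List<String> groupNames = new ArrayList<>();
        for (String cf : families) {
            String group = "lg_" + cf;
            groupNames.add(group);
            // table.group.<name> lists the families that belong to the group
            props.put("table.group." + group, cf);
        }
        // table.groups.enabled lists the groups that are actually in effect
        props.put("table.groups.enabled", String.join(",", groupNames));
        return props;
    }

    public static void main(String[] args) {
        // Example family names are placeholders, not from a real table.
        List<String> families = Arrays.asList("name", "age", "address");
        perFamilyGroups(families)
            .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Note that existing files only pick up a new locality group layout once they are rewritten, so a compaction is needed before scans see any benefit.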

I'm curious whether anyone has insight into when, or if, I'm going to start
paying a penalty for having many locality groups. Glancing back over
RFile.Reader, I have to read each LocalityGroupMetadata and its multi-level
index (which shouldn't be bad, if I remember Keith's talks), and then I
should get log(n) reads across the locality groups I need to open.

Is the same true for writing data to a table with many locality groups?
Nothing terrible pops out at me looking at the code.

I was planning to write some tests to try to simulate this, but figured I
would poll the community as well to see if anyone has experimented with
this scenario before.


- Josh