Accumulo >> mail # dev >> Many locality groups


Many locality groups
I have a use case in which I'm investigating setting a locality group on
every column family in a table which has very "dense" rows (many columns
appear within the same tablet).

When scanning over a single column, I see a slow-down as one might expect
(filtering out the columns I don't care about). Setting each column into
its own locality group helps speed things up again for that single column
query case.
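
As a minimal sketch of the setup described above, the snippet below builds a one-group-per-column-family mapping and renders it as an Accumulo shell `setgroups` command. The `lg_` group-name prefix, the table name, and the column family names are hypothetical, chosen only for illustration; the helper functions are not part of Accumulo.

```python
def one_group_per_family(families):
    """Map each column family to its own locality group.

    The 'lg_' prefix is a hypothetical naming convention, not an Accumulo rule.
    """
    groups = {}
    for cf in families:
        name = "lg_" + cf
        if name in groups:
            # A column family can only be assigned to one locality group.
            raise ValueError("duplicate column family: " + cf)
        groups[name] = [cf]
    return groups


def setgroups_command(table, groups):
    """Render the mapping as an Accumulo shell 'setgroups' command string."""
    parts = [name + "=" + ",".join(cfs) for name, cfs in sorted(groups.items())]
    return "setgroups -t " + table + " " + " ".join(parts)


if __name__ == "__main__":
    # Hypothetical table and column families.
    print(setgroups_command("dense", one_group_per_family(["attr", "geo", "meta"])))
```

Existing files are not rewritten by the configuration change alone; a compaction of the table is needed before data already on disk is partitioned into the new groups.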

I'm curious if anyone has any insight into when/if I'm going to start paying
a penalty for having many locality groups. Glancing back over RFile.Reader,
I have to read each LocalityGroupMetadata and its multi-level index (which
shouldn't be bad if I remember Keith's talks) and then I should get log(n)
reads across the locality groups I need to open.
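
To make that reasoning concrete, here is a rough back-of-the-envelope model, not a description of what RFile.Reader actually does. It assumes one metadata entry is read per locality group when the file is opened, and that a seek into each needed group walks a multi-level index from root to leaf before reading one data block. The fanout and block counts are made-up parameters.

```python
def index_depth(num_blocks, fanout):
    """Levels needed for an index with the given fanout to cover num_blocks leaves."""
    depth = 1
    capacity = fanout
    while capacity < num_blocks:
        capacity *= fanout
        depth += 1
    return depth


def estimate_seek_cost(total_groups, groups_needed, blocks_per_group, fanout):
    """Toy cost model under the assumptions stated above.

    metadata_entries: one entry read per locality group at file open.
    block_reads: per needed group, one read per index level plus one data block.
    """
    per_group = index_depth(blocks_per_group, fanout) + 1
    return {
        "metadata_entries": total_groups,
        "block_reads": groups_needed * per_group,
    }


if __name__ == "__main__":
    # Hypothetical numbers: 20 groups in the file, a scan touching only one.
    print(estimate_seek_cost(total_groups=20, groups_needed=1,
                             blocks_per_group=1000, fanout=10))
```

Under this model the cost that grows with the total number of groups is the metadata read at open time, while the per-seek cost depends only on the groups the scan actually needs. Whether that holds in practice is exactly what the tests mentioned below would have to confirm.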

Is the same true for writing data to a table with many locality
groups? Nothing terrible pops out at me looking at the code.

I was planning to write some tests to try and simulate this, but figured I
could poll the community as well to see if anyone has experimented with this
scenario before.

Thanks!

- Josh