Accumulo >> mail # dev >> Many locality groups


Re: Many locality groups
For those curious, I ran some quick benchmarks. Scanning over all columns
(locality groups) does appear to take a slight hit as you grow the number of
locality groups, but it doesn't appear to be too painful:

With 1.5.0 (no in-memory map partitioning): {4 lgs => 4.5s, 16 lgs => 4.9s,
32 lgs => 5.5s}
With 1.6.0 (in-memory map partitioning): {4 lgs => 5.2s, 16 lgs => 6.1s,
32 lgs => 7.6s}

These are massively hand-wavy benchmarks, btw: wimpy computer, and results
weren't averaged over multiple runs. It doesn't raise too much concern for
me, as I would assume that when many locality groups are set, scanning over
*all* columns wouldn't be a very common use case.
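
To put the numbers above in relative terms, here is a minimal sketch that computes the slowdown of each full scan against the 4-group case. The class and method names are made up for illustration; the timings are just the single-run figures quoted above, so treat the percentages as rough.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ScanSlowdown {
    // Percent increase of `seconds` over `baseline`.
    static double percentSlower(double baseline, double seconds) {
        return (seconds - baseline) / baseline * 100.0;
    }

    // Prints each timing with its slowdown relative to the 4-group run.
    static void report(String version, Map<Integer, Double> times) {
        double baseline = times.get(4);
        for (Map.Entry<Integer, Double> e : times.entrySet()) {
            System.out.printf("%s: %d lgs = %.1fs (%+.0f%% vs 4 lgs)%n",
                    version, e.getKey(), e.getValue(),
                    percentSlower(baseline, e.getValue()));
        }
    }

    public static void main(String[] args) {
        // Full-scan times in seconds, keyed by number of locality groups.
        Map<Integer, Double> v150 = new LinkedHashMap<>();
        v150.put(4, 4.5);
        v150.put(16, 4.9);
        v150.put(32, 5.5);

        Map<Integer, Double> v160 = new LinkedHashMap<>();
        v160.put(4, 5.2);
        v160.put(16, 6.1);
        v160.put(32, 7.6);

        report("1.5.0", v150);
        report("1.6.0", v160);
    }
}
```

Going from 4 to 32 locality groups works out to roughly 22% slower on 1.5.0 and roughly 46% slower on 1.6.0 for a scan over all columns.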
On Wed, Sep 18, 2013 at 12:01 PM, Josh Elser <[EMAIL PROTECTED]> wrote:

> Neat!
>
> Glad to see I wasn't completely off base with some of the complexity
> numbers I was expecting. I'll pick up my poking and prodding where you left
> off.
>
> Thanks, Keith.
>
>
> On Wed, Sep 18, 2013 at 11:35 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
>
>> I ran some tests before and after partitioning tablet memory in
>> ACCUMULO-112.  I commented on the performance numbers I saw.  I checked in
>> the code I used to test.
>>
>> test/src/main/java/org/apache/accumulo/test/IMMLGBenchmark.java
>>
>> Looking back at the test, one thing I did not time was reading all of the
>> locality groups in a scan.
>>
>>
>> On Wed, Sep 18, 2013 at 11:02 AM, Josh Elser <[EMAIL PROTECTED]>
>> wrote:
>>
>> > I have a use case in which I'm investigating setting a locality group on
>> > every column family in a table which has very "dense" rows (many columns
>> > appear within the same tablet).
>> >
>> > When scanning over a single column, I see a slow-down as one might expect
>> > (filtering out the columns I don't care about). Setting each column into
>> > its own locality group helps speed things up again for that single-column
>> > query case.
>> >
>> > I'm curious if anyone has any insight into when/if I'm going to start
>> > paying a penalty for having many locality groups. Glancing back over
>> > RFile.Reader, I have to read each LocalityGroupMetadata and its
>> > multi-level index (which shouldn't be bad if I remember Keith's talks)
>> > and then I should get log(n) reads across the locality groups I need to
>> > open.
>> >
>> > Is the same true for writing data to a table with many locality
>> > groups? Nothing terrible pops out at me looking at the code.
>> >
>> > I was planning to write some tests to try and simulate this, but figured I
>> > can poll the community as well to see if anyone has experimented with this
>> > scenario before.
>> >
>> > Thanks!
>> >
>> > - Josh
>> >
>>
>
>
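
For readers who want to try the setup from the original question (one locality group per column family), here is a rough sketch of the table properties involved. It assumes the standard Accumulo property names `table.group.<name>` and `table.groups.enabled`; the class name, the `lg_` naming scheme, and the column family names are hypothetical and not from the thread.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OneGroupPerFamily {
    // Builds table properties that put each column family in its own
    // locality group. The "lg_" prefix is a hypothetical naming scheme.
    static Map<String, String> localityGroupProperties(List<String> families) {
        Map<String, String> props = new LinkedHashMap<>();
        StringBuilder enabled = new StringBuilder();
        for (String cf : families) {
            String group = "lg_" + cf;
            // table.group.<name> lists the column families in that group.
            props.put("table.group." + group, cf);
            if (enabled.length() > 0) {
                enabled.append(',');
            }
            enabled.append(group);
        }
        // table.groups.enabled lists the groups that are active.
        props.put("table.groups.enabled", enabled.toString());
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical column families for illustration only.
        List<String> families = Arrays.asList("name", "age", "address");
        for (Map.Entry<String, String> e : localityGroupProperties(families).entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

In practice the same mapping would normally be applied through the client API (`TableOperations.setLocalityGroups`) or the shell's `setgroups` command rather than by setting properties by hand; the sketch only shows what ends up configured on the table.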