Accumulo dev mailing list: Many locality groups


Josh Elser 2013-09-18, 15:02
Keith Turner 2013-09-18, 15:35
Josh Elser 2013-09-18, 16:01
Re: Many locality groups
For those curious, I ran some quick benchmarks. Scanning over all columns
(i.e., all locality groups) does appear to take a slight hit as you grow the
number of locality groups, but it doesn't appear to be too painful:

With 1.5.0 (no in-memory map partitioning): {4 LGs => 4.5s, 16 LGs => 4.9s, 32 LGs => 5.5s}
With 1.6.0 (in-memory map partitioning):    {4 LGs => 5.2s, 16 LGs => 6.1s, 32 LGs => 7.6s}
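
To put those timings in relative terms, here is a small Java sketch that computes each run's slowdown against the 4-group run of the same version. It uses only the numbers quoted above; the class and method names are illustrative, and since the timings are single, unaveraged runs, the percentages carry the same caveats.

```java
// Relative slowdown vs. the 4-locality-group run of the same version,
// computed from the single-run timings reported above.
final class LocalityGroupOverhead {
    // Percentage by which `seconds` exceeds `baselineSeconds`.
    static double percentIncrease(double baselineSeconds, double seconds) {
        return (seconds - baselineSeconds) / baselineSeconds * 100.0;
    }

    public static void main(String[] args) {
        int[] groups = {4, 16, 32};
        double[] v150 = {4.5, 4.9, 5.5}; // 1.5.0, no in-memory map partitioning
        double[] v160 = {5.2, 6.1, 7.6}; // 1.6.0, in-memory map partitioning
        for (int i = 0; i < groups.length; i++) {
            System.out.printf("%2d LGs: 1.5.0 %+.1f%%, 1.6.0 %+.1f%%%n",
                    groups[i],
                    percentIncrease(v150[0], v150[i]),
                    percentIncrease(v160[0], v160[i]));
        }
    }
}
```

Going from 4 to 32 groups works out to roughly +22% on 1.5.0 and +46% on 1.6.0 in these particular runs.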

These are massively hand-wavy benchmarks, btw: wimpy computer, and the results
weren't averaged over multiple runs. The hit doesn't raise much concern for
me, as I would assume that when many locality groups are set, scanning over
*all* columns wouldn't be a very common use case.
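
One plausible reason an all-columns scan slows down as groups are added: each locality group is stored as its own sorted run, so reading every group means merging k sorted streams back into a single key order. The Java sketch below is a toy model of that step as a standard k-way merge with a PriorityQueue; it is not Accumulo's actual reader code, and the "row:family" string keys are hypothetical. Each emitted key costs one heap poll and usually one offer, i.e. roughly O(log k) comparisons for k groups.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Toy model: each locality group is an independently sorted run of keys.
final class MergeAcrossGroups {
    // Current head of one group's stream, ordered by its key.
    private static final class Head implements Comparable<Head> {
        final String key;
        final Iterator<String> rest;

        Head(String key, Iterator<String> rest) {
            this.key = key;
            this.rest = rest;
        }

        @Override
        public int compareTo(Head other) {
            return key.compareTo(other.key);
        }
    }

    // k-way merge: the heap never holds more than k heads (one per group).
    static List<String> scanAll(List<List<String>> groups) {
        PriorityQueue<Head> heap = new PriorityQueue<>();
        for (List<String> group : groups) {
            Iterator<String> it = group.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();
            out.add(smallest.key);
            if (smallest.rest.hasNext()) {
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> groups = List.of(
                List.of("r1:a", "r2:a"),
                List.of("r1:b", "r3:b"),
                List.of("r2:c"));
        System.out.println(scanAll(groups)); // prints [r1:a, r1:b, r2:a, r2:c, r3:b]
    }
}
```
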
On Wed, Sep 18, 2013 at 12:01 PM, Josh Elser <[EMAIL PROTECTED]> wrote:

> Neat!
>
> Glad to see I wasn't completely off base with some of the complexity
> numbers I was expecting. I'll pick up my poking and prodding where you left
> off.
>
> Thanks, Keith.
>
>
> On Wed, Sep 18, 2013 at 11:35 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
>
>> I ran some tests before and after partitioning tablet memory in
>> ACCUMULO-112.  I commented on the performance numbers I saw, and I checked
>> in the code I used to test:
>>
>> test/src/main/java/org/apache/accumulo/test/IMMLGBenchmark.java
>>
>> Looking back at the test, one thing I did not time was reading all of the
>> locality groups in a scan.
>>
>>
>> On Wed, Sep 18, 2013 at 11:02 AM, Josh Elser <[EMAIL PROTECTED]> wrote:
>>
>> > I have a use case in which I'm investigating setting a locality group on
>> > every column family in a table which has very "dense" rows (many columns
>> > appear within the same tablet).
>> >
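
Putting every column family into its own locality group amounts to a mapping from group name to a set of families, with one family per group. In the Accumulo client API such a mapping is applied with `TableOperations.setLocalityGroups(tableName, groups)`, whose `groups` argument is a `Map<String, Set<Text>>` using Hadoop's `Text`. To stay dependency-free, the sketch below builds the same shape with `String`; the `lg_` prefix and the family names are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// One locality group per column family. In a real client the family names
// would be wrapped in org.apache.hadoop.io.Text before calling setLocalityGroups.
final class OneGroupPerFamily {
    // "lg_" + family is a hypothetical group-naming convention.
    static Map<String, Set<String>> groupsFor(List<String> families) {
        Map<String, Set<String>> groups = new LinkedHashMap<>();
        for (String family : families) {
            groups.put("lg_" + family, Collections.singleton(family));
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical families of a dense table.
        System.out.println(groupsFor(List.of("name", "address", "phone")));
        // prints {lg_name=[name], lg_address=[address], lg_phone=[phone]}
    }
}
```
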
>> > When scanning over a single column, I see a slow-down, as one might expect
>> > (filtering out the columns I don't care about). Setting each column into
>> > its own locality group helps speed things up again for that single-column
>> > query case.
>> >
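
A toy model of the single-column case described above: in a dense table every row has a cell in every family, so with all families in one group a scan for one family visits every key and filters out the rest, while with one group per family it reads only the wanted family's group. The sketch counts keys visited; real RFiles read whole blocks, so the saving would show up as fewer blocks read. Row and family names are hypothetical, and the per-group open cost asked about in the next paragraph is deliberately ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a dense table: every row has one cell in every column family.
final class SingleColumnScan {
    // Keys are "row:family" strings, generated in row order.
    static List<String> denseRows(int rows, int families) {
        List<String> keys = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int f = 0; f < families; f++) {
                keys.add(String.format("r%06d:f%d", r, f));
            }
        }
        return keys;
    }

    static String familyOf(String key) {
        return key.substring(key.indexOf(':') + 1);
    }

    // One group per family: each family's keys are stored separately.
    static Map<String, List<String>> partitionByFamily(List<String> keys) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String k : keys) {
            groups.computeIfAbsent(familyOf(k), fam -> new ArrayList<>()).add(k);
        }
        return groups;
    }

    // Shared group: visit every key, keep only the wanted family. Returns keys visited.
    static int scanShared(List<String> keys, String wanted, List<String> out) {
        for (String k : keys) {
            if (familyOf(k).equals(wanted)) {
                out.add(k);
            }
        }
        return keys.size();
    }

    // Own group: read only the wanted family's group. Returns keys visited.
    static int scanOwnGroup(Map<String, List<String>> groups, String wanted, List<String> out) {
        List<String> group = groups.getOrDefault(wanted, List.of());
        out.addAll(group);
        return group.size();
    }

    public static void main(String[] args) {
        List<String> keys = denseRows(1_000, 32);
        List<String> viaShared = new ArrayList<>();
        List<String> viaOwn = new ArrayList<>();
        int shared = scanShared(keys, "f5", viaShared);
        int own = scanOwnGroup(partitionByFamily(keys), "f5", viaOwn);
        System.out.println("shared=" + shared + " own=" + own
                + " sameResult=" + viaShared.equals(viaOwn));
        // prints shared=32000 own=1000 sameResult=true
    }
}
```
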
>> > I'm curious whether anyone has any insight into when/if I'm going to start
>> > paying a penalty for having many locality groups. Glancing back over
>> > RFile.Reader, I have to read each LocalityGroupMetadata and its multi-level
>> > index (which shouldn't be bad, if I remember Keith's talks correctly), and
>> > then I should get log(n) reads across the locality groups I need to open.
>> >
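
The log(n) estimate can be illustrated with a flat, sorted index of each data block's first key standing in for RFile's multi-level index: seeking within one group is a binary search over that index, and the total seek cost grows linearly with the number of groups opened. The sketch below is a hypothetical model (names, key format, and index size are made up) that counts the comparisons `Collections.binarySearch` performs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical per-group index: a sorted list of the first key of each data block.
final class GroupIndexSeek {
    // Returns {blockIndex, comparisons}: the last block whose first key is <= key.
    static int[] seekBlock(List<String> blockFirstKeys, String key) {
        int[] comparisons = {0};
        Comparator<String> counting = (a, b) -> {
            comparisons[0]++;
            return a.compareTo(b);
        };
        int i = Collections.binarySearch(blockFirstKeys, key, counting);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the candidate block
        // is the one before the insertion point (block 0 if key precedes them all).
        int block = i >= 0 ? i : Math.max(0, -i - 2);
        return new int[] {block, comparisons[0]};
    }

    public static void main(String[] args) {
        List<String> index = new ArrayList<>();
        for (int b = 0; b < 65_536; b++) {
            index.add(String.format("k%08d", b * 100)); // block b starts at key b * 100
        }
        int groupsOpened = 32; // each opened group pays its own index seek
        int total = 0;
        for (int g = 0; g < groupsOpened; g++) {
            total += seekBlock(index, "k00123456")[1]; // same index reused for brevity
        }
        System.out.println("block=" + seekBlock(index, "k00123456")[0]
                + " comparisons for " + groupsOpened + " groups: " + total);
    }
}
```

Opening only the groups a query needs keeps that linear factor small, which is where the one-family-per-group layout pays off for single-column queries.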
>> > Is the same true for writing data to a table with many locality groups?
>> > Nothing terrible pops out at me looking at the code.
>> >
>> > I was planning to write some tests to try to simulate this, but figured I
>> > would poll the community as well to see if anyone has experimented with
>> > this scenario before.
>> >
>> > Thanks!
>> >
>> > - Josh
>> >
>>
>
>