HBase >> mail # user >> Essential column family performance


Re: Essential column family performance
Fix is committed and will be in 0.94.7.

I guess we should have a discussion at some point on whether we should always switch this feature on (it is disabled by default), as we now can no longer find any case where enabling it is slower.

-- Lars

________________________________
 From: Anoop Sam John <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
Sent: Tuesday, April 9, 2013 10:30 PM
Subject: RE: Essential column family performance
 
Good finding Lars & team  :)

-Anoop-
________________________________________
From: lars hofhansl [[EMAIL PROTECTED]]
Sent: Wednesday, April 10, 2013 9:46 AM
To: [EMAIL PROTECTED]
Subject: Re: Essential column family performance

That part did not show up in the profiling session.
It was just the unnecessary seek that slowed it all down.

-- Lars
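
For readers unfamiliar with the distinction, here is a toy sketch of why a forward-only reseek is usually preferred when the target is known to be at or after the current position. It is not HBase code: SortedKeyScanner and its methods are hypothetical names, and a sorted String array with unique keys stands in for a store. seek() locates the target from scratch, while reseek() only moves forward from where the scanner already is.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the HBase KeyValueScanner API.
final class SortedKeyScanner {
    private final String[] keys; // sorted ascending, assumed unique
    private int pos = 0;         // index of the current key

    SortedKeyScanner(String[] sortedKeys) {
        this.keys = sortedKeys;
    }

    /** Returns the current key, or null when the scanner is exhausted. */
    String peek() {
        return pos < keys.length ? keys[pos] : null;
    }

    /** Positions at the first key >= target by searching the whole array. */
    boolean seek(String target) {
        int idx = Arrays.binarySearch(keys, target);
        pos = idx >= 0 ? idx : -(idx + 1); // insertion point when not found
        return pos < keys.length;
    }

    /**
     * Positions at the first key >= target by walking forward from the
     * current position. Only valid when target is not before that position.
     */
    boolean reseek(String target) {
        while (pos < keys.length && keys[pos].compareTo(target) < 0) {
            pos++;
        }
        return pos < keys.length;
    }

    public static void main(String[] args) {
        SortedKeyScanner s = new SortedKeyScanner(new String[] {"a", "c", "e", "g"});
        s.seek("b");
        System.out.println(s.peek()); // prints "c"
        s.reseek("f");
        System.out.println(s.peek()); // prints "g"
    }
}
```

In this toy both calls are cheap. The point is only that reseek reuses the current position instead of starting over, which matters when repositioning from scratch is expensive.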

________________________________
From: Ted Yu <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Tuesday, April 9, 2013 9:03 PM
Subject: Re: Essential column family performance

Looking at populateFromJoinedHeap():

      KeyValue kv = populateResult(results, this.joinedHeap, limit,
          joinedContinuationRow.getBuffer(), joinedContinuationRow.getRowOffset(),
          joinedContinuationRow.getRowLength(), metric);

...

      Collections.sort(results, comparator);

Arrays.mergeSort() is used in the Collections.sort() call.

There seems to be some optimization we can do above: we can record the size
of results before calling populateResult(). Upon return, we can merge the
two segments without resorting to Arrays.mergeSort() which is recursive.
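
A minimal sketch of the merge described above, assuming both segments of the list are already sorted by the same comparator. SegmentMerge and mergeSegments are hypothetical names, not existing HBase code. The caller would record results.size() before populateResult() and pass that value as mid afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper; illustrates the idea only.
final class SegmentMerge {
    /**
     * Merges list[0, mid) and list[mid, size), each already sorted by cmp,
     * into one sorted list in place. Ties keep the left segment first.
     */
    static <T> void mergeSegments(List<T> list, int mid, Comparator<? super T> cmp) {
        int n = list.size();
        if (mid <= 0 || mid >= n) {
            return; // one of the segments is empty
        }
        if (cmp.compare(list.get(mid - 1), list.get(mid)) <= 0) {
            return; // segments are already in order across the boundary
        }
        List<T> left = new ArrayList<T>(list.subList(0, mid)); // copy of the left segment
        int i = 0;   // next unread element of the left copy
        int j = mid; // next unread element of the right segment
        int out = 0; // next write position; never passes j
        while (i < left.size() && j < n) {
            if (cmp.compare(left.get(i), list.get(j)) <= 0) {
                list.set(out++, left.get(i++));
            } else {
                list.set(out++, list.get(j++));
            }
        }
        while (i < left.size()) {
            list.set(out++, left.get(i++));
        }
        // Any remaining right-segment elements are already in their final place.
    }

    public static void main(String[] args) {
        List<Integer> results = new ArrayList<Integer>(Arrays.asList(1, 4, 7, 2, 3, 9));
        mergeSegments(results, 3, Comparator.<Integer>naturalOrder());
        System.out.println(results); // prints [1, 2, 3, 4, 7, 9]
    }
}
```

This is a single linear pass over the two segments with one copy of the left segment, instead of a recursive sort of the whole list.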

On Tue, Apr 9, 2013 at 6:21 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> bq. with only 10000 rows that would all fit in the memstore.
>
> This aspect should be enhanced in the test.
>
> Cheers
>
> On Tue, Apr 9, 2013 at 6:17 PM, Lars Hofhansl <[EMAIL PROTECTED]> wrote:
>
>> Also the unittest tests with only 10000 rows that would all fit in the
>> memstore. Seek vs reseek should make little difference for the memstore.
>>
>> We tested with 1m and 10m rows, and flushed the memstore  and compacted
>> the store.
>>
>> Will do some more verification later tonight.
>>
>> -- Lars
>>
>>
>> Lars H <[EMAIL PROTECTED]> wrote:
>>
>> >Your slow scanner performance seems to vary as well. How come? Slow is
>> >with the feature off.
>> >
>> >I don't see how reseek can be slower than seek in any scenario.
>> >
>> >-- Lars
>> >
>> >Ted Yu <[EMAIL PROTECTED]> wrote:
>> >
>> >>I tried using reseek() as suggested, along with my patch from HBASE-8306
>> >>(30% selection rate, random distribution and FAST_DIFF encoding on both
>> >>column families).
>> >>I got uneven results:
>> >>
>> >>2013-04-09 16:59:01,324 INFO  [main] regionserver.TestJoinedScanners(167): Slow scanner finished in 7.529083 seconds, got 1546 rows
>> >>
>> >>2013-04-09 16:59:06,760 INFO  [main] regionserver.TestJoinedScanners(167): Joined scanner finished in 5.43579 seconds, got 1546 rows
>> >>...
>> >>2013-04-09 16:59:12,711 INFO  [main] regionserver.TestJoinedScanners(167): Slow scanner finished in 5.95016 seconds, got 1546 rows
>> >>
>> >>2013-04-09 16:59:20,240 INFO  [main] regionserver.TestJoinedScanners(167): Joined scanner finished in 7.529044 seconds, got 1546 rows
>> >>
>> >>FYI
>> >>
>> >>On Tue, Apr 9, 2013 at 4:47 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>> >>
>> >>> We did some tests here.
>> >>> I ran this through the profiler against a local RegionServer and found
>> >>> the part that causes the slowdown is a seek called here:
>> >>>              boolean mayHaveData =
>> >>>               (nextJoinedKv != null && nextJoinedKv.matchingRow(currentRow, offset, length))
>> >>>               ||
>> >>>               (this.joinedHeap.seek(KeyValue.createFirstOnRow(currentRow, offset, length))
>> >>>                   && joinedHeap.peek() != null
>> >>>                   && joinedHeap.peek().matchingRow(currentRow, offset, length));
>> >>>