Re: Essential column family performance
Turn it on by default in trunk/0.95 I'd say.
St.Ack
On Wed, Apr 10, 2013 at 4:02 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Fix is committed and will be in 0.94.7.
>
> I guess we should have a discussion at some point on whether we should
> always switch this feature on (it is disabled by default), as we now can no
> longer find any case where enabling it is slower.
>
> -- Lars
>
>
>
> ________________________________
>  From: Anoop Sam John <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <
> [EMAIL PROTECTED]>
> Sent: Tuesday, April 9, 2013 10:30 PM
> Subject: RE: Essential column family performance
>
> Good finding Lars & team  :)
>
> -Anoop-
> ________________________________________
> From: lars hofhansl [[EMAIL PROTECTED]]
> Sent: Wednesday, April 10, 2013 9:46 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Essential column family performance
>
> That part did not show up in the profiling session.
> It was just the unnecessary seek that slowed it all down.
>
> -- Lars
>
>
>
> ________________________________
> From: Ted Yu <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Tuesday, April 9, 2013 9:03 PM
> Subject: Re: Essential column family performance
>
> Looking at populateFromJoinedHeap():
>
>       KeyValue kv = populateResult(results, this.joinedHeap, limit,
>           joinedContinuationRow.getBuffer(), joinedContinuationRow.getRowOffset(),
>           joinedContinuationRow.getRowLength(), metric);
>
> ...
>
>       Collections.sort(results, comparator);
>
> Arrays.mergeSort() is used in the Collections.sort() call.
>
> There seems to be some optimization we can do above: we can record the size
> of results before calling populateResult(). Upon return, we can merge the
> two segments without resorting to Arrays.mergeSort() which is recursive.
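A minimal sketch of the merge idea suggested above, not the actual HBase code. It assumes the entries already in the list and the entries appended afterwards are each sorted by the same comparator. `SegmentMerge` and `mergeSegments` are hypothetical names, and plain integers stand in for KeyValue objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SegmentMerge {
    /**
     * Merges list[0, mid) and list[mid, size) in place.
     * Assumption: each segment is already sorted by cmp.
     * Ties keep the element from the left segment first.
     */
    static <T> void mergeSegments(List<T> list, int mid, Comparator<? super T> cmp) {
        int n = list.size();
        if (mid <= 0 || mid >= n) {
            return; // one segment is empty, nothing to merge
        }
        // Fast path: the segments are already in order end to end.
        if (cmp.compare(list.get(mid - 1), list.get(mid)) <= 0) {
            return;
        }
        // Copy only the left segment; the right one is read in place.
        List<T> left = new ArrayList<>(list.subList(0, mid));
        int i = 0;   // next unread element of the left copy
        int j = mid; // next unread element of the right segment
        int k = 0;   // next write position (never passes j)
        while (i < left.size() && j < n) {
            if (cmp.compare(left.get(i), list.get(j)) <= 0) {
                list.set(k++, left.get(i++));
            } else {
                list.set(k++, list.get(j++));
            }
        }
        while (i < left.size()) {
            list.set(k++, left.get(i++));
        }
        // Any remaining right-segment elements are already in their final slots.
    }

    public static void main(String[] args) {
        List<Integer> results = new ArrayList<>(Arrays.asList(1, 4, 9));
        int sizeBefore = results.size();          // recorded before appending
        results.addAll(Arrays.asList(2, 3, 10));  // stands in for populateResult()
        mergeSegments(results, sizeBefore, Comparator.<Integer>naturalOrder());
        System.out.println(results); // prints [1, 2, 3, 4, 9, 10]
    }
}
```

The merge is a single linear pass with one temporary copy of the left segment, and the fast path skips all work when the appended entries sort after the existing ones.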
>
>
> On Tue, Apr 9, 2013 at 6:21 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > bq. with only 10000 rows that would all fit in the memstore.
> >
> > This aspect should be enhanced in the test.
> >
> > Cheers
> >
> > On Tue, Apr 9, 2013 at 6:17 PM, Lars Hofhansl <[EMAIL PROTECTED]> wrote:
> >
> >> Also the unittest tests with only 10000 rows that would all fit in the
> >> memstore. Seek vs reseek should make little difference for the memstore.
> >>
> >> We tested with 1m and 10m rows, and flushed the memstore  and compacted
> >> the store.
> >>
> >> Will do some more verification later tonight.
> >>
> >> -- Lars
> >>
> >>
> >> Lars H <[EMAIL PROTECTED]> wrote:
> >>
> >> >Your slow scanner performance seems to vary as well. How come? Slow is
> >> >with the feature off.
> >> >
> >> >I don't see how reseek can be slower than seek in any scenario.
> >> >
> >> >-- Lars
> >> >
> >> >Ted Yu <[EMAIL PROTECTED]> wrote:
> >> >
> >> >>I tried using reseek() as suggested, along with my patch from HBASE-8306
> >> >>(30% selection rate, random distribution and FAST_DIFF encoding on both
> >> >>column families).
> >> >>I got uneven results:
> >> >>
> >> >>2013-04-09 16:59:01,324 INFO  [main] regionserver.TestJoinedScanners(167): Slow scanner finished in 7.529083 seconds, got 1546 rows
> >> >>
> >> >>2013-04-09 16:59:06,760 INFO  [main] regionserver.TestJoinedScanners(167): Joined scanner finished in 5.43579 seconds, got 1546 rows
> >> >>...
> >> >>2013-04-09 16:59:12,711 INFO  [main] regionserver.TestJoinedScanners(167): Slow scanner finished in 5.95016 seconds, got 1546 rows
> >> >>
> >> >>2013-04-09 16:59:20,240 INFO  [main] regionserver.TestJoinedScanners(167): Joined scanner finished in 7.529044 seconds, got 1546 rows
> >> >>
> >> >>FYI
> >> >>
> >> >>On Tue, Apr 9, 2013 at 4:47 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >> >>
> >> >>> We did some tests here.
> >> >>> I ran this through the profiler against a local RegionServer and
> >> found the
> >> >>> part that causes the slowdown is a seek called here:
> >> >>>              boolean mayHaveData =
> >> >>>               (nextJoinedKv != null &&