HBase >> mail # user >> Essential column family performance


Re: Essential column family performance
Turn it on by default in trunk/0.95 I'd say.
St.Ack
On Wed, Apr 10, 2013 at 4:02 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Fix is committed and will be in 0.94.7.
>
> I guess we should have a discussion at some point on whether we should
> always switch this feature on (it is disabled by default), as we now can no
> longer find any case where enabling it is slower.
>
> -- Lars
>
>
>
> ________________________________
>  From: Anoop Sam John <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
> Sent: Tuesday, April 9, 2013 10:30 PM
> Subject: RE: Essential column family performance
>
> Good finding Lars & team  :)
>
> -Anoop-
> ________________________________________
> From: lars hofhansl [[EMAIL PROTECTED]]
> Sent: Wednesday, April 10, 2013 9:46 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Essential column family performance
>
> That part did not show up in the profiling session.
> It was just the unnecessary seek that slowed it all down.
>
> -- Lars
>
>
>
> ________________________________
> From: Ted Yu <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Tuesday, April 9, 2013 9:03 PM
> Subject: Re: Essential column family performance
>
> Looking at populateFromJoinedHeap():
>
>       KeyValue kv = populateResult(results, this.joinedHeap, limit,
>           joinedContinuationRow.getBuffer(), joinedContinuationRow.getRowOffset(),
>           joinedContinuationRow.getRowLength(), metric);
>
> ...
>
>       Collections.sort(results, comparator);
>
> Arrays.mergeSort() is used in the Collections.sort() call.
>
> There seems to be some optimization we can do above: we can record the size
> of results before calling populateResult(). Upon return, we can merge the
> two segments without resorting to Arrays.mergeSort() which is recursive.
>
>
> On Tue, Apr 9, 2013 at 6:21 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > bq. with only 10000 rows that would all fit in the memstore.
> >
> > This aspect should be enhanced in the test.
> >
> > Cheers
> >
> > On Tue, Apr 9, 2013 at 6:17 PM, Lars Hofhansl <[EMAIL PROTECTED]> wrote:
> >
> >> Also, the unit test runs with only 10000 rows that would all fit in the
> >> memstore. Seek vs reseek should make little difference for the memstore.
> >>
> >> We tested with 1m and 10m rows, and flushed the memstore and compacted
> >> the store.
> >>
> >> Will do some more verification later tonight.
> >>
> >> -- Lars
> >>
> >>
> >> Lars H <[EMAIL PROTECTED]> wrote:
> >>
> >> >Your slow scanner performance seems to vary as well. How come? Slow is
> >> >with the feature off.
> >> >
> >> >I don't see how reseek can be slower than seek in any scenario.
> >> >
> >> >-- Lars
> >> >
> >> >Ted Yu <[EMAIL PROTECTED]> wrote:
> >> >
> >> >>I tried using reseek() as suggested, along with my patch from HBASE-8306
> >> >>(30% selection rate, random distribution and FAST_DIFF encoding on both
> >> >>column families).
> >> >>I got uneven results:
> >> >>
> >> >>2013-04-09 16:59:01,324 INFO  [main] regionserver.TestJoinedScanners(167): Slow scanner finished in 7.529083 seconds, got 1546 rows
> >> >>
> >> >>2013-04-09 16:59:06,760 INFO  [main] regionserver.TestJoinedScanners(167): Joined scanner finished in 5.43579 seconds, got 1546 rows
> >> >>...
> >> >>2013-04-09 16:59:12,711 INFO  [main] regionserver.TestJoinedScanners(167): Slow scanner finished in 5.95016 seconds, got 1546 rows
> >> >>
> >> >>2013-04-09 16:59:20,240 INFO  [main] regionserver.TestJoinedScanners(167): Joined scanner finished in 7.529044 seconds, got 1546 rows
> >> >>
> >> >>FYI
> >> >>
> >> >>On Tue, Apr 9, 2013 at 4:47 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >> >>
> >> >>> We did some tests here.
> >> >>> I ran this through the profiler against a local RegionServer and found
> >> >>> the part that causes the slowdown is a seek called here:
> >> >>>              boolean mayHaveData =
> >> >>>               (nextJoinedKv != null &&
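
For reference, the merge Ted Yu proposes for populateFromJoinedHeap() (record the size of results before populateResult(), then merge the two segments instead of re-sorting) could look roughly like the sketch below. This is not HBase code. The class name SegmentMerge and the method mergeSortedSegments are hypothetical, and it assumes both segments are already sorted by the same comparator, which the thread does not state explicitly. Note also that Lars reports the sort did not show up in profiling, and that on Java 7 and later Collections.sort() uses TimSort, which already handles presorted runs well, so any gain may be small.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SegmentMerge {

    /**
     * Hypothetical helper (not part of HBase): merges results[0, mid) and
     * results[mid, size) into one sorted list. Assumes each segment is
     * already sorted by cmp. Iterative, one pass over the data.
     */
    static <T> void mergeSortedSegments(List<T> results, int mid,
                                        Comparator<? super T> cmp) {
        int size = results.size();
        if (mid <= 0 || mid >= size) {
            return; // one segment is empty, nothing to merge
        }
        // Fast path: the segments are already in order end to end.
        if (cmp.compare(results.get(mid - 1), results.get(mid)) <= 0) {
            return;
        }
        List<T> merged = new ArrayList<>(size);
        int i = 0;
        int j = mid;
        while (i < mid && j < size) {
            // "<=" takes from the first segment on ties, keeping the merge stable.
            if (cmp.compare(results.get(i), results.get(j)) <= 0) {
                merged.add(results.get(i++));
            } else {
                merged.add(results.get(j++));
            }
        }
        while (i < mid) {
            merged.add(results.get(i++));
        }
        while (j < size) {
            merged.add(results.get(j++));
        }
        // Copy back so callers holding a reference to results see the merged order.
        for (int k = 0; k < size; k++) {
            results.set(k, merged.get(k));
        }
    }

    public static void main(String[] args) {
        List<Integer> results = new ArrayList<>(Arrays.asList(1, 4, 9));
        int sizeBefore = results.size();          // recorded before appending
        results.addAll(Arrays.asList(2, 3, 10));  // stands in for populateResult()
        mergeSortedSegments(results, sizeBefore, Comparator.<Integer>naturalOrder());
        System.out.println(results);              // prints [1, 2, 3, 4, 9, 10]
    }
}
```

The temporary list costs one extra allocation per call; whether that beats the existing sort would need to be measured in the scanner itself.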