HBase, mail # user - Essential column family performance


Re: Essential column family performance
Ted Yu 2013-04-08, 14:49
I made the following change in TestJoinedScanners.java:

-      int flag_percent = 1;
+      int flag_percent = 40;

The test took longer but still favors the joined scanner.
I got some new results:

2013-04-08 07:46:06,959 INFO  [main] regionserver.TestJoinedScanners(157):
Slow scanner finished in 7.424388 seconds, got 2050 rows
...
2013-04-08 07:46:12,010 INFO  [main] regionserver.TestJoinedScanners(157):
Joined scanner finished in 5.05063 seconds, got 2050 rows

2013-04-08 07:46:18,358 INFO  [main] regionserver.TestJoinedScanners(157):
Slow scanner finished in 6.348517 seconds, got 2050 rows
...
2013-04-08 07:46:22,946 INFO  [main] regionserver.TestJoinedScanners(157):
Joined scanner finished in 4.587545 seconds, got 2050 rows

It looks like the effectiveness of the joined scanner depends on the
distribution of the data.
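
To put the timings quoted in this thread side by side, here is a small sketch that turns them into speedup ratios. The class and method names are hypothetical and not part of TestJoinedScanners. The flag_percent of 1 for the 2013-04-07 run is an assumption based on the value the test had before the change above.

```java
public class ScannerSpeedup {

    /** Ratio of slow-scan time to joined-scan time; values above 1 favor the joined scanner. */
    static double speedup(double slowSeconds, double joinedSeconds) {
        if (joinedSeconds <= 0) {
            throw new IllegalArgumentException("joinedSeconds must be positive");
        }
        return slowSeconds / joinedSeconds;
    }

    public static void main(String[] args) {
        // Each row: {flag_percent, slow scanner seconds, joined scanner seconds},
        // copied from the log lines quoted in this thread.
        double[][] runs = {
            {1, 7.973822, 0.47235},   // 2013-04-07 run (flag_percent assumed to be 1)
            {40, 7.424388, 5.05063},  // 2013-04-08 run, first pair
            {40, 6.348517, 4.587545}, // 2013-04-08 run, second pair
        };
        for (double[] run : runs) {
            System.out.printf("flag_percent=%d: joined scanner is %.2fx faster%n",
                    (int) run[0], speedup(run[1], run[2]));
        }
    }
}
```

By this measure the joined scanner is roughly 17x faster in the 1% run and roughly 1.4x to 1.5x faster in the 40% runs. These are single runs on one laptop, so the ratios are only indicative.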

Cheers

On Sun, Apr 7, 2013 at 8:52 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Looking at the joined scanner test code, it sets it up such that 1% of the
> rows match, which would somewhat be in line with James' results.
>
> In my own testing a while ago I found a 100% improvement with 0% match.
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Ted Yu <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Sunday, April 7, 2013 4:13 PM
> Subject: Re: Essential column family performance
>
> I have attached 5416-TestJoinedScanners-0.94.txt to HBASE-5416 for your
> reference.
>
> On my MacBook, I got the following results from the test:
>
> 2013-04-07 16:08:17,474 INFO  [main] regionserver.TestJoinedScanners(157):
> Slow scanner finished in 7.973822 seconds, got 100 rows
> ...
> 2013-04-07 16:08:17,946 INFO  [main] regionserver.TestJoinedScanners(157):
> Joined scanner finished in 0.47235 seconds, got 100 rows
>
> Cheers
>
> On Sun, Apr 7, 2013 at 4:03 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > Looking at
> > https://issues.apache.org/jira/secure/attachment/12564340/5416-0.94-v3.txt,
> > I found that it didn't contain TestJoinedScanners, which shows the
> > difference in scanner performance:
> >
> >    LOG.info((slow ? "Slow" : "Joined") + " scanner finished in "
> >        + Double.toString(timeSec)
> >        + " seconds, got " + Long.toString(rows_count/2) + " rows");
> >
> > The test uses SingleColumnValueFilter:
> >
> >     SingleColumnValueFilter filter = new SingleColumnValueFilter(
> >         cf_essential, col_name, CompareFilter.CompareOp.EQUAL, flag_yes);
> >
> > It is possible that the custom filter you were using exhibits a
> > different access pattern compared to SingleColumnValueFilter. For
> > example, does your filter use hints?
> > It would be easier for me and other people to reproduce the issue you
> > experienced if you put your scenario in some test similar to
> > TestJoinedScanners.
> >
> > Will take a closer look at the code Monday.
> >
> > Cheers
> >
> > On Sun, Apr 7, 2013 at 11:37 AM, James Taylor <[EMAIL PROTECTED]> wrote:
> >
> >> Yes, on 0.94.6. We have our own custom filter derived from FilterBase,
> >> so filterIfMissing isn't the issue - the results of the scan are correct.
> >>
> >> I can see that if the essential column family has more data compared to
> >> the non essential column family that the results would eventually even
> >> out. I was hoping to always be able to enable the essential column family
> >> feature. Is there an inherent reason why performance would degrade like
> >> this? Does it boil down to a single sequential scan versus many seeks?
> >>
> >> Thanks,
> >>
> >> James
> >>
> >>
> >> On 04/07/2013 07:44 AM, Ted Yu wrote:
> >>
> >>> James:
> >>> Your test was based on 0.94.6.1, right?
> >>>
> >>> What Filter were you using?
> >>>
> >>> If you used SingleColumnValueFilter, have you seen my comment here?
> >>> <https://issues.apache.org/jira/browse/HBASE-5416?focusedCommentId=13541229&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13541229