HBase >> mail # user >> Essential column family performance


Re: Essential column family performance
Looking at the joined scanner test code, it sets it up such that 1% of the rows match, which would somewhat be in line with James' results.

In my own testing a while ago I found a 100% improvement with 0% match.
-- Lars

________________________________
 From: Ted Yu <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Sunday, April 7, 2013 4:13 PM
Subject: Re: Essential column family performance
 
I have attached 5416-TestJoinedScanners-0.94.txt to HBASE-5416 for your
reference.

On my MacBook, I got the following results from the test:

2013-04-07 16:08:17,474 INFO  [main] regionserver.TestJoinedScanners(157):
Slow scanner finished in 7.973822 seconds, got 100 rows
...
2013-04-07 16:08:17,946 INFO  [main] regionserver.TestJoinedScanners(157):
Joined scanner finished in 0.47235 seconds, got 100 rows
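
To put the two timings above side by side, here is a minimal sketch that computes the ratio between them. The class and method names are hypothetical and are not part of TestJoinedScanners; only the two durations come from the log output. The ratio works out to roughly 17x in favor of the joined scanner for this run.

```java
// Hypothetical helper, not part of HBase or TestJoinedScanners.
class JoinedScannerSpeedup {
    // Ratio of slow-scanner time to joined-scanner time.
    static double speedup(double slowSeconds, double joinedSeconds) {
        return slowSeconds / joinedSeconds;
    }

    public static void main(String[] args) {
        double slow = 7.973822;   // "Slow scanner finished in ..." from the log
        double joined = 0.47235;  // "Joined scanner finished in ..." from the log
        System.out.println("speedup: " + speedup(slow, joined) + "x");
    }
}
```

This is a single run on one laptop, so the ratio is only indicative.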

Cheers

On Sun, Apr 7, 2013 at 4:03 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Looking at
> https://issues.apache.org/jira/secure/attachment/12564340/5416-0.94-v3.txt, I found that it didn't contain TestJoinedScanners which shows
> difference in scanner performance:
>
>    LOG.info((slow ? "Slow" : "Joined") + " scanner finished in "
>       + Double.toString(timeSec)
>       + " seconds, got " + Long.toString(rows_count/2) + " rows");
>
> The test uses SingleColumnValueFilter:
>
>     SingleColumnValueFilter filter = new SingleColumnValueFilter(
>         cf_essential, col_name, CompareFilter.CompareOp.EQUAL, flag_yes);
>
> It is possible that the custom filter you were using exhibits a
> different access pattern than SingleColumnValueFilter. For example, does
> your filter use hints?
> It would be easier for me and other people to reproduce the issue you
> experienced if you put your scenario in some test similar to
> TestJoinedScanners.
>
> Will take a closer look at the code Monday.
>
> Cheers
>
> On Sun, Apr 7, 2013 at 11:37 AM, James Taylor <[EMAIL PROTECTED]> wrote:
>
>> Yes, on 0.94.6. We have our own custom filter derived from FilterBase, so
>> filterIfMissing isn't the issue - the results of the scan are correct.
>>
>> I can see that if the essential column family has more data compared to
>> the non essential column family that the results would eventually even out.
>> I was hoping to always be able to enable the essential column family
>> feature. Is there an inherent reason why performance would degrade like
>> this? Does it boil down to a single sequential scan versus many seeks?
>>
>> Thanks,
>>
>> James
>>
>>
>> On 04/07/2013 07:44 AM, Ted Yu wrote:
>>
>>> James:
>>> Your test was based on 0.94.6.1, right ?
>>>
>>> What Filter were you using ?
>>>
>>> If you used SingleColumnValueFilter, have you seen my comment here?
>>> https://issues.apache.org/jira/browse/HBASE-5416?focusedCommentId=13541229&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13541229
>>>
>>> BTW, the use case Max Lapan tried to address has the non-essential
>>> column family carrying considerably more data than the essential
>>> column family.
>>>
>>> Cheers
>>>
>>>
>>>
>>> On Sat, Apr 6, 2013 at 11:05 PM, James Taylor <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hello,
>>>> We're doing some performance testing of the essential column family
>>>> feature, and we're seeing some performance degradation when comparing
>>>> with and without the feature enabled:
>>>>
>>>>                          Performance of scan relative
>>>> % of rows selected       to not enabling the feature
>>>> ------------------       ----------------------------
>>>> 100%                     1.0x
>>>>  80%                     2.0x
>>>>  60%                     2.3x
>>>>  40%                     2.2x
>>>>  20%                     1.5x
>>>>  10%                     1.0x
>>>>   5%                     0.67x