HBase >> mail # user >> Essential column family performance


Re: Essential column family performance
Yes, on 0.94.6. We have our own custom filter derived from FilterBase,
so filterIfMissing isn't the issue - the results of the scan are correct.

I can see that the results would eventually even out if the essential
column family held more data than the non-essential column family. I was
hoping to always be able to enable the essential column family feature.
Is there an inherent reason why performance would degrade like this?
Does it boil down to a single sequential scan versus many seeks?
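
To make the "single sequential scan versus many seeks" question concrete, here is a toy cost model in Java. It is only a sketch: the class name, the method names, and both cost constants are hypothetical and were not measured on HBase. It assumes that without the feature both families are read sequentially for every row, and that with the feature the essential family is read sequentially while each matching row costs one extra seek into the other family. It does not reproduce the measured table quoted below; it only shows how a per-match seek can turn the feature into a net loss once enough rows are selected.

```java
public class EssentialFamilyCostModel {
    // Hypothetical costs in arbitrary units; assumptions, not HBase measurements.
    static final double SEQUENTIAL_READ_COST = 1.0; // advance one family by one row
    static final double SEEK_COST = 3.0;            // reposition the non-essential family

    /** Feature disabled: both families are read sequentially for every row. */
    static double eagerCost(long rows) {
        return rows * 2 * SEQUENTIAL_READ_COST;
    }

    /** Feature enabled: read the essential family for every row, seek once per match. */
    static double lazyCost(long rows, long matchingRows) {
        return rows * SEQUENTIAL_READ_COST + matchingRows * SEEK_COST;
    }

    /** Cost with the feature divided by cost without it; below 1.0 is a gain. */
    static double relativeCost(long rows, double selectivity) {
        long matchingRows = Math.round(rows * selectivity);
        return lazyCost(rows, matchingRows) / eagerCost(rows);
    }

    /** Selectivity at which both strategies cost the same in this model. */
    static double breakEvenSelectivity() {
        return SEQUENTIAL_READ_COST / SEEK_COST;
    }

    public static void main(String[] args) {
        long rows = 100_000;
        double[] selectivities = {1.0, 0.8, 0.6, 0.4, 0.2, 0.1, 0.05, 0.0};
        for (double s : selectivities) {
            System.out.println(String.format("%5.1f%% selected: %.2fx",
                    s * 100, relativeCost(rows, s)));
        }
        System.out.println(String.format("break-even selectivity: %.2f",
                breakEvenSelectivity()));
    }
}
```

In this model the break-even point is simply SEQUENTIAL_READ_COST / SEEK_COST, so the more expensive a seek is relative to a sequential read, the smaller the fraction of selected rows at which the feature still pays off.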

Thanks,

James

On 04/07/2013 07:44 AM, Ted Yu wrote:
> James:
> Your test was based on 0.94.6.1, right?
>
> What Filter were you using?
>
> If you used SingleColumnValueFilter, have you seen my comment here?
> https://issues.apache.org/jira/browse/HBASE-5416?focusedCommentId=13541229&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13541229
>
> BTW, the use case Max Lapan tried to address has the non-essential column
> family carrying considerably more data than the essential column family.
>
> Cheers
>
>
>
> On Sat, Apr 6, 2013 at 11:05 PM, James Taylor <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>> We're doing some performance testing of the essential column family
>> feature, and we're seeing some performance degradation when comparing with
>> and without the feature enabled:
>>
>>                           Performance of scan relative
>> % of rows selected        to not enabling the feature
>> ---------------------     ------------------------------
>> 100%                      1.0x
>>  80%                      2.0x
>>  60%                      2.3x
>>  40%                      2.2x
>>  20%                      1.5x
>>  10%                      1.0x
>>   5%                      0.67x
>>   0%                      0.30x
>>
>> In our scenario, we have two column families. The key value from the
>> essential column family is used in the filter, while the key value from the
>> other, non-essential column family is returned by the scan. Each row
>> contains both key values, with the values being relatively narrow (less
>> than 50 bytes). In this scenario, the only time we're seeing a
>> performance gain is when less than 10% of the rows are selected.
>>
>> Is this a reasonable test? Has anyone else measured this?
>>
>> Thanks,
>>
>> James