HBase >> mail # user >> Essential column family performance


Re: Essential column family performance
Something I'm not getting: why not use separate tables instead of
CFs in a single table? Simply name your tables tablename_cfname and
you get rid of the limitation on the number of CFs?

Or are there big pros to having CFs?

JM

2013/4/8 Anoop John <[EMAIL PROTECTED]>:
> Agree here. The effectiveness depends on what % of the data satisfies the
> condition and how it is distributed across HFile blocks. We will get a
> performance gain when we are able to skip some HFile blocks (from
> non-essential CFs). Can you test with a different (lower) HFile block size?
>
> -Anoop-
>
>
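A rough way to see why the match percentage and the block size both matter is sketched below. This is only a back-of-the-envelope model and not HBase code: it assumes every row matches the filter independently with the same probability, and that a block of a non-essential CF can be skipped only when none of its rows match. The class name `BlockSkipEstimate` and the parameter `rowsPerBlock` are made up for this illustration.

```java
// Sketch only, not HBase code. Assumptions: rows match independently with
// probability matchFraction, and a non-essential-CF block is skippable
// only if it contains no matching row.
final class BlockSkipEstimate {

    /** Estimated fraction of non-essential-CF blocks that contain no matching row. */
    static double skippableFraction(double matchFraction, int rowsPerBlock) {
        if (matchFraction < 0.0 || matchFraction > 1.0 || rowsPerBlock < 1) {
            throw new IllegalArgumentException("matchFraction must be in [0,1], rowsPerBlock >= 1");
        }
        // Probability that all rowsPerBlock rows in a block fail the filter.
        return Math.pow(1.0 - matchFraction, rowsPerBlock);
    }

    public static void main(String[] args) {
        double[] matchFractions = {0.0, 0.01, 0.40}; // 0%, 1% and 40% as discussed in the thread
        int[] rowsPerBlock = {1, 10, 100};           // hypothetical block densities
        for (double p : matchFractions) {
            for (int n : rowsPerBlock) {
                System.out.printf("match=%.2f rowsPerBlock=%d skippable=%.4f%n",
                        p, n, skippableFraction(p, n));
            }
        }
    }
}
```

Under these assumptions, fewer rows per block (a smaller block size) or a lower match percentage both raise the share of blocks that can be skipped, which is the intuition behind trying a lower HFile block size. Real data that is clustered rather than uniformly spread will behave differently.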
> On Mon, Apr 8, 2013 at 8:19 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
>> I made the following change in TestJoinedScanners.java:
>>
>> -      int flag_percent = 1;
>> +      int flag_percent = 40;
>>
>> The test took longer but still favors joined scanner.
>> I got some new results:
>>
>> 2013-04-08 07:46:06,959 INFO  [main] regionserver.TestJoinedScanners(157):
>> Slow scanner finished in 7.424388 seconds, got 2050 rows
>> ...
>> 2013-04-08 07:46:12,010 INFO  [main] regionserver.TestJoinedScanners(157):
>> Joined scanner finished in 5.05063 seconds, got 2050 rows
>>
>> 2013-04-08 07:46:18,358 INFO  [main] regionserver.TestJoinedScanners(157):
>> Slow scanner finished in 6.348517 seconds, got 2050 rows
>> ...
>> 2013-04-08 07:46:22,946 INFO  [main] regionserver.TestJoinedScanners(157):
>> Joined scanner finished in 4.587545 seconds, got 2050 rows
>>
>> Looks like the effectiveness of the joined scanner is affected by the
>> distribution of data.
>>
>> Cheers
>>
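To put the timings reported in this thread side by side, the small sketch below computes the slow-to-joined ratio for each run. The timing values are copied from the log lines quoted in the thread; the class name `ScannerSpeedup` and its helper method are made up for this illustration and are not part of TestJoinedScanners.

```java
// Sketch only: compares the slow and joined scanner timings quoted in the thread.
final class ScannerSpeedup {

    /** Ratio of slow-scanner time to joined-scanner time (values above 1 favor the joined scanner). */
    static double speedup(double slowSeconds, double joinedSeconds) {
        if (slowSeconds < 0.0 || joinedSeconds <= 0.0) {
            throw new IllegalArgumentException("timings must be positive");
        }
        return slowSeconds / joinedSeconds;
    }

    public static void main(String[] args) {
        // {slow seconds, joined seconds}, taken from the quoted log output.
        double[][] runs = {
            {7.973822, 0.47235},   // flag_percent = 1 (2013-04-07 run)
            {7.424388, 5.05063},   // flag_percent = 40, first pair
            {6.348517, 4.587545},  // flag_percent = 40, second pair
        };
        for (double[] run : runs) {
            System.out.printf("slow=%.3fs joined=%.3fs ratio=%.2fx%n",
                    run[0], run[1], speedup(run[0], run[1]));
        }
    }
}
```

The ratio is roughly 17x for the 1% run and roughly 1.4x to 1.5x for the 40% runs, so the joined scanner still wins at 40%, but by a much smaller margin.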
>> On Sun, Apr 7, 2013 at 8:52 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>>
>> > Looking at the joined scanner test code, it sets it up such that 1% of the
>> > rows match, which would somewhat be in line with James' results.
>> >
>> > In my own testing a while ago I found a 100% improvement with 0% match.
>> >
>> >
>> > -- Lars
>> >
>> >
>> >
>> > ________________________________
>> >  From: Ted Yu <[EMAIL PROTECTED]>
>> > To: [EMAIL PROTECTED]
>> > Sent: Sunday, April 7, 2013 4:13 PM
>> > Subject: Re: Essential column family performance
>> >
>> > I have attached 5416-TestJoinedScanners-0.94.txt to HBASE-5416 for your
>> > reference.
>> >
>> > On my MacBook, I got the following results from the test:
>> >
>> > 2013-04-07 16:08:17,474 INFO  [main] regionserver.TestJoinedScanners(157):
>> > Slow scanner finished in 7.973822 seconds, got 100 rows
>> > ...
>> > 2013-04-07 16:08:17,946 INFO  [main] regionserver.TestJoinedScanners(157):
>> > Joined scanner finished in 0.47235 seconds, got 100 rows
>> >
>> > Cheers
>> >
>> > On Sun, Apr 7, 2013 at 4:03 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>> >
>> > > Looking at
>> > > https://issues.apache.org/jira/secure/attachment/12564340/5416-0.94-v3.txt,
>> > > I found that it didn't contain TestJoinedScanners, which shows the
>> > > difference in scanner performance:
>> > >
>> > >    LOG.info((slow ? "Slow" : "Joined") + " scanner finished in " +
>> > >       Double.toString(timeSec)
>> > >       + " seconds, got " + Long.toString(rows_count/2) + " rows");
>> > >
>> > > The test uses SingleColumnValueFilter:
>> > >
>> > >     SingleColumnValueFilter filter = new SingleColumnValueFilter(
>> > >         cf_essential, col_name, CompareFilter.CompareOp.EQUAL, flag_yes);
>> > >
>> > > It is possible that the custom filter you were using would exhibit a
>> > > different access pattern compared to SingleColumnValueFilter, e.g. does
>> > > your filter utilize hints?
>> > > It would be easier for me and other people to reproduce the issue you
>> > > experienced if you put your scenario in some test similar to
>> > > TestJoinedScanners.
>> > >
>> > > Will take a closer look at the code Monday.
>> > >
>> > > Cheers
>> > >
>> > > On Sun, Apr 7, 2013 at 11:37 AM, James Taylor <[EMAIL PROTECTED]> wrote:
>> > >
>> > >> Yes, on 0.94.6. We have our own custom filter derived from FilterBase, so
>> > >> filterIfMissing isn't the issue - the results of the scan are correct.