Essential column family performance
We're doing some performance testing of the essential column family
feature, and we're seeing performance degradation when we compare scans
with the feature enabled against scans without it:

                          Performance of scan relative
% of rows selected        to not enabling the feature
------------------        ----------------------------
100%                      1.0x
 80%                      2.0x
 60%                      2.3x
 40%                      2.2x
 20%                      1.5x
 10%                      1.0x
  5%                      0.67x
  0%                      0.30x

In our scenario, we have two column families. The filter is applied to a
key value from the essential column family, while the scan returns a key
value from the other, non-essential column family. Every row contains
both key values, and the values are relatively narrow (less than 50
bytes). In this scenario, we see a performance gain only when fewer than
10% of the rows are selected.
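
To make the scenario concrete, here is a toy model of it in Python. It is not HBase code and uses no HBase API: `Row`, `scan`, and `load_on_demand` are hypothetical names invented for this sketch. It assumes the feature reads the non-essential family only for rows that pass the filter, and it simply counts how many cells each mode touches.

```python
from dataclasses import dataclass


@dataclass
class Row:
    key: str
    essential: bytes      # value in the essential family (used by the filter)
    non_essential: bytes  # value in the other family (returned by the scan)


def scan(rows, predicate, load_on_demand):
    """Toy two-family scan; returns (results, cells_read)."""
    results, cells_read = [], 0
    for row in rows:
        cells_read += 1              # the essential family is always read
        if not load_on_demand:
            cells_read += 1          # feature off: read the other family up front
        if predicate(row.essential):
            if load_on_demand:
                cells_read += 1      # feature on: read it only for matching rows
            results.append((row.key, row.non_essential))
    return results, cells_read


# Ten rows, with a filter that selects 20% of them.
rows = [Row(f"row{i:02d}", bytes([i]), b"payload") for i in range(10)]
selects_20_percent = lambda value: value[0] < 2

eager_results, eager_reads = scan(rows, selects_20_percent, load_on_demand=False)
lazy_results, lazy_reads = scan(rows, selects_20_percent, load_on_demand=True)
```

The model counts cells only and says nothing about the cost of reaching each cell, so it should not be expected to reproduce the timings in the table above.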

Is this a reasonable test? Has anyone else measured this?