HBase >> mail # user >> Performance penalty: Custom Filter names serialization
Performance penalty: Custom Filter names serialization
Hi all,

I'm using custom filters to retrieve filtered data from HBase through the
native API. I noticed that the fully qualified class name of each custom
filter is sent over the wire as the byte representation of the string,
written with Text.writeString(). In my case this consumes a lot of network
bandwidth, because I use 5 custom filters per Get and issue 1.5 million
Gets per minute.
I took a look at the code (org.apache.hadoop.hbase.io.HbaseObjectWritable),
and it seems that HBase registers its known classes (Get, Put, etc.) and
associates each with an Integer (CODE_TO_CLASS and CLASS_TO_CODE). For
those known classes, that integer is sent instead of the full class name.
As a test, I shortened my custom filter class names to 2 or 3 letters, and
performance improved by 25%.
Is there any way to "register" my custom filter classes to behave the same
as HBase's classes? If not, does it make sense to introduce a change to do
that? Is there any other workaround for this issue?
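
To put rough numbers on the overhead described above, here is a minimal
sketch in plain Java (no HBase or Hadoop dependency). It assumes that
Text.writeString() costs a variable-length int for the length plus the UTF-8
bytes of the name, and that a registered class costs only a small
variable-length code. The registry here is a stand-in for the idea behind
CLASS_TO_CODE, not HBase's actual map, and the filter class name and the
code 90 are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class ClassNameOverhead {
    // Hypothetical registry in the spirit of CLASS_TO_CODE:
    // class name -> small integer code. Not HBase's real map.
    static final Map<String, Integer> CLASS_TO_CODE = new HashMap<>();

    // Size in bytes of a Hadoop-style variable-length int.
    // Only valid for non-negative values, which is all this sketch needs.
    static int vintSize(int n) {
        if (n <= 127) {
            return 1; // small values fit in a single byte
        }
        int dataBytes = 0;
        while (n != 0) {
            n >>>= 8;
            dataBytes++;
        }
        return 1 + dataBytes; // one marker byte plus the value bytes
    }

    // Assumed cost of a length-prefixed UTF-8 string:
    // vint length followed by the encoded bytes.
    static int writeStringSize(String s) {
        int utf8Length = s.getBytes(StandardCharsets.UTF_8).length;
        return vintSize(utf8Length) + utf8Length;
    }

    // Bytes used to identify a class: its code if registered,
    // otherwise its full name.
    static int classIdSize(String className) {
        Integer code = CLASS_TO_CODE.get(className);
        return code != null ? vintSize(code) : writeStringSize(className);
    }

    // Total bytes per minute spent on class identification alone.
    static long bytesPerMinute(int bytesPerFilter, int filtersPerGet, long getsPerMinute) {
        return (long) bytesPerFilter * filtersPerGet * getsPerMinute;
    }

    public static void main(String[] args) {
        // Hypothetical custom filter class name.
        String longName = "com.example.filters.CustomerRegionPrefixFilter";
        int filtersPerGet = 5;          // from the post
        long getsPerMinute = 1_500_000; // from the post

        int unregistered = classIdSize(longName);
        CLASS_TO_CODE.put(longName, 90); // arbitrary illustrative code
        int registered = classIdSize(longName);

        System.out.println("bytes per filter, full name:  " + unregistered);
        System.out.println("bytes per filter, registered: " + registered);
        System.out.println("bytes per minute, full name:  "
                + bytesPerMinute(unregistered, filtersPerGet, getsPerMinute));
        System.out.println("bytes per minute, registered: "
                + bytesPerMinute(registered, filtersPerGet, getsPerMinute));
    }
}
```

This only counts the bytes that identify each filter's class; the filter's
own serialized fields and the rest of the Get come on top of that.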

Thanks!