Performance penalty: Custom Filter names serialization
Hi all,

I'm using custom filters to retrieve filtered data from HBase through the
native API. I noticed that the fully qualified class name of each custom
filter is sent as the byte representation of the string, written with
Text.writeString(). This consumes a lot of network bandwidth in my case,
since I use 5 custom filters per Get and issue 1.5 million Gets per minute.
I took a look at the code (org.apache.hadoop.hbase.io.HbaseObjectWritable),
and it seems that HBase registers its known classes (Get, Put, etc.) and
associates each of them with an integer (CODE_TO_CLASS and CLASS_TO_CODE).
That integer is sent instead of the full class name for those known classes.
As a test, I shortened my custom filter class names to 2 or 3 letters, and
performance improved by 25%.
Is there any way to "register" my custom filter classes so that they are
handled the same way as HBase's own classes? If not, would it make sense to
introduce a change that allows this? Is there any other workaround for this
issue?
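
As a rough illustration of the mechanism described above, here is a minimal
standalone sketch. It is not the HBase code: the class ClassCodeRegistry, its
register() and encodedSize() methods, and the NOT_ENCODED marker value are
hypothetical names chosen for this example, and DataOutputStream.writeUTF()
stands in for Text.writeString(), so the exact byte counts are only an
approximation of what HBase puts on the wire. It shows why a registered class
costs a single byte while an unregistered one costs roughly the length of its
full name.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a class-to-code registry; not part of HBase. */
public class ClassCodeRegistry {
    // Assumed marker meaning "no code registered, a class name follows".
    public static final byte NOT_ENCODED = 0;

    // Map names borrowed from the post; the contents here are illustrative.
    static final Map<Class<?>, Byte> CLASS_TO_CODE = new HashMap<>();
    static final Map<Byte, Class<?>> CODE_TO_CLASS = new HashMap<>();

    /** Associates a class with a one-byte code; re-registering the same pair is a no-op. */
    public static void register(Class<?> clazz, byte code) {
        Class<?> existing = CODE_TO_CLASS.get(code);
        if (code == NOT_ENCODED || (existing != null && existing != clazz)) {
            throw new IllegalArgumentException("code unavailable: " + code);
        }
        CLASS_TO_CODE.put(clazz, code);
        CODE_TO_CLASS.put(code, clazz);
    }

    /** Writes the code if the class is registered, else the marker plus the full name. */
    public static void writeClass(DataOutputStream out, Class<?> clazz) throws IOException {
        Byte code = CLASS_TO_CODE.get(clazz);
        if (code != null) {
            out.writeByte(code);
        } else {
            out.writeByte(NOT_ENCODED);
            out.writeUTF(clazz.getName()); // 2-byte length prefix + name bytes
        }
    }

    /** Returns how many bytes writeClass() emits for the given class. */
    public static int encodedSize(Class<?> clazz) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            writeClass(out, clazz);
            out.flush();
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // JDK classes stand in for "known" and "custom" classes.
        register(java.util.ArrayList.class, (byte) 1);
        System.out.println("registered:   " + encodedSize(java.util.ArrayList.class) + " byte(s)");
        System.out.println("unregistered: " + encodedSize(java.util.LinkedHashMap.class) + " byte(s)");
    }
}
```

With 5 filters per Get, the per-request overhead in this sketch is 5 times
the unregistered size, which is why shortening the class names (or
registering codes for them) reduces the bytes sent.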

Thanks!