HBase >> mail # user >> Performance penalty: Custom Filter names serialization


Re: Performance penalty: Custom Filter names serialization
Have you guys tried with > 0.94? Are you facing the same issue with
ProtoBuf?

JM

2013/8/20 Federico Gaule <[EMAIL PROTECTED]>

> Hi everyone,
>
> I'm facing the same issue as Pablo. Renaming the classes I use in an HBase
> context improved network usage by more than 20%. It would be really nice to
> have an improvement around this.
>
> On 08/20/2013 01:15 PM, Jean-Marc Spaggiari wrote:
>
>> But even if we are using Protobuf, he is going to face the same issue,
>> right?
>>
>> We should have a way to send the filter once, along with a number that
>> tells the regions that this filter, moving forward, will be represented by
>> that number. There is some risk of reusing a number that is already
>> assigned to another filter, but I'm sure we can come up with some
>> mechanism to avoid that.
>>
>> 2013/8/20 Ted Yu <[EMAIL PROTECTED]>
>>
>>> Are you using HBase 0.92 or 0.94?
>>>
>>> In 0.95 and later releases, HbaseObjectWritable doesn't exist. Protobuf is
>>> used for communication.
>>>
>>> Cheers
>>>
>>>
>>> On Tue, Aug 20, 2013 at 8:56 AM, Pablo Medina <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm using custom filters to retrieve filtered data from HBase using the
>>>> native API. I noticed that the fully qualified class names of those custom
>>>> filters are being sent as the byte representation of the string, using
>>>> Text.writeString(). This consumes a lot of network bandwidth in my case,
>>>> because I use 5 custom filters per Get and issue 1.5 million Gets per
>>>> minute.
>>>>
>>>> I took a look at the code (org.apache.hadoop.hbase.io.HbaseObjectWritable)
>>>> and it seems that HBase registers its known classes (Get, Put, etc.) and
>>>> associates them with an Integer (CODE_TO_CLASS and CLASS_TO_CODE). That
>>>> integer is sent instead of the full class name for those known classes. I
>>>> did a test reducing my custom filter class names to 2 or 3 letters and it
>>>> improved my performance by 25%.
>>>>
>>>> Is there any way to "register" my custom filter classes to behave the same
>>>> as HBase's classes? If not, does it make sense to introduce a change to do
>>>> that? Is there any other workaround for this issue?
>>>>
>>>> Thanks!
>>>>
>>>>
>
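
To make the cost discussed in this thread concrete, here is a minimal sketch in plain Java of the two encodings Pablo describes: a class reference sent as a length-prefixed UTF-8 name versus a small registered integer code. It does not use HBase or Hadoop classes. The class FilterNameCost, the NOT_ENCODED marker value, the one-byte length prefix, the register() helper, and the example class name are all assumptions made for illustration; the real HbaseObjectWritable wire format may differ in detail. Only the figures of 5 filters per Get and 1.5 million Gets per minute come from the original message.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class FilterNameCost {
    // Hypothetical marker byte meaning "no code registered; class name follows".
    static final int NOT_ENCODED = 0;

    // Hypothetical registry in the spirit of CLASS_TO_CODE; not HBase's real table.
    static final Map<String, Integer> CLASS_TO_CODE = new HashMap<>();

    // Associates a class name with a one-byte code (1..127 in this sketch).
    static void register(String className, int code) {
        if (code <= NOT_ENCODED || code > Byte.MAX_VALUE) {
            throw new IllegalArgumentException("code must be in 1..127: " + code);
        }
        CLASS_TO_CODE.put(className, code);
    }

    // Writes either the registered code, or the marker plus a length-prefixed name.
    static byte[] encodeClassRef(String className) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        Integer code = CLASS_TO_CODE.get(className);
        if (code != null) {
            out.writeByte(code);
        } else {
            byte[] utf8 = className.getBytes(StandardCharsets.UTF_8);
            if (utf8.length > Byte.MAX_VALUE) {
                throw new IllegalArgumentException("sketch only handles names up to 127 bytes");
            }
            out.writeByte(NOT_ENCODED);
            out.writeByte(utf8.length); // simplified one-byte length prefix
            out.write(utf8);
        }
        out.flush();
        return buf.toByteArray();
    }

    // Bytes spent per minute on class references alone.
    static long bytesPerMinute(int bytesPerRef, int filtersPerGet, long getsPerMinute) {
        return (long) bytesPerRef * filtersPerGet * getsPerMinute;
    }

    public static void main(String[] args) throws IOException {
        String longName = "com.example.filters.CustomerSegmentFilter"; // hypothetical class
        String shortName = "CSF";                                      // renamed, as in Pablo's test
        int filtersPerGet = 5;            // from the original message
        long getsPerMinute = 1_500_000L;  // from the original message

        int longBytes = encodeClassRef(longName).length;
        int shortBytes = encodeClassRef(shortName).length;
        register(longName, 1);
        int codedBytes = encodeClassRef(longName).length;

        System.out.println("full name:  " + longBytes + " bytes/ref, "
                + bytesPerMinute(longBytes, filtersPerGet, getsPerMinute) + " bytes/min");
        System.out.println("short name: " + shortBytes + " bytes/ref, "
                + bytesPerMinute(shortBytes, filtersPerGet, getsPerMinute) + " bytes/min");
        System.out.println("registered: " + codedBytes + " bytes/ref, "
                + bytesPerMinute(codedBytes, filtersPerGet, getsPerMinute) + " bytes/min");
    }
}
```

Under these assumptions the per-reference overhead grows linearly with the length of the class name, which is consistent with the gains Pablo and Federico report from renaming their classes. A registered code keeps the overhead constant, but as Jean-Marc points out, client and servers would have to agree on the code assignments.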