HBase, mail # user - Performance penalty: Custom Filter names serialization


Re: Performance penalty: Custom Filter names serialization
Jean-Marc Spaggiari 2013-08-21, 19:00
Have you tried a release newer than 0.94? Do you see the same issue with
Protobuf?

JM

2013/8/20 Federico Gaule <[EMAIL PROTECTED]>

> Hi everyone,
>
> I'm facing the same issue as Pablo. Renaming the classes I use in an HBase
> context reduced network usage by more than 20%. It would be really nice to
> have an improvement around this.
>
> On 08/20/2013 01:15 PM, Jean-Marc Spaggiari wrote:
>
>> But even if we are using Protobuf, he is going to face the same issue,
>> right?
>>
>> We should have a way to send the filter once, together with a number, to
>> tell the regions that this filter will be represented by that number from
>> then on. There is some risk of reusing a number that another filter is
>> already using, but I'm sure we can come up with some mechanism to avoid
>> that.
>>
>> 2013/8/20 Ted Yu <[EMAIL PROTECTED]>
>>
>>> Are you using HBase 0.92 or 0.94?
>>>
>>> In 0.95 and later releases, HbaseObjectWritable doesn't exist. Protobuf is
>>> used for communication.
>>>
>>> Cheers
>>>
>>>
>>> On Tue, Aug 20, 2013 at 8:56 AM, Pablo Medina <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm using custom filters to retrieve filtered data from HBase using the
>>>> native API. I noticed that the fully qualified class names of those
>>>> custom filters are being sent as the byte representation of the string,
>>>> using Text.writeString(). This consumes a lot of network bandwidth in my
>>>> case, because I use 5 custom filters per Get and issue 1.5 million Gets
>>>> per minute. I took a look at the code
>>>> (org.apache.hadoop.hbase.io.HbaseObjectWritable) and it seems that HBase
>>>> registers its known classes (Get, Put, etc.) and associates them with an
>>>> Integer (CODE_TO_CLASS and CLASS_TO_CODE). That integer is sent instead
>>>> of the full class name for those known classes. I did a test reducing my
>>>> custom filter class names to 2 or 3 letters and it improved my
>>>> performance by 25%.
>>>> Is there any way to "register" my custom filter classes so that they
>>>> behave the same as HBase's classes? If not, does it make sense to
>>>> introduce a change to do that? Is there any other workaround for this
>>>> issue?
>>>>
>>>> Thanks!
>>>>
>
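
To make the cost described in the thread concrete, here is a minimal Java sketch of the idea: a class that is registered with a numeric code is written as that code, while an unregistered class is written as its full name. This is not HBase's code. The class FilterNameCost, its methods, the one-byte marker, and the filter name com.example.filters.CustomerSegmentFilter are all hypothetical, and DataOutputStream.writeUTF is only a stand-in for Text.writeString(), so absolute byte counts will differ from what HBase 0.94 actually sends. The 5 filters per Get and 1.5 million Gets per minute are the figures from Pablo's message.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of HBase.
final class FilterNameCost {
    // Hypothetical registry, loosely modelled on the CLASS_TO_CODE map
    // mentioned in the thread.
    private static final Map<String, Integer> CLASS_TO_CODE = new HashMap<>();

    // Code 0 is reserved here as the "class name follows" marker.
    static void register(String className, int code) {
        if (code < 1 || code > 127) {
            throw new IllegalArgumentException("code must be in 1..127");
        }
        CLASS_TO_CODE.put(className, code);
    }

    // Number of bytes written for one class reference: a single code byte
    // if the class is registered, otherwise a marker byte plus the name.
    static int encodedSize(String className) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            Integer code = CLASS_TO_CODE.get(className);
            if (code != null) {
                out.writeByte(code);
            } else {
                out.writeByte(0);
                // Stand-in for Text.writeString(): length prefix + bytes.
                out.writeUTF(className);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size();
    }

    // Bytes per minute spent on class references alone.
    static long bytesPerMinute(int bytesPerFilter, int filtersPerGet,
                               long getsPerMinute) {
        return (long) bytesPerFilter * filtersPerGet * getsPerMinute;
    }

    public static void main(String[] args) {
        String longName = "com.example.filters.CustomerSegmentFilter"; // hypothetical
        String shortName = "F1";                                       // hypothetical

        int asLongName = encodedSize(longName);
        int asShortName = encodedSize(shortName);
        register(longName, 1);
        int asCode = encodedSize(longName);

        // Workload figures taken from the original message.
        int filtersPerGet = 5;
        long getsPerMinute = 1_500_000L;

        System.out.println("full name:  " + asLongName + " B/filter, "
                + bytesPerMinute(asLongName, filtersPerGet, getsPerMinute) + " B/min");
        System.out.println("short name: " + asShortName + " B/filter, "
                + bytesPerMinute(asShortName, filtersPerGet, getsPerMinute) + " B/min");
        System.out.println("registered: " + asCode + " B/filter, "
                + bytesPerMinute(asCode, filtersPerGet, getsPerMinute) + " B/min");
    }
}
```

The sketch only counts the bytes spent naming the filter class. The filter's own serialized fields are sent either way, which is one reason the end-to-end gains reported in the thread (20% to 25%) are smaller than the ratio between the name sizes would suggest.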