HBase >> mail # user >> Performance penalty: Custom Filter names serialization


Re: Performance penalty: Custom Filter names serialization
Not in my case. Is 0.95.2 a stable release? I'm talking about a
production scenario, where I'm very careful with version upgrades.

I'll do some benchmarking in a sandbox using a release newer than 0.94.

Thanks!

On 08/21/2013 04:00 PM, Jean-Marc Spaggiari wrote:
> Have you guys tried with > 0.94? Are you facing the same issue with
> ProtoBuf?
>
> JM
>
> 2013/8/20 Federico Gaule <[EMAIL PROTECTED]>
>
>> Hi everyone,
>>
>> I'm facing the same issue as Pablo. Renaming the classes I use in the HBase
>> context reduced network usage by more than 20%. It would be really nice to
>> have an improvement here.
>>
>> On 08/20/2013 01:15 PM, Jean-Marc Spaggiari wrote:
>>
>>> But even if he were using Protobuf, he would face the same issue,
>>> right?
>>>
>>> We should have a way to send the filter once along with a number, telling
>>> the regions that from then on this filter will be represented by that
>>> number. There is some risk of reusing a number that another filter is
>>> already using, but I'm sure we can come up with a mechanism to avoid that.
>>>
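A minimal Java sketch of the "send the filter once with a number" idea above. Everything in it (FilterNameDictionary, Token, Encoder, Decoder) is hypothetical and not part of any HBase release; it assumes the client and server exchange these tokens over one connection and process them in order, which a real RPC layer would have to guarantee or work around. The reuse risk is handled by only ever assigning the next unused id and having the server reject any definition that does not match it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-connection dictionary: a filter class name is sent in full
// only the first time; afterwards only its small integer id is sent.
class FilterNameDictionary {

    // What goes on the wire for one filter class reference.
    static final class Token {
        final int id;
        final String name; // non-null only on the first use of this class

        Token(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Client side: ids are assigned in increasing order and never reused
    // for a different class on the same connection.
    static final class Encoder {
        private final Map<String, Integer> ids = new HashMap<>();

        Token encode(String className) {
            Integer id = ids.get(className);
            if (id != null) {
                return new Token(id, null);       // already defined: id only
            }
            int newId = ids.size();
            ids.put(className, newId);
            return new Token(newId, className);   // first use: id + full name
        }
    }

    // Server side: rebuilds the same table from the definitions it receives.
    static final class Decoder {
        private final List<String> names = new ArrayList<>();

        String decode(Token t) {
            if (t.name != null) {
                // A definition must introduce the next unused id; anything else
                // means an id is being reused or tokens arrived out of order.
                if (t.id != names.size()) {
                    throw new IllegalStateException("unexpected definition for id " + t.id);
                }
                names.add(t.name);
            }
            if (t.id < 0 || t.id >= names.size()) {
                throw new IllegalStateException("unknown id " + t.id);
            }
            return names.get(t.id);
        }
    }

    public static void main(String[] args) {
        Encoder client = new Encoder();
        Decoder server = new Decoder();
        String[] requests = {"com.example.FilterA", "com.example.FilterB", "com.example.FilterA"};
        for (String className : requests) {
            Token t = client.encode(className);
            String resolved = server.decode(t);
            System.out.println("id=" + t.id + (t.name != null ? " (with name)" : "") + " -> " + resolved);
        }
    }
}
```

In this sketch the tables are meant to live for a single connection: if a client kept its Encoder across a reconnect, the fresh Decoder would reject the stale id with an exception rather than resolve it to the wrong class.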
>>> 2013/8/20 Ted Yu <[EMAIL PROTECTED]>
>>>
>>>> Are you using HBase 0.92 or 0.94?
>>>> In 0.95 and later releases, HbaseObjectWritable doesn't exist; Protobuf is
>>>> used for communication.
>>>>
>>>> Cheers
>>>>
>>>>
>>>> On Tue, Aug 20, 2013 at 8:56 AM, Pablo Medina <[EMAIL PROTECTED]> wrote:
>>>>> Hi all,
>>>>>
>>>>> I'm using custom filters to retrieve filtered data from HBase using the
>>>>> native API. I noticed that the fully qualified class names of those
>>>>> custom filters are being sent as the byte representation of the string,
>>>>> using Text.writeString(). This consumes a lot of network bandwidth in my
>>>>> case, since I use 5 custom filters per Get and issue 1.5 million Gets per
>>>>> minute.
>>>>> I took a look at the code (org.apache.hadoop.hbase.io.HbaseObjectWritable)
>>>>> and it seems that HBase registers its known classes (Get, Put, etc.) and
>>>>> associates each of them with an Integer (CODE_TO_CLASS and CLASS_TO_CODE).
>>>>> That integer is sent instead of the full class name for those known
>>>>> classes. I ran a test shortening my custom filter class names to 2 or 3
>>>>> letters, and it improved my performance by 25%.
>>>>> Is there any way to "register" my custom filter classes so they behave
>>>>> the same way as HBase's classes? If not, does it make sense to introduce a
>>>>> change to do that? Is there any other workaround for this issue?
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
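To put rough numbers on Pablo's description, the Java sketch below contrasts the two encodings: a registered class written as a small integer code, and an unregistered one written as a marker followed by its length-prefixed UTF-8 name. The registry contents, the code values, the NOT_ENCODED marker, and the class name com.example.filters.MyCustomRowFilter are all made up for illustration, and the varint is plain LEB128 rather than Hadoop's WritableUtils.writeVInt(), so this approximates the idea, not the exact HbaseObjectWritable wire format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: approximates "known class -> integer code, unknown class
// -> full name" and measures the bytes each choice puts on the wire.
class ClassNameCost {

    // Hypothetical stand-in for HBase's CLASS_TO_CODE table.
    static final Map<String, Integer> CLASS_TO_CODE = new HashMap<>();
    static final int NOT_ENCODED = 0; // hypothetical marker: "full name follows"

    static {
        CLASS_TO_CODE.put("org.apache.hadoop.hbase.client.Get", 1);
        CLASS_TO_CODE.put("org.apache.hadoop.hbase.client.Put", 2);
    }

    // Unsigned LEB128 varint (one byte for values below 128).
    static void writeVarInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static byte[] writeClass(String className) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Integer code = CLASS_TO_CODE.get(className);
        if (code != null) {
            writeVarInt(out, code);            // known class: just its code
        } else {
            writeVarInt(out, NOT_ENCODED);     // unknown class: marker ...
            byte[] utf8 = className.getBytes(StandardCharsets.UTF_8);
            writeVarInt(out, utf8.length);     // ... length prefix ...
            out.write(utf8, 0, utf8.length);   // ... and the full name
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int known = writeClass("org.apache.hadoop.hbase.client.Get").length;
        int custom = writeClass("com.example.filters.MyCustomRowFilter").length;
        // Load figures from the thread: 5 custom filters per Get, 1.5 million Gets per minute.
        long namesPerMinute = 5L * 1_500_000L;
        System.out.println("known class bytes:  " + known);
        System.out.println("custom class bytes: " + custom);
        System.out.println("name bytes/minute:  " + namesPerMinute * custom);
    }
}
```

With that made-up 37-character name, the sketch reports 1 byte for the registered class and 39 bytes for the custom one, which at the thread's load comes to 292,500,000 bytes (about 292 MB) of class names per minute. A 3-letter name, as in Pablo's test, would cost 5 bytes in this model. These figures cover only the names, not the filters' own serialized fields.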