HBase, mail # user - Performance penalty: Custom Filter names serialization


Pablo Medina 2013-08-20, 15:56
Ted Yu 2013-08-20, 16:11
Jean-Marc Spaggiari 2013-08-20, 16:15
Federico Gaule 2013-08-20, 19:31
Jean-Marc Spaggiari 2013-08-21, 19:00
Federico Gaule 2013-08-22, 12:52

Re: Performance penalty: Custom Filter names serialization
Jean-Marc Spaggiari 2013-08-22, 12:54
No, 0.95.2 is a dev release. But if you have a dev cluster where you do your
tests before pushing to prod, you might be able to give it a try.

I definitely do NOT recommend pushing 0.95.2 to a production cluster.

JM

2013/8/22 Federico Gaule <[EMAIL PROTECTED]>

> Not in my case. Is 0.95.2 a stable release? I'm talking about a
> production scenario, where I'm very careful with version upgrades.
>
> Will do some benchmarking in a sandbox using > 0.94
>
> Thanks!
>
>
> On 08/21/2013 04:00 PM, Jean-Marc Spaggiari wrote:
>
>> Have you guys tried with > 0.94? Are you facing the same issue with
>> ProtoBuf?
>>
>> JM
>>
>> 2013/8/20 Federico Gaule <[EMAIL PROTECTED]>
>>
>>> Hi everyone,
>>>
>>> I'm facing the same issue as Pablo. Renaming the classes I use in an HBase
>>> context improved network usage by more than 20%. It would be really nice to
>>> have an improvement around this.
>>>
>>> On 08/20/2013 01:15 PM, Jean-Marc Spaggiari wrote:
>>>
>>>> But even if we are using Protobuf, he is going to face the same issue,
>>>> right?
>>>>
>>>> We should have a way to send the filter once with a number to tell the
>>>> regions that this filter, moving forward, will be represented by this
>>>> number. There is some risk of reusing a number already assigned to another
>>>> filter, but I'm sure we can come up with some mechanism to avoid that.
>>>>
>>>> 2013/8/20 Ted Yu <[EMAIL PROTECTED]>
>>>>
>>>>> Are you using HBase 0.92 or 0.94?
>>>>>
>>>>> In 0.95 and later releases, HbaseObjectWritable doesn't exist. Protobuf
>>>>> is used for communication.
>>>>>
>>>>> Cheers
>>>>>
>>>>>
>>>>> On Tue, Aug 20, 2013 at 8:56 AM, Pablo Medina <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'm using custom filters to retrieve filtered data from HBase using the
>>>>>> native API. I noticed that the full class names of those custom filters
>>>>>> are being sent as the byte representation of the string using
>>>>>> Text.writeString(). This consumes a lot of network bandwidth in my case
>>>>>> due to using 5 custom filters per Get and issuing 1.5 million Gets per
>>>>>> minute. I took a look at the code
>>>>>> (org.apache.hadoop.hbase.io.HbaseObjectWritable) and it seems that HBase
>>>>>> registers its known classes (Get, Put, etc.) and associates them with an
>>>>>> Integer (CODE_TO_CLASS and CLASS_TO_CODE). That integer is sent instead
>>>>>> of the full class name for those known classes. I did a test reducing my
>>>>>> custom filter class names to 2 or 3 letters and it improved my
>>>>>> performance by 25%.
>>>>>>
>>>>>> Is there any way to "register" my custom filter classes to behave the
>>>>>> same as HBase's classes? If not, does it make sense to introduce a change
>>>>>> to do that? Is there any other workaround for this issue?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>>
>
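
To put rough numbers on the overhead discussed in the thread above, here is a
minimal Java sketch. It is not HBase code: the class ClassCodeSketch, its
methods, and the filter name used in main are all hypothetical. It assumes a
class name costs one length byte plus its UTF-8 bytes on the wire (which should
match Text.writeString() for names shorter than 128 bytes) and that a
registered code costs a single byte. The registry only mimics the
CLASS_TO_CODE / CODE_TO_CLASS idea Pablo describes; it does not address the
number-reuse risk JM mentions.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only. Every name in this class is hypothetical.
class ClassCodeSketch {
    // Two-way mapping in the spirit of CLASS_TO_CODE and CODE_TO_CLASS.
    private final Map<String, Integer> classToCode = new HashMap<>();
    private final Map<Integer, String> codeToClass = new HashMap<>();
    private int nextCode = 1;

    // Registers a class name once; repeated calls return the same code.
    int register(String className) {
        Integer existing = classToCode.get(className);
        if (existing != null) {
            return existing;
        }
        int code = nextCode++;
        classToCode.put(className, code);
        codeToClass.put(code, className);
        return code;
    }

    // Resolves a code back to its class name, or null if unknown.
    String lookup(int code) {
        return codeToClass.get(code);
    }

    // Assumed wire size of a name: 1 length byte + UTF-8 bytes
    // (only valid for names shorter than 128 bytes).
    static long nameBytes(String className) {
        return 1L + className.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes per minute spent on identifying filters.
    static long bytesPerMinute(long bytesPerFilter, long filtersPerGet, long getsPerMinute) {
        return bytesPerFilter * filtersPerGet * getsPerMinute;
    }

    public static void main(String[] args) {
        // Hypothetical filter class name; the thread does not give real ones.
        String name = "com.example.filters.CustomerSegmentRowFilter";
        long filtersPerGet = 5;          // from the thread
        long getsPerMinute = 1_500_000;  // from the thread

        long asName = bytesPerMinute(nameBytes(name), filtersPerGet, getsPerMinute);
        long asCode = bytesPerMinute(1, filtersPerGet, getsPerMinute); // assumed 1-byte code

        System.out.println("bytes/min sent as class name: " + asName);
        System.out.println("bytes/min sent as 1-byte code: " + asCode);
    }
}
```

Under these assumptions the cost grows linearly with the length of the class
name, which is consistent with the improvement both Pablo and Federico saw
after shortening their class names.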
Pablo Medina 2013-08-20, 16:15