HBase >> mail # user >> Performance penalty: Custom Filter names serialization


Pablo Medina 2013-08-20, 15:56
Ted Yu 2013-08-20, 16:11
Jean-Marc Spaggiari 2013-08-20, 16:15
Federico Gaule 2013-08-20, 19:31
Jean-Marc Spaggiari 2013-08-21, 19:00
Federico Gaule 2013-08-22, 12:52

Re: Performance penalty: Custom Filter names serialization
No, 0.95.2 is a dev release. But if you have a dev cluster where you run
your tests before pushing to prod, you might be able to give it a try there.

I definitely do NOT recommend pushing 0.95.2 to a production cluster.

JM

2013/8/22 Federico Gaule <[EMAIL PROTECTED]>

> Not in my case. Is 0.95.2 a stable release? I'm talking about a
> production scenario, where I'm very careful with version upgrades.
>
> Will do some benchmarking in a sandbox using > 0.94
>
> Thanks!
>
>
> On 08/21/2013 04:00 PM, Jean-Marc Spaggiari wrote:
>
>> Have you guys tried with > 0.94? Are you facing the same issue with
>> ProtoBuf?
>>
>> JM
>>
>> 2013/8/20 Federico Gaule <[EMAIL PROTECTED]>
>>
>>> Hi everyone,
>>>
>>> I'm facing the same issue as Pablo. Renaming the classes I use in the
>>> HBase context reduced network usage by more than 20%. It would be
>>> really nice to have an improvement around this.
>>>
>>> On 08/20/2013 01:15 PM, Jean-Marc Spaggiari wrote:
>>>
>>>> But even if we are using Protobuf, he is going to face the same
>>>> issue, right?
>>>>
>>>> We should have a way to send the filter once, together with a number,
>>>> to tell the regions that this filter will be represented by that
>>>> number from then on. There is some risk of reusing a number that is
>>>> already assigned to another filter, but I'm sure we can come up with
>>>> some mechanism to avoid that.
>>>>
>>>> 2013/8/20 Ted Yu <[EMAIL PROTECTED]>
>>>>
>>>>> Are you using HBase 0.92 or 0.94?
>>>>>
>>>>> In 0.95 and later releases, HbaseObjectWritable doesn't exist.
>>>>> Protobuf is used for communication.
>>>>>
>>>>> Cheers
>>>>>
>>>>>
>>>>> On Tue, Aug 20, 2013 at 8:56 AM, Pablo Medina <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'm using custom filters to retrieve filtered data from HBase using
>>>>>> the native API. I noticed that the fully qualified class names of
>>>>>> those custom filters are being sent as the byte representation of
>>>>>> the string using Text.writeString(). This consumes a lot of network
>>>>>> bandwidth in my case, because I use 5 custom filters per Get and
>>>>>> issue 1.5 million Gets per minute.
>>>>>>
>>>>>> I took a look at the code
>>>>>> (org.apache.hadoop.hbase.io.HbaseObjectWritable) and it seems that
>>>>>> HBase registers its known classes (Get, Put, etc.) and associates
>>>>>> them with an Integer (CODE_TO_CLASS and CLASS_TO_CODE). That integer
>>>>>> is sent instead of the full class name for those known classes. I
>>>>>> did a test reducing my custom filter class names to 2 or 3 letters
>>>>>> and it improved my performance by 25%.
>>>>>>
>>>>>> Is there any way to "register" my custom filter classes so that they
>>>>>> behave the same as HBase's classes? If not, does it make sense to
>>>>>> introduce a change to do that? Is there any other workaround for
>>>>>> this issue?
>>>>>>
>>>>>> Thanks!
>>>>>>
>
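
For readers following the thread, the overhead Pablo describes and the numbering idea JM floats can be put into rough numbers. The sketch below is only an illustrative model and not HBase code: the class FilterNameOverhead, its methods, the example class name, and the one-byte sizes assumed for the length prefix and for a registered code are all assumptions made for this illustration. The request figures (5 filters per Get, 1.5 million Gets per minute) come from the original message.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Illustrative size model: identifying a filter class by name versus by a small code. */
class FilterNameOverhead {

    // Hypothetical two-way registry, loosely modeled on the CLASS_TO_CODE and
    // CODE_TO_CLASS maps mentioned in the thread. This is not an HBase API.
    private final Map<String, Integer> nameToCode = new HashMap<>();
    private final Map<Integer, String> codeToName = new HashMap<>();

    /** Registers a class name under a code; refuses to reuse a code or to remap a name. */
    void register(String className, int code) {
        if (code < 0 || code > 127) {
            throw new IllegalArgumentException("code must fit in one byte: " + code);
        }
        String owner = codeToName.get(code);
        if (owner != null && !owner.equals(className)) {
            throw new IllegalArgumentException("code " + code + " already used by " + owner);
        }
        Integer existing = nameToCode.get(className);
        if (existing != null && existing != code) {
            throw new IllegalArgumentException(className + " already has code " + existing);
        }
        nameToCode.put(className, code);
        codeToName.put(code, className);
    }

    /** Bytes needed to identify the class under this model. */
    int identifierBytes(String className) {
        // Assumption: a registered class is identified by a single byte.
        return nameToCode.containsKey(className) ? 1 : bytesByName(className);
    }

    /** Assumed cost of a length-prefixed UTF-8 name: one prefix byte plus the name bytes. */
    static int bytesByName(String className) {
        int utf8Length = className.getBytes(StandardCharsets.UTF_8).length;
        if (utf8Length > 127) {
            throw new IllegalArgumentException("model only covers names up to 127 bytes");
        }
        return 1 + utf8Length;
    }

    /** Converts a per-filter byte count into bytes per second for a given request rate. */
    static long bytesPerSecond(int bytesPerFilter, int filtersPerGet, long getsPerMinute) {
        return (long) bytesPerFilter * filtersPerGet * getsPerMinute / 60;
    }

    public static void main(String[] args) {
        String name = "com.example.filters.CustomerSegmentFilter"; // hypothetical class name
        FilterNameOverhead registry = new FilterNameOverhead();

        int byName = registry.identifierBytes(name);
        registry.register(name, 100); // arbitrary example code
        int byCode = registry.identifierBytes(name);

        System.out.println("bytes per filter, by name: " + byName);
        System.out.println("bytes per filter, by code: " + byCode);
        System.out.println("bytes/s saved at 5 filters per Get, 1.5M Gets/min: "
                + bytesPerSecond(byName - byCode, 5, 1_500_000L));
    }
}
```

The registry rejects a code that is already taken, which is the reuse risk JM mentions. A real mechanism would also have to keep client and region servers in agreement about the mapping, and this sketch does not attempt that.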
Pablo Medina 2013-08-20, 16:15