HBase >> mail # user >> High cpu usage on a region server
Re: High cpu usage on a region server
Not that I am aware of. Reducing the HFile block size would lessen this problem (but then cause other issues).

It's just a fix to the RegexStringComparator. You can recompile that one class and deploy it to the RegionServers (you need to make sure it's on the classpath before the HBase jars).
Probably easier to roll a new release. It's a shame we did not see this earlier.
-- Lars
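[For context, the hot path in the stack traces quoted below is a byte[]-to-String decode performed on every comparison. The following is an illustrative sketch only, not HBase's actual source: it mimics the allocation pattern behind HBASE-9428, where each compareTo() call decoded the cell value into a fresh String, so a filtered scan over millions of cells produced millions of short-lived Strings (the Arrays.copyOf / StringCoding frames in the dumps).]

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Sketch of the allocation pattern, NOT HBase's real RegexStringComparator.
public class RegexCompareSketch {
    private final Pattern pattern;

    public RegexCompareSketch(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // Called once per cell during a filtered scan; returns 0 on match,
    // mirroring the comparator convention.
    public int compareTo(byte[] value, int offset, int length) {
        // This per-call decode is the garbage source: every invocation
        // allocates a new String plus its backing char[].
        String s = new String(value, offset, length, StandardCharsets.ISO_8859_1);
        return pattern.matcher(s).find() ? 0 : 1;
    }
}
```

[The 0.94.12 fix cuts down this churn; recompiling just the patched class and putting it ahead of the HBase jars, as Lars suggests above, is the interim workaround.]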

________________________________
 From: OpenSource Dev <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
Sent: Thursday, September 12, 2013 9:52 AM
Subject: Re: High cpu usage on a region server
 

Thanks Lars.

Are there any other workarounds for this issue until we get the fix ?
If not we might have to do the patch and rollout custom pkg.

On Thu, Sep 12, 2013 at 8:36 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> Yep... Very likely HBASE-9428:
>
> 8 threads:
>    java.lang.Thread.State: RUNNABLE
>         at java.util.Arrays.copyOf(Arrays.java:2786)
>         at java.lang.StringCoding.decode(StringCoding.java:178)
>         at java.lang.String.<init>(String.java:483)
>         at org.apache.hadoop.hbase.filter.RegexStringComparator.compareTo(RegexStringComparator.java:96)
>         ...
>
> 4 threads:
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.cs.ISO_8859_1$Decoder.decodeArrayLoop(ISO_8859_1.java:79)
>         at sun.nio.cs.ISO_8859_1$Decoder.decodeLoop(ISO_8859_1.java:106)
>         at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:544)
>         at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:140)
>         at java.lang.StringCoding.decode(StringCoding.java:179)
>         at java.lang.String.<init>(String.java:483)
>         at org.apache.hadoop.hbase.filter.RegexStringComparator.compareTo(RegexStringComparator.java:96)
>
> It's also consistent with what you see: lots of garbage (hence tweaking your GC options had a significant effect).
> The fix is in 0.94.12, which is in RC right now, probably to be released early next week.
>
> -- Lars
>
>
>
> ________________________________
>  From: OpenSource Dev <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Thursday, September 12, 2013 8:15 AM
> Subject: Re: High cpu usage on a region server
>
>
> A server started getting busy last night, but this time it took ~5 hrs
> to go from 15% busy to 75% busy. It is not running at 80% flat out yet,
> but this is still very high compared to the other servers, which run
> under ~25% CPU usage. The only change I made yesterday was adding
> "-XX:+UseParNewGC" to the hbase startup command.
>
> http://pastebin.com/VRmujgyH
>
> On Wed, Sep 11, 2013 at 2:28 PM, Stack <[EMAIL PROTECTED]> wrote:
>> Can you thread dump the busy server and pastebin it?
>> Thanks,
>> St.Ack
>>
>>
>> On Wed, Sep 11, 2013 at 1:49 PM, OpenSource Dev <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> I'm using HBase 0.94.6 (CDH 4.3) for OpenTSDB. So far I have had no
>>> issues with writes/puts. The system handles up to 800k puts per second
>>> without issue; on average we do 250k puts per second.
>>>
>>> I am having a problem with reads. I've isolated where the problem is
>>> but have not been able to find the root cause.
>>>
>>> I have 16 machines running the HBase region server, each with ~35 regions.
>>> Once in a while, CPU usage goes flat out at 80% on one region server.
>>> These are the things I've noticed in Ganglia:
>>>
>>> hbase.regionserver.request - evenly distributed; no spikes on the busy server
>>> hbase.regionserver.blockCacheSize - between 500MB and 1000MB
>>> hbase.regionserver.compactionQueueSize - avg 2 or less
>>> hbase.regionserver.blockCacheHitRatio - 30% on the busy node, >60% on other nodes
>>>
>>>
>>> JVM Heap size is set to 16GB and I'm using -XX:+UseParNewGC
>>> -XX:+UseConcMarkSweepGC
>>>
>>> I've noticed the system load moves to a different region server,
>>> sometimes within a minute, if the busy region server is restarted.
>>>
>>> Any suggestion what could be causing the load and/or what other