Re: Lots of SocketTimeoutException for gets and puts since HBase 0.92.1

Right now (as previously with 0.90.3) we are using the default
value (10).
We are now trying to increase it to 30 to see if that helps.
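
For reference, a minimal sketch of that change, assuming it is made in
hbase-site.xml on each region server. The property name is the one from
Ted's question; the value 30 is simply what we are trying, not a
recommendation, and as far as I know the region servers must be
restarted to pick it up:

```xml
<!-- hbase-site.xml (sketch; 30 is an experimental value, not a tuned one) -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>30</value>
</property>
```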

Thanks for your concern

On 16/11/12 18:13, Ted Yu wrote:
> Vincent:
> What's the value for hbase.regionserver.handler.count ?
> I assume you keep the same value as that from 0.90.3
> Thanks
> On Fri, Nov 16, 2012 at 8:14 AM, Vincent Barat <[EMAIL PROTECTED]> wrote:
>> On 16/11/12 01:56, Stack wrote:
>>> On Thu, Nov 15, 2012 at 5:21 AM, Guillaume Perrot <[EMAIL PROTECTED]>
>>> wrote:
>>>> It happens when several tables are being compacted and/or when there are
>>>> several scanners running.
>>> It happens for a particular region?  Anything you can tell about the
>>> server looking in your cluster monitoring?  Is it running hot?  What
>>> do the hbase regionserver stats in UI say?  Anything interesting about
>>> compaction queues or requests?
>> Hi, thanks for your answer, Stack. I will take the lead on this thread
>> from now on.
>> It does not happen on any particular region. Actually, things have gotten
>> better now that compactions have been performed on all tables and have
>> stopped.
>> Nevertheless, we face a dramatic decrease in performance (especially on
>> random gets) across the whole cluster:
>> Despite the fact that we doubled our number of region servers (from 8 to
>> 16) and despite the fact that the CPU load on these region servers is only
>> about 10% to 30%, performance is really bad: very often a slight increase
>> in requests leads to clients blocked on requests and very long response
>> times. It looks like a contention / deadlock somewhere in the HBase client
>> and C code.
>>> If you look at the thread dump all handlers are occupied serving
>>> requests?  These timedout requests couldn't get into the server?
>> We will investigate that and report back to you.
>>>> Before the timeouts, we observe an increasing CPU load on a single region
>>>> server and if we add region servers and wait for rebalancing, we always
>>>> have the same region server causing problems like these:
>>>> 2012-11-14 20:47:08,443 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call multi(org.apache.hadoop.hbase.client.MultiAction@2c3da1aa), rpc version=1, client version=29, methodsFingerPrint=54742778 from <ip>:45334: output error
>>>> 2012-11-14 20:47:08,443 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020 caught: java.nio.channels.ClosedChannelException
>>>> at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
>>>> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>>>> at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1653)
>>>> at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:924)
>>>> at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:1003)
>>>> at org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:409)
>>>> at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1346)
>>>> With the same access patterns, we did not have this issue in HBase
>>>> 0.90.3.
>>> The above is other side of the timeout -- the client is gone.
>>> Can you explain the rising CPU?
>> No, there is no explanation (no high access to a given region, for example).
>> But this specific problem went away when we finished compactions.
>>> Is it iowait on this box because of
>>> compactions?  Bad disk?  Always same regionserver or issue moves
>>> around?
>>> Sorry for all the questions.  0.92 should be better than 0.90
>> Our experience is currently the exact opposite: for us, 0.92 seems to be
>> times slower than 0.90.3.
>>> generally (0.94 even better still -- can you go there?).
>> We can go to 0.94 but unfortunately, we CANNOT GO BACK (the same way we

*Vincent Barat*
