Re: Lots of SocketTimeoutException for gets and puts since HBase 0.92.1
On Thu, Nov 15, 2012 at 5:21 AM, Guillaume Perrot <[EMAIL PROTECTED]> wrote:
> It happens when several tables are being compacted and/or when there are
> several scanners running.
Does it happen for a particular region?  Anything you can tell about the
server from your cluster monitoring?  Is it running hot?  What
do the HBase regionserver stats in the UI say?  Anything interesting about
compaction queues or requests?

If you look at the thread dump, are all handlers occupied serving
requests?  Is it that these timed-out requests couldn't get into the server?
> Before the timeouts, we observe an increasing CPU load on a single region
> server and if we add region servers and wait for rebalancing, we always
> have the same region server causing problems like these:
>
> 2012-11-14 20:47:08,443 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call multi(org.apache.hadoop.hbase.client.MultiAction@2c3da1aa), rpc version=1, client version=29, methodsFingerPrint=54742778 from <ip>:45334: output error
> 2012-11-14 20:47:08,443 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020 caught: java.nio.channels.ClosedChannelException
>     at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>     at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1653)
>     at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:924)
>     at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:1003)
>     at org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:409)
>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1346)
>
> With the same access patterns, we did not have this issue in HBase 0.90.3.
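The above is the other side of the timeout -- the client is gone.  It
gave up waiting and dropped the connection, so by the time the handler
tries to write the response the channel is already closed.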
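
To see when these pile up, a rough sketch like the following tallies the "output error" warnings per minute and per call type, so they can be lined up against compaction activity. It assumes each warning sits on a single line in the regionserver log (the mail wrapped them) in the format quoted above; tally_output_errors is a made-up name, not an HBase tool.

```python
import re
from collections import Counter

# Matches lines such as:
# 2012-11-14 20:47:08,443 WARN ...HBaseServer: IPC Server Responder, call multi(...), ... from <ip>:45334: output error
OUTPUT_ERROR_RE = re.compile(
    r'^(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3} WARN .*'
    r'IPC Server Responder, call (?P<call>\w+)\(.* from \S+: output error'
)


def tally_output_errors(lines):
    """Count 'output error' warnings per minute and per call type (hypothetical helper)."""
    per_minute = Counter()
    per_call = Counter()
    for line in lines:
        match = OUTPUT_ERROR_RE.match(line)
        if match:
            per_minute[match.group("minute")] += 1
            per_call[match.group("call")] += 1
    return per_minute, per_call
```

Passing an open log file works, since the function only iterates over lines. A burst confined to a few minutes would point at a transient stall on the server, while a steady trickle would look more like sustained overload.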

Can you explain the rising CPU?  Is it iowait on this box because of
compactions?  Bad disk?  Is it always the same regionserver, or does the
issue move around?
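
For the iowait question, tools like top or iostat will show it directly. If you would rather script it, a minimal sketch, assuming a Linux box, is to take two samples of the aggregate "cpu" line from /proc/stat a few seconds apart and compute the iowait share of the elapsed CPU time. The helper name iowait_fraction is hypothetical.

```python
def iowait_fraction(before, after):
    """Share of CPU time spent in iowait between two aggregate 'cpu' lines of /proc/stat."""

    def fields(line):
        parts = line.split()
        if parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        # Columns: user nice system idle iowait irq softirq steal.
        # Guest columns are already counted inside user/nice, so they are
        # left out of the total to avoid double counting.
        return [int(value) for value in parts[1:9]]

    first, second = fields(before), fields(after)
    deltas = [later - earlier for earlier, later in zip(first, second)]
    total = sum(deltas)
    # Index 4 is the iowait column.
    return deltas[4] / total if total else 0.0
```

A high fraction on the hot regionserver during compactions, with low values elsewhere, would point at disk rather than CPU-bound work.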

Sorry for all the questions.  0.92 should be better than 0.90
generally (0.94 even better still -- can you go there?).  Interesting
that these issues show up post upgrade.  I can't think of a reason why
the different versions would bring this on...

St.Ack