Re: REST servers locked up on single RS malfunction.
Jack Levin 2011-04-21, 07:47
Shouldn't the RS just shut down, then? As it is, it stays half alive and
none of the puts succeed. Also, the OOMEs happen right after
flush/compaction/split, so clearly the RS was busy, and it could
just be a matter of hitting the heap ceiling.
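
To make the "just shut down" idea concrete, here is a minimal sketch of a reader task that reports an OutOfMemoryError to an abort callback instead of only dying. It is not HBase code: `AbortHook`, `abortOnOome`, and `demo` are hypothetical names invented for this illustration, and the OOME is simulated rather than caused by a real allocation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

final class OomeAbortSketch {
    /** Hypothetical callback; stands in for whatever would stop the server. */
    interface AbortHook {
        void abort(String why, Throwable cause);
    }

    /** Wraps a task so that an OutOfMemoryError is reported to the hook. */
    static Runnable abortOnOome(Runnable task, AbortHook hook) {
        return () -> {
            try {
                task.run();
            } catch (OutOfMemoryError e) {
                // A real server would begin its shutdown here rather than keep serving.
                hook.abort("OOME in " + Thread.currentThread().getName(), e);
            }
        };
    }

    /** Runs one failing task on a single-thread pool and returns the abort reason. */
    static String demo() {
        AtomicReference<String> reason = new AtomicReference<>();
        ExecutorService readers = Executors.newFixedThreadPool(1);
        readers.execute(abortOnOome(
                () -> { throw new OutOfMemoryError("simulated"); },
                (why, cause) -> reason.set(why)));
        readers.shutdown();
        try {
            // Wait for the submitted task to finish before reading the result.
            readers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return reason.get();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the wrapper is only that the failure reaches something that can decide to stop the process; what that decision should be for a region server is a separate question.
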
On Thu, Apr 21, 2011 at 12:13 AM, Stack <[EMAIL PROTECTED]> wrote:
> This looks like a bug. Elsewhere in the RPC you can register a
> handler for OOME explicitly, and we have a callback up into the
> regionserver where we set the server to abort or stop depending
> on the type of OOME we see. In this case it looks like on OOME we just
> throw, and then all the executors fill up so no more executors are
> available to process requests. (This is my current assessment; it
> could be a different one by morning.)
> The root cause would look to be a big put. Could that be the case?
> On the naming, that looks to be the default naming of executor threads
> done by the hosting ExecutorService.
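
As a quick check on the naming point, the sketch below prints the name a worker thread receives from the JDK's default thread factory, which uses the pattern `pool-N-thread-M`. That pattern is consistent with the `pool-1-thread-3` in the trace. The class and method names here are made up for the illustration, and the pool number N depends on how many default factories the JVM has already created.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class DefaultThreadNames {
    /** Returns the name of a worker thread created by the default thread factory. */
    static String nameOfWorker() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            // The task runs on a pool worker and reports that worker's name.
            Callable<String> whoAmI = () -> Thread.currentThread().getName();
            return pool.submit(whoAmI).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(nameOfWorker());
    }
}
```

Passing a custom `ThreadFactory` to the executor is the usual way to get more descriptive names in stack traces.
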
> On Wed, Apr 20, 2011 at 10:11 PM, Jack Levin <[EMAIL PROTECTED]> wrote:
>> Hello, with HBase 0.89 we see the following: all REST servers lock up
>> trying to connect to one of our region servers. The error in the
>> .out file on that region server looks like this:
>> Exception in thread "pool-1-thread-3" java.lang.OutOfMemoryError: Java heap space
>> at org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.readFields(HBaseRPC.java:120)
>> at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:959)
>> at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:927)
>> at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:503)
>> at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:297)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> at java.lang.Thread.run(Thread.java:619)
>> Question is, how come the region server did not die after this but
>> just hogged the REST connections? And what is pool-1-thread-3 actually