Re: REST servers locked up on single RS malfunction.
Jack Levin 2011-04-25, 20:04
That's a separate cluster; it's barely getting any traffic, so I don't
think the queue would be an issue. We do, however, have very large files
stored (one file per row). So the question is, if this is a GET that
breaks things, how can we avoid it?

-Jack

On Mon, Apr 25, 2011 at 10:37 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> Can't tell what it was because it OOME'd while reading whatever was coming in.
>
> Did you bump the number of handlers in that cluster too? Because you
> might hit what we talked about in this jira:
> https://issues.apache.org/jira/browse/HBASE-3813
>
> "Chatting w/ J-D this morning, he asked if the queues hold 'data'. The
> queues hold 'Calls'. Calls are the client request. They contain data.
> Jack had 2500 items queued. If each item to insert was 1MB, thats 25k
> * 1MB of memory that is outside of our generally accounting."
>
> So the higher the number of handlers, the more memory could be used by
> the queues.
>
> J-D
>
> On Mon, Apr 25, 2011 at 10:32 AM, Jack Levin <[EMAIL PROTECTED]> wrote:
>> Stack:
>>
>> Exception in thread "pool-1-thread-9" java.lang.OutOfMemoryError: Java heap space
>>   at org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.readFields(HBaseRPC.java:120)
>>   at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:959)
>>   at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:927)
>>   at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:503)
>>   at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:297)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>   at java.lang.Thread.run(Thread.java:619)
>>
>> Btw, is this a put or a read? Perhaps we are crashing on some sort of large read?
>>
>> -Jack
>>
>> On Thu, Apr 21, 2011 at 12:47 AM, Jack Levin <[EMAIL PROTECTED]> wrote:
>>> Shouldn't the RS just shut down then? Because it stays half alive and
>>> none of the puts succeed. Also, the OOME happened right after
>>> flush/compaction/split... so clearly the RS was busy, and it could be
>>> just a matter of hitting the heap ceiling, perhaps.
>>>
>>> -Jack
>>>
>>> On Thu, Apr 21, 2011 at 12:13 AM, Stack <[EMAIL PROTECTED]> wrote:
>>>> This looks like a bug. Elsewhere in the RPC you can register a
>>>> handler for OOME explicitly, and we have a callback up into the
>>>> regionserver where we will set that the server abort or stop, dependent
>>>> on the type of OOME we see. In this case it looks like on OOME we just
>>>> throw, and then all the executors fill, so no more executors are
>>>> available to process requests (this is my current assessment -- it
>>>> could be a different one by morning).
>>>>
>>>> The root cause would look to be a big put. Could that be the case?
>>>>
>>>> On the naming, that looks to be the default naming of executor threads
>>>> done by the hosting ExecutorService.
>>>>
>>>> St.Ack
>>>>
>>>>
>>>> On Wed, Apr 20, 2011 at 10:11 PM, Jack Levin <[EMAIL PROTECTED]> wrote:
>>>>> Hello, with 0.89 HBase, we see the following: all REST servers get
>>>>> locked up trying to connect to one of our RS servers. The error in the
>>>>> .out file on that Region Server looks like this:
>>>>>
>>>>> Exception in thread "pool-1-thread-3" java.lang.OutOfMemoryError: Java heap space
>>>>>   at org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.readFields(HBaseRPC.java:120)
>>>>>   at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:959)
>>>>>   at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:927)
>>>>>   at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:503)
>>>>>   at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:297)
>>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
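
For a rough feel of the queue-memory point quoted from HBASE-3813, here is a minimal back-of-the-envelope sketch. It is not HBase code: the class and method names are hypothetical, and the inputs (2500 queued calls, 1 MB per call) are just the figures mentioned in the thread, taken as a worst case where every queued call carries a full-size payload.

```java
// Hypothetical helper, not part of HBase: estimates heap held by queued RPC calls.
public class CallQueueEstimate {

    // Worst case: every queued call holds a payload of payloadBytes.
    // multiplyExact throws ArithmeticException on overflow instead of wrapping.
    static long worstCaseQueueBytes(long queuedCalls, long payloadBytes) {
        return Math.multiplyExact(queuedCalls, payloadBytes);
    }

    // Converts a byte count to GiB (2^30 bytes) for easier comparison with heap size.
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long queuedCalls = 2500;            // figure mentioned in the thread
        long payloadBytes = 1024L * 1024L;  // assumed 1 MB per call
        long total = worstCaseQueueBytes(queuedCalls, payloadBytes);
        System.out.printf("%d calls x %d bytes = %d bytes (%.2f GiB)%n",
                queuedCalls, payloadBytes, total, toGiB(total));
    }
}
```

Comparing that total against the region server heap gives a quick sanity check on whether queued calls alone could plausibly push the server toward an OOME.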