HBase, mail # user - HBaseClient.call() hang


Re: HBaseClient.call() hang
Bryan Keller 2012-12-15, 05:29
Forgot to mention that. It's version 0.92.1 (Cloudera CDH4.1.1), running on 64-bit CentOS 6 with Java 1.6.0_31.

On Dec 14, 2012, at 5:31 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Hey Bryan,
>
>
> which version of HBase is this?
>
> -- Lars
>
>
>
> ________________________________
> From: Bryan Keller <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Friday, December 14, 2012 2:59 PM
> Subject: HBaseClient.call() hang
>
> I have encountered a problem with HBaseClient.call() hanging. This occurs when one of my regionservers goes down while I am performing a table scan.
>
> What exacerbates this problem is that the scan I am performing uses filters, and the region size of the table is large (4 GB). Because of this, it can take several minutes for a row to be returned when calling scanner.next(). Apparently no keep-alive message is sent back to the scanner while the regionserver is busy, so I had to increase the hbase.rpc.timeout value to a large number (60 min); otherwise the next() call will time out waiting for the regionserver to send something back.
>
> The result is that this HBaseClient.call() hang is made much worse, because it won't time out for 60 minutes.
>
> I have a couple of questions:
>
> 1. Any thoughts on why HBaseClient.call() is getting stuck? I noticed that call.wait() is not using any timeout, so it will wait indefinitely until interrupted externally.
>
> 2. Is there a solution where I do not need to set hbase.rpc.timeout to a very large number? My only thought would be to forgo using filters and do the filtering client side, which seems pretty inefficient.
>
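To illustrate question 1, here is a minimal sketch of the difference between an untimed Object.wait() and a wait bounded by a deadline. It uses only the JDK. PendingCall, complete() and await() are hypothetical names made up for this example; this is not HBase's actual Call class and not a patch for HBaseClient, just the general pattern of giving up once a deadline has passed instead of blocking until someone calls notify().

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimedCallWait {
    /** Hypothetical stand-in for an in-flight RPC; NOT HBase's real Call class. */
    static final class PendingCall {
        private boolean done;
        private String value;

        /** Called by the receiver thread when the response arrives. */
        synchronized void complete(String v) {
            value = v;
            done = true;
            notifyAll();
        }

        /** Waits for the response, but never longer than timeoutMillis. */
        synchronized String await(long timeoutMillis)
                throws InterruptedException, TimeoutException {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (!done) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    throw new TimeoutException("no response after " + timeoutMillis + " ms");
                }
                // wait(0) would block forever, so always wait at least 1 ms.
                wait(Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
            }
            return value;
        }
    }

    public static void main(String[] args) throws Exception {
        // Case 1: nobody ever completes the call (e.g. the server died).
        PendingCall stuck = new PendingCall();
        try {
            stuck.await(200);
        } catch (TimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }

        // Case 2: another thread delivers the response before the deadline.
        final PendingCall ok = new PendingCall();
        Thread responder = new Thread(new Runnable() {
            public void run() {
                ok.complete("row-1");
            }
        });
        responder.start();
        System.out.println("got: " + ok.await(5000));
        responder.join();
    }
}
```

The loop re-checks the condition after every wakeup, so spurious wakeups do not return early, and the remaining time is recomputed from a fixed deadline so repeated wakeups cannot extend the total wait.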
> Here is a stack dump of the thread that was hung:
>
> Thread 10609: (state = BLOCKED)
> - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
> - java.lang.Object.wait() @bci=2, line=485 (Interpreted frame)
> - org.apache.hadoop.hbase.ipc.HBaseClient.call(org.apache.hadoop.io.Writable, java.net.InetSocketAddress, java.lang.Class, org.apache.hadoop.hbase.security.User, int) @bci=51, line=904 (Interpreted frame)
> - org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[]) @bci=52, line=150 (Interpreted frame)
> - $Proxy12.next(long, int) @bci=26 (Interpreted frame)
> - org.apache.hadoop.hbase.client.ScannerCallable.call() @bci=72, line=92 (Interpreted frame)
> - org.apache.hadoop.hbase.client.ScannerCallable.call() @bci=1, line=42 (Interpreted frame)
> - org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(org.apache.hadoop.hbase.client.ServerCallable) @bci=36, line=1325 (Interpreted frame)
> - org.apache.hadoop.hbase.client.HTable$ClientScanner.next() @bci=117, line=1299 (Compiled frame)
> - org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue() @bci=41, line=150 (Interpreted frame)
> - org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue() @bci=4, line=142 (Interpreted frame)
> - org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue() @bci=4, line=458 (Interpreted frame)
> - org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue() @bci=4, line=76 (Interpreted frame)
> - org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue() @bci=4, line=85 (Interpreted frame)
> - org.apache.hadoop.mapreduce.Mapper.run(org.apache.hadoop.mapreduce.Mapper$Context) @bci=6, line=139 (Interpreted frame)
> - org.apache.hadoop.mapred.MapTask.runNewMapper(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapreduce.split.JobSplit$TaskSplitIndex, org.apache.hadoop.mapred.TaskUmbilicalProtocol, org.apache.hadoop.mapred.Task$TaskReporter) @bci=201, line=645 (Interpreted frame)
> - org.apache.hadoop.mapred.MapTask.run(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=100, line=325 (Interpreted frame)
> - org.apache.hadoop.mapred.Child$4.run() @bci=29, line=268 (Interpreted frame)
> - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame)