HBase user mailing list - HBaseClient.call() hang


Re: HBaseClient.call() hang
Ted Yu 2012-12-15, 05:49
Bryan:

bq. My only thought would be to forgo using filters
Please keep using filters.

Sergey and I are working on HBASE-5416: Improve performance of scans with
some kind of filters.
This feature allows you to mark one column family as essential: the other
column families are only returned to the client when the essential column
family matches the filter. I wonder if this may be of help to you.
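
As a rough sketch of how a scan would take advantage of it (table, family
and qualifier names below are made up for illustration):

  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
  import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
  import org.apache.hadoop.hbase.util.Bytes;

  // Filter on the small "essential" family; the wide "data" family
  // would only be fetched for rows where the filter on "meta" matches.
  Scan scan = new Scan();
  scan.addFamily(Bytes.toBytes("meta"));
  scan.addFamily(Bytes.toBytes("data"));
  scan.setFilter(new SingleColumnValueFilter(
      Bytes.toBytes("meta"), Bytes.toBytes("flag"),
      CompareOp.EQUAL, Bytes.toBytes("1")));
  ResultScanner scanner = table.getScanner(scan);  // table: an open HTable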

You mentioned the regionserver going down or being busy. I assume it was
not often that the regionserver(s) went down. For a busy region server, did
you try jstack'ing the regionserver process?
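
For example (the pid below is a placeholder for the regionserver's process
id):

  jstack <regionserver-pid> > rs-stack.txt

A few dumps taken while a scan is stuck can show where the handler threads
are spending their time.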

Thanks

On Fri, Dec 14, 2012 at 2:59 PM, Bryan Keller <[EMAIL PROTECTED]> wrote:

> I have encountered a problem with HBaseClient.call() hanging. This occurs
> when one of my regionservers goes down while a table scan is in progress.
>
> What exacerbates this problem is that the scan I am performing uses
> filters, and the region size of the table is large (4 GB). Because of
> this, it can take several minutes for a row to be returned when calling
> scanner.next(). Apparently no keep-alive message is sent back to the
> scanner while the region server is busy, so I had to increase the
> hbase.rpc.timeout value to a large number (60 min); otherwise the next()
> call would time out waiting for the regionserver to send something back.
>
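> As a sketch, the client-side change amounts to something like this (the
> table name is made up; hbase.rpc.timeout is in milliseconds):
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.hbase.HBaseConfiguration;
>   import org.apache.hadoop.hbase.client.HTable;
>
>   Configuration conf = HBaseConfiguration.create();
>   conf.setLong("hbase.rpc.timeout", 60L * 60L * 1000L);  // 60 minutes
>   HTable table = new HTable(conf, "mytable");
>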
> The result is that this HBaseClient.call() hang is made much worse,
> because it won't time out for 60 minutes.
>
> I have a couple of questions:
>
> 1. Any thoughts on why HBaseClient.call() is getting stuck? I noticed
> that call.wait() is not using any timeout, so it will wait indefinitely
> until interrupted externally.
>
> 2. Is there a solution where I do not need to set hbase.rpc.timeout to a
> very large number? My only thought would be to forgo using filters and do
> the filtering client side (sketched below), which seems pretty inefficient.
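>
> Roughly, the client-side version I have in mind looks like this (family,
> qualifier and value are made up; imports as above plus
> org.apache.hadoop.hbase.client.* and org.apache.hadoop.hbase.util.Bytes):
>
>   Scan scan = new Scan();  // note: no server-side filter
>   scan.addFamily(Bytes.toBytes("d"));
>   ResultScanner scanner = table.getScanner(scan);
>   for (Result r : scanner) {
>     byte[] v = r.getValue(Bytes.toBytes("d"), Bytes.toBytes("status"));
>     if (v != null && Bytes.equals(v, Bytes.toBytes("ACTIVE"))) {
>       // row passes; handle it here
>     }
>   }
>   scanner.close();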
>
> Here is a stack dump of the thread that was hung:
>
> Thread 10609: (state = BLOCKED)
>  - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
>  - java.lang.Object.wait() @bci=2, line=485 (Interpreted frame)
>  - org.apache.hadoop.hbase.ipc.HBaseClient.call(org.apache.hadoop.io.Writable, java.net.InetSocketAddress, java.lang.Class, org.apache.hadoop.hbase.security.User, int) @bci=51, line=904 (Interpreted frame)
>  - org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[]) @bci=52, line=150 (Interpreted frame)
>  - $Proxy12.next(long, int) @bci=26 (Interpreted frame)
>  - org.apache.hadoop.hbase.client.ScannerCallable.call() @bci=72, line=92 (Interpreted frame)
>  - org.apache.hadoop.hbase.client.ScannerCallable.call() @bci=1, line=42 (Interpreted frame)
>  - org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(org.apache.hadoop.hbase.client.ServerCallable) @bci=36, line=1325 (Interpreted frame)
>  - org.apache.hadoop.hbase.client.HTable$ClientScanner.next() @bci=117, line=1299 (Compiled frame)
>  - org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue() @bci=41, line=150 (Interpreted frame)
>  - org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue() @bci=4, line=142 (Interpreted frame)
>  - org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue() @bci=4, line=458 (Interpreted frame)
>  - org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue() @bci=4, line=76 (Interpreted frame)
>  - org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue() @bci=4, line=85 (Interpreted frame)
>  - org.apache.hadoop.mapreduce.Mapper.run(org.apache.hadoop.mapreduce.Mapper$Context) @bci=6, line=139 (Interpreted frame)
>  - org.apache.hadoop.mapred.MapTask.runNewMapper(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapreduce.split.JobSplit$TaskSplitIndex, org.apache.hadoop.mapred.TaskUmbilicalProtocol, org.apache.hadoop.mapred.Task$TaskReporter) @bci=201, line=645 (Interpreted frame)