HBase, mail # user - Query regarding HTable.get and timeouts


Re: Query regarding HTable.get and timeouts
Shrijeet Paliwal 2011-08-18, 20:21
It follows exponential backoff. Each pause is longer than the last one, and
together they add up to close to 600 seconds.
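
If you want to check the arithmetic against your own settings, here is a minimal sketch of how the sleeps accumulate. The multiplier table and the 1000 ms base pause below are assumptions for illustration (the class and method names are made up too); substitute the HConstants.RETRY_BACKOFF table from your HBase version and your configured hbase.client.pause.

```java
public class BackoffTotal {
    // ASSUMED multiplier table, for illustration only.
    // Check HConstants.RETRY_BACKOFF in the HBase version you actually run.
    static final int[] ASSUMED_BACKOFF = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32};

    /** Sleep that follows failed attempt number `tries` (0-based). */
    static long pauseMillis(long basePauseMillis, int[] backoff, int tries) {
        // Past the end of the table, keep using the last (largest) multiplier.
        int idx = Math.min(tries, backoff.length - 1);
        return basePauseMillis * backoff[idx];
    }

    /** Total sleep time if all numRetries attempts fail and each is followed by a sleep. */
    static long totalSleepMillis(long basePauseMillis, int[] backoff, int numRetries) {
        long total = 0;
        for (int tries = 0; tries < numRetries; tries++) {
            total += pauseMillis(basePauseMillis, backoff, tries);
        }
        return total;
    }

    public static void main(String[] args) {
        long basePauseMillis = 1000L; // assumed value of hbase.client.pause
        int numRetries = 10;          // the default mentioned in this thread
        System.out.println("total sleep (ms): "
                + totalSleepMillis(basePauseMillis, ASSUMED_BACKOFF, numRetries));
    }
}
```

Note that this counts only the sleeps between attempts. Whatever time each attempt spends before it fails comes on top of that total.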

On Thu, Aug 18, 2011 at 12:09 PM, Srikanth P. Shreenivas <
[EMAIL PROTECTED]> wrote:

> My apologies, I may not be reading the code right.
>
> You are right, it is the GridGain timeout that is causing line 1255 to
> execute.
> However, the question is what would make an HTable.get() take close to 10
> minutes and so induce a timeout in the GridGain task.
>
> The value of numRetries at line 1236 should be 10 (the default), and if we go
> with the default value of HConstants.RETRY_BACKOFF, then the sleep time summed
> over all retries will be only 61 seconds, not close to 600 seconds as is the
> case in our code.
>
>
> Regards,
> Srikanth
>
>
> ________________________________________
> From: Srikanth P. Shreenivas
> Sent: Friday, August 19, 2011 12:21 AM
> To: [EMAIL PROTECTED]
> Subject: RE: Query regarding HTable.get and timeouts
>
> Please note that line numbers I am referencing are from the file :
> https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
>
>
> ________________________________________
> From: Srikanth P. Shreenivas
> Sent: Friday, August 19, 2011 12:19 AM
> To: [EMAIL PROTECTED]
> Subject: RE: Query regarding HTable.get and timeouts
>
> Hi Stack,
>
> Thanks a lot for your reply.  It's always comforting to see a very
> active community, and especially your prompt replies to queries.
>
> Yes, I am running it as a GridGain task, so it runs in GridGain's thread
> pool.  In this case, we can imagine GridGain as something that hands off
> work to various worker threads and waits asynchronously for it to complete.  I
> have a 10-minute timeout, after which GridGain considers the work timed
> out.
>
> What we are observing is that our tasks are timing out at the 10-minute
> boundary, and the delay seems to be caused by the part of the work that is
> doing HTable.get.
>
> My suspicion is that line 1255 in HConnectionManager.java is calling
> Thread.currentThread().interrupt(), due to which the GridGain thread
> stops doing what it was meant to do and never responds to the master node,
> resulting in a timeout in the master.
>
> In order for line 1255 to execute, we have to assume that all retries
> were exhausted.
> Hence my query: what would cause an HTable.get() to get into a
> situation wherein
> HConnectionManager$HConnectionImplementation.getRegionServerWithRetries gets
> to line 1255?
>
>
> Regards,
> Srikanth
>
> ________________________________________
> From: [EMAIL PROTECTED] [[EMAIL PROTECTED]] on behalf of Stack [
> [EMAIL PROTECTED]]
> Sent: Friday, August 19, 2011 12:03 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Query regarding HTable.get and timeouts
>
> Is your client running inside a container of some form and could the
> container be doing the interrupting?   I've not come across
> client-side thread interrupts before.
> St.Ack
>
> On Thu, Aug 18, 2011 at 7:37 AM, Srikanth P. Shreenivas
> <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > We are experiencing an issue in our HBase cluster wherein some of the gets are timing out at:
> >
> > java.io.IOException: Giving up trying to get region server: thread is interrupted.
> >                at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1016)
> >                at org.apache.hadoop.hbase.client.HTable.get(HTable.java:546)
> >
> >
> > When we look at the logs of the master, ZooKeeper and the region servers, there is nothing that indicates anything abnormal.
> >
> > I tried looking up the functions below, but at this point could not make much out of them.
> >
> > https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java - getRegionServerWithRetries starts at line 1233
> >
> > https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java - HTable.get starts at line 611.