Re: Query regarding HTable.get and timeouts
It follows exponential backoff. Each pause is longer than the last one, and
together they add up to close to 600 seconds.
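
One way to check what the pauses add up to is to sum them directly. The following is a minimal sketch, not the HBase source: the multiplier table and the 1000 ms base pause are assumptions made for illustration, so substitute the values of HConstants.RETRY_BACKOFF and the client pause setting from the HBase version you actually run. The class and method names are hypothetical.

```java
class BackoffTotal {
    // Hypothetical multiplier table for illustration only; replace it with
    // the contents of HConstants.RETRY_BACKOFF from your HBase version.
    static final int[] BACKOFF = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32};

    // Pause taken after attempt number `tries` (0-based). Attempts beyond
    // the end of the table reuse the last multiplier.
    static long pauseMillis(long basePauseMs, int tries) {
        int idx = Math.min(tries, BACKOFF.length - 1);
        return basePauseMs * BACKOFF[idx];
    }

    // Total time spent sleeping if `sleeps` consecutive pauses are taken.
    static long totalSleepMillis(long basePauseMs, int sleeps) {
        long total = 0;
        for (int i = 0; i < sleeps; i++) {
            total += pauseMillis(basePauseMs, i);
        }
        return total;
    }

    public static void main(String[] args) {
        long basePauseMs = 1000; // assumed base pause, in milliseconds
        System.out.println(totalSleepMillis(basePauseMs, 10) / 1000 + " s");
    }
}
```

The total scales linearly with the base pause, so it is worth confirming which pause and retry count the client is really configured with before comparing the result against the observed delay. Note also that this counts only the sleeps between attempts, not the time each attempt itself takes.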

On Thu, Aug 18, 2011 at 12:09 PM, Srikanth P. Shreenivas <
[EMAIL PROTECTED]> wrote:

> My apologies, I may not be reading the code right.
>
> You are right, it is the GridGain timeout that causes line 1255 to
> execute.
> However, the question is what would make an HTable.get() take close to 10
> minutes, long enough to induce a timeout in the GridGain task.
>
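
The behaviour described above (an outside party interrupts the thread while it sleeps between retries, and the client gives up with an IOException) can be sketched as follows. This is an illustration of the general pattern, not the HBase code: callWithRetries and runAndInterrupt are hypothetical names, and the interrupting thread merely stands in for a container timeout such as GridGain's.

```java
import java.io.IOException;

class InterruptedRetry {
    // Hypothetical stand-in for a client call that sleeps between retries.
    static void callWithRetries(int numRetries, long pauseMs) throws IOException {
        for (int tries = 0; tries < numRetries; tries++) {
            // ... a real client would attempt the remote call here ...
            try {
                Thread.sleep(pauseMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag so the caller can see it, then give up.
                Thread.currentThread().interrupt();
                throw new IOException("Giving up: thread is interrupted.");
            }
        }
        throw new IOException("Retries exhausted after " + numRetries + " attempts");
    }

    // Runs the call on a worker thread, interrupts it, and reports what the
    // worker observed.
    static String runAndInterrupt() {
        final String[] result = new String[1];
        Thread worker = new Thread(() -> {
            try {
                callWithRetries(10, 60_000);
            } catch (IOException e) {
                result[0] = e.getMessage()
                        + " interrupted=" + Thread.currentThread().isInterrupted();
            }
        });
        worker.start();
        worker.interrupt(); // plays the role of the container's timeout
        try {
            worker.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(runAndInterrupt());
    }
}
```

In this sketch the exception is a consequence of the interrupt, not its cause, which is consistent with the container timing out first and the client reporting the interruption afterwards.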
> The value of numRetries at line 1236 should be 10 (the default), and if we go
> with the default value of HConstants.RETRY_BACKOFF, then the sleep time summed
> over all retries will be only 61 seconds, not the close to 600 seconds that
> we are seeing in our case.
>
>
> Regards,
> Srikanth
>
>
> ________________________________________
> From: Srikanth P. Shreenivas
> Sent: Friday, August 19, 2011 12:21 AM
> To: [EMAIL PROTECTED]
> Subject: RE: Query regarding HTable.get and timeouts
>
> Please note that the line numbers I am referencing are from the file:
> https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
>
>
> ________________________________________
> From: Srikanth P. Shreenivas
> Sent: Friday, August 19, 2011 12:19 AM
> To: [EMAIL PROTECTED]
> Subject: RE: Query regarding HTable.get and timeouts
>
> Hi Stack,
>
> Thanks a lot for your reply. It's always comforting to see such an
> active community, and especially your prompt replies to queries.
>
> Yes, I am running it as a GridGain task, so it runs in GridGain's thread
> pool. In this case, we can imagine GridGain as something that hands off
> work to various worker threads and waits asynchronously for it to complete. I
> have a 10-minute timeout after which GridGain considers the work timed
> out.
>
> What we are observing is that our tasks are timing out at the 10-minute
> boundary, and the delay seems to be caused by the part of the work that is
> doing HTable.get.
>
> My suspicion is that line 1255 in HConnectionManager.java is calling
> Thread.currentThread().interrupt(), due to which the GridGain thread
> stops doing what it was meant to do and never responds to the master node,
> resulting in a timeout in the master.
>
> In order for line 1255 to execute, we will have to assume that all retries
> were exhausted.
> Hence my question: what would cause an HTable.get() to get into a
> situation wherein
> HConnectionManager$HConnectionImplementation.getRegionServerWithRetries gets
> to line 1255?
>
>
> Regards,
> Srikanth
>
> ________________________________________
> From: [EMAIL PROTECTED] [[EMAIL PROTECTED]] on behalf of Stack [
> [EMAIL PROTECTED]]
> Sent: Friday, August 19, 2011 12:03 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Query regarding HTable.get and timeouts
>
> Is your client running inside a container of some form and could the
> container be doing the interrupting?   I've not come across
> client-side thread interrupts before.
> St.Ack
>
> On Thu, Aug 18, 2011 at 7:37 AM, Srikanth P. Shreenivas
> <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > We are experiencing an issue in our HBase cluster wherein some of the
> > gets are timing out at:
> >
> > java.io.IOException: Giving up trying to get region server: thread is interrupted.
> >                at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1016)
> >                at org.apache.hadoop.hbase.client.HTable.get(HTable.java:546)
> >
> >
> > When we look at the logs of the master, ZooKeeper and the region servers,
> > there is nothing that indicates anything abnormal.
> >
> > I tried reading the functions below, but at this point could not make much
> > of them.
> >
> > https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java - getRegionServerWithRetries starts at line 1233.
> >
> > https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java - HTable.get starts at line 611.