RS not processing any requests
Nathaniel Cook 2012-09-05, 21:58
We are experiencing a problem where RS are locking up and not processing any requests. Restarting the RS fixes the problem and operations continue as normal. We see this issue under load and on two different clusters.

We are importing existing data via the HBase MapReduce import job to a cluster (h1) and then replicating it to a second cluster (h2). Both clusters, h1 and h2, are experiencing the same problem.

Here are some of the symptoms and logs to go with it.

When a RS locks up we find a table with one region owned by the RS. We attempt to scan the table (with the hbase shell). Since the table has only one region, and that region is on the locked RS, the scan hangs until the RS is restarted. Additionally, the HMaster gets a SocketTimeoutException when communicating with the RS on port 60020.

Log for HMaster:
http://pastebin.com/7yMWWNNR

We ran a jstack on both the RS process and the hbase shell process trying to do the scan.

Jstack log for RS:
http://pastebin.com/9Y9t5ERE

Jstack log for scan (hbase shell):
http://pastebin.com/YVTbNDu7

We don't see any errors in the region server logs.

We have not been able to figure out what is causing the RS to lock up. We have checked open file limits, socket limits and basic network connectivity between the machines. We are well under the limits and can create new connections between machines unrelated to HBase, so the problem is specific to talking to region servers and not network connectivity. Another reason we believe it's not a network connectivity issue is that the RS are able to keep their heartbeats with ZooKeeper.

We have also checked logs on NameNodes and DataNodes and are not seeing any issues.

We are running HBase version 0.92 (cdh4).

--
-Nathaniel Cook
Re: RS not processing any requests
Stack 2012-09-05, 22:39
On Wed, Sep 5, 2012 at 2:58 PM, Nathaniel Cook <[EMAIL PROTECTED]> wrote:
> We ran a jstack on the both the RS process and the hbase shell process
> trying to do the scan.
>
> Jstack log for RS:
> http://pastebin.com/9Y9t5ERE

What JVM? (I don't know what (20.10-b01 mixed mode) is.)

I see a bunch of this:

"PRI IPC Server handler 5 on 60020" daemon prio=10 tid=0x00002aaac10a1800 nid=0x92f waiting for monitor entry [0x000000004ab0f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at .....

But when I go to look for other instances of the object monitor, I don't find any. I see this for each instance of BLOCKED (or at least, the two or three I checked).

What's your OS?

St.Ack
Re: RS not processing any requests
Jeff Whiting 2012-09-05, 22:51
I work with Nathaniel and can answer those questions.

We are using Sun's JVM.

$ java -version
java version "1.6.0_21"
Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)

We also tried one node on a newer version but saw the same thing...

$ java -version
java version "1.6.0_35"
Java(TM) SE Runtime Environment (build 1.6.0_35-b10)
Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01, mixed mode)

We are running the latest version of CentOS 5.

It looks like just the "PRI IPC Server handler" threads are blocked but the "IPC Server handler" threads are not. What is the difference between the PRI and non-PRI handlers?

I'm not too adept at reading jstack output, but it seems like they are trying to lock 0x000000058f3fad08, which is held by

"PRI IPC Server handler 4 on 60020" daemon prio=10 tid=0x00002aaac0e52000 nid=0x92e in Object.wait() [0x000000004aa0e000]

Thanks,
~Jeff

--
Jeff Whiting
Qualtrics Senior Software Engineer
[EMAIL PROTECTED]
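For readers who want to do the same lock tracing without reading the dump by eye, the lookup described in this message can be sketched as a short script. It assumes the usual HotSpot jstack layout, where a waiting thread prints "- waiting to lock <address>" and the owning thread prints "- locked <address>". The sample dump embedded below is a made-up, abbreviated illustration: it reuses the two thread header lines and the monitor address quoted in this thread, but the stack frames are placeholders and are not taken from the pastebin. The function name find_blockers is hypothetical.

```python
import re
from collections import defaultdict

# Made-up, abbreviated dump for illustration only. The two header lines and the
# monitor address come from the thread; the "at placeholder..." frames and the
# lock lines are assumptions, not the contents of the real pastebin dump.
SAMPLE_DUMP = '''\
"PRI IPC Server handler 4 on 60020" daemon prio=10 tid=0x00002aaac0e52000 nid=0x92e in Object.wait() [0x000000004aa0e000]
   java.lang.Thread.State: WAITING (on object monitor)
        at placeholder.Frame.a(Unknown Source)
        - locked <0x000000058f3fad08> (a java.lang.Object)

"PRI IPC Server handler 5 on 60020" daemon prio=10 tid=0x00002aaac10a1800 nid=0x92f waiting for monitor entry [0x000000004ab0f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at placeholder.Frame.b(Unknown Source)
        - waiting to lock <0x000000058f3fad08> (a java.lang.Object)
'''

THREAD_RE = re.compile(r'^"(?P<name>[^"]+)"')               # thread header line
WAITING_RE = re.compile(r'- waiting to lock <(0x[0-9a-f]+)>')  # blocked on monitor
LOCKED_RE = re.compile(r'- locked <(0x[0-9a-f]+)>')            # owns monitor


def find_blockers(dump):
    """Map each contended monitor address to (holder name or None, [waiting thread names])."""
    holders = {}                 # monitor address -> name of the thread that locked it
    waiters = defaultdict(list)  # monitor address -> names of threads waiting to lock it
    current = None               # name of the thread whose stack is being read
    for line in dump.splitlines():
        header = THREAD_RE.match(line)
        if header:
            current = header.group("name")
            continue
        if current is None:
            continue  # preamble before the first thread header
        waiting = WAITING_RE.search(line)
        if waiting:
            waiters[waiting.group(1)].append(current)
            continue
        locked = LOCKED_RE.search(line)
        if locked:
            holders[locked.group(1)] = current
    # A holder of None means no thread in the dump reports owning that monitor.
    return {mon: (holders.get(mon), names) for mon, names in waiters.items()}


if __name__ == "__main__":
    for monitor, (holder, names) in find_blockers(SAMPLE_DUMP).items():
        print(monitor, "held by", holder, "blocking", len(names), "thread(s)")
```

To use it on a real dump, read the jstack output from a file and pass the text to find_blockers. A holder of None corresponds to the case Stack describes, where no other mention of the monitor can be found in the dump.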
Re: RS not processing any requests
Himanshu Vashishtha 2012-09-05, 22:47
Your RS priority handlers are blocked on meta lookup, so the RS becomes unresponsive. Looks like you are hitting https://issues.apache.org/jira/browse/HBASE-6165

Are you running HBase replication? Just confirming.

Himanshu
Re: RS not processing any requests
Jeff Whiting 2012-09-05, 23:18
It looks like that is the problem we are having. We are on 0.92 so we don't get the patch. But one solution seems to be increasing the number of privileged handlers. How do we increase the number of privileged handlers?

~Jeff
Re: RS not processing any requests
Himanshu Vashishtha 2012-09-05, 23:23
The number of PRI handlers is governed by "hbase.regionserver.metahandler.count"; the default is 10.

Increasing their number will not solve it, but will delay its occurrence (I don't know about your load, etc.).

Another related JIRA is HBASE-6550.

Some more context for your use case:
http://search-hadoop.com/m/WHkTxWj0MW/himanshu+vashistha&subj=Re+Long+running+replication+possible+improvements
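As a rough illustration of what changing that setting looks like, the sketch below renders a property entry in the name/value form that hbase-site.xml uses. The value 20 is an arbitrary assumption chosen only for the example, not a recommendation from this thread, and the helper name render_property is hypothetical. As noted above, raising the count is expected to delay the lockup rather than fix it.

```python
import xml.etree.ElementTree as ET


def render_property(name, value):
    """Return a <property> element, as a string, in the hbase-site.xml name/value form."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # 20 is an assumed example value; the thread only states that the default is 10.
    print(render_property("hbase.regionserver.metahandler.count", 20))
```

The printed element would go inside the configuration element of hbase-site.xml on each region server, and the region servers would most likely need a restart to pick it up.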
Re: RS not processing any requests
Jeff Whiting 2012-09-05, 23:53
Hmm. So if we are on 0.92, what would you suggest to prevent the problem?

~Jeff
Re: RS not processing any requests
Himanshu Vashishtha 2012-09-06, 00:09
It usually happens in a long-running setup (at least for me). Can you throttle your load?

Replication is evolving; I'd say upgrade if you can (or backport the JIRAs?).

Himanshu
Re: RS not processing any requests
Ted Yu 2012-09-06, 00:49
Backport has been done in HBASE-6724

Cheers
Re: RS not processing any requests
Jeff Whiting 2012-09-06, 15:04
Great, this is good news!

~Jeff
Re: RS not processing any requests
Jeff Whiting 2012-09-06, 15:04
We can throttle our replication load. How do you do that? ;-)

~Jeff
Re: RS not processing any requests
Jeff Whiting 2012-09-05, 22:51
Yes, we are running HBase replication.

~Jeff