-
HBASE-5898 & 0.92.2 Shrijeet Paliwal 2012-12-11, 20:05
Hello All,
If our read of https://issues.apache.org/jira/browse/HBASE-5898 is correct, it talks about three issues:

#1 Contention on a lock causing bad performance
#2 HDFS slowness causing IPC handlers to be blocked for long periods
#3 A mysterious bug which causes what looks like a deadlock

In our environment [1] we are seeing #3 (ish) signals. Attached is the stack trace. One of our region servers got blocked (almost all of its IPC handlers) for 11 hours, so it cannot be #1 or #2.

We have not backported the patch yet, so we can't say if it fixes the issue or not. Also, it is difficult to reproduce. There is a rumor that a program to reproduce this issue exists. Does it?

Eager to hear thoughts.

[1] HBase version 0.92.2, Hadoop version CDH3u0

-Shrijeet
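For anyone trying to tell a true deadlock (#3) apart from plain contention (#1), a small in-process check along the following lines may help. This is only a sketch: it inspects the JVM it runs in, so it would have to be run inside the region server (or adapted to a remote JMX connection), and the thread-name prefix `IPC Server handler` is an assumption about how the handler threads are named. The class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase.
class BlockedHandlerReport {

    /**
     * Returns one line per thread in the current JVM whose name starts with
     * namePrefix and whose state is BLOCKED or WAITING. Note that an idle
     * handler may also show up as WAITING, so the lock name matters.
     */
    static List<String> stuckThreads(String namePrefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> stuck = new ArrayList<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // thread exited while the dump was taken
            }
            Thread.State state = info.getThreadState();
            boolean parked = state == Thread.State.BLOCKED || state == Thread.State.WAITING;
            if (parked && info.getThreadName().startsWith(namePrefix)) {
                stuck.add(info.getThreadName() + " [" + state + "] on "
                        + info.getLockName() + " held by " + info.getLockOwnerName());
            }
        }
        return stuck;
    }

    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() returns null when the JVM detects no lock cycle.
        long[] deadlocked = mx.findDeadlockedThreads();
        System.out.println("JVM-detected deadlocked threads: "
                + (deadlocked == null ? 0 : deadlocked.length));
        for (String line : stuckThreads("IPC Server handler")) {
            System.out.println(line);
        }
    }
}
```

If the JVM reports no lock cycle but the same handlers stay parked on the same lock across several dumps taken minutes apart, that would point toward a lost notification rather than ordinary contention.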
-
Re: HBASE-5898 & 0.92.2 Ted Yu 2012-12-11, 20:10
Shrijeet:
The attachment didn't go through.
Can you use pastebin?
Thanks
-
Re: HBASE-5898 & 0.92.2 Shrijeet Paliwal 2012-12-11, 20:14
Done https://gist.github.com/4261746
-
Re: HBASE-5898 & 0.92.2 Ted Yu 2012-12-11, 20:26
Have you seen Lars' comment today on the JIRA ?
I don't know whether CDH3u0 is close to hdfs 1.0.3 or 1.0.4
Cheers
-
Re: HBASE-5898 & 0.92.2 Jean-Daniel Cryans 2012-12-11, 20:32
This really looks like the issue we got that prompted applying HBASE-5898.
Never heard about the rumor.

If you can isolate the region that exhibits the issue, put it on a region server by itself and patch that region server to see if it fixes the issue.

Also, can you confirm that you are running java 1.6 u16?

Thx,
J-D
-
Re: HBASE-5898 & 0.92.2 Shrijeet Paliwal 2012-12-11, 20:49
On Tue, Dec 11, 2012 at 12:32 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]>wrote:
> This really looks like the issue we got that prompted applying HBASE-5898.
>
> Never heard about the rumor.

Was referring to Stack's comment http://goo.gl/1dMVb

> If you can isolate the region that exhibits the issue, put it on a
> region server by itself and patch that region server to see if it
> fixes the issue.

I can isolate the region fine (when we attempted to restart the regionserver, the stuck threads got interrupted & logged the region name they were trying to load data for). But that region has gone through a minor compaction, and the situation (files et al) in the file system has changed. I am assuming the state of the filesystem is what you were hoping would help us reproduce. Or are you coming from somewhere else?

> Also, can you confirm that you are running java 1.6 u16?

Yes, we are running Java 1.6.0_16.
-
Re: HBASE-5898 & 0.92.2 Shrijeet Paliwal 2012-12-11, 20:50
On Tue, Dec 11, 2012 at 12:26 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> I don't know whether CDH3u0 is close to hdfs 1.0.3 or 1.0.4

Ted, CDH3u0 does not have HDFS-2246 (Shortcut a local client reads to a Datanodes files directly). I think it is included in CDH3u3.
-
Re: HBASE-5898 & 0.92.2 Shrijeet Paliwal 2012-12-12, 04:05
We might be facing HBASE-3622. We recently removed UseMemBar from our GC opts. I will bring it back & see if the issue gets resolved.
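Before and after restoring the option, it can be worth confirming what the JVM was actually started with. The sketch below reads the current JVM's startup arguments, so it would need to run inside the region server process to say anything about that process. It assumes the HotSpot spelling `-XX:+UseMembar` for the flag mentioned above; the class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of HBase.
class MembarFlagCheck {

    /** True if the exact flag string appears among the JVM startup arguments. */
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM at startup (not the program arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        // Assumed spelling of the option; adjust if your JVM uses another form.
        String flag = "-XX:+UseMembar";
        System.out.println(flag + " present: " + hasFlag(jvmArgs, flag));
    }
}
```

The match is an exact string comparison, so a flag that was disabled explicitly (`-XX:-UseMembar`) or left at its default would report as not present.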