HBase, mail # dev - All region server died due to "Parent directory doesn't exist"


lars hofhansl 2013-05-09, 06:39
lars hofhansl 2013-05-09, 07:23
lars hofhansl 2013-05-09, 07:41
Ted Yu 2013-05-09, 08:33
Andrew Purtell 2013-05-09, 08:59
Ted Yu 2013-05-09, 09:04
Andrew Purtell 2013-05-09, 09:06
lars hofhansl 2013-05-09, 15:48
Ted Yu 2013-05-09, 16:07
lars hofhansl 2013-05-09, 16:16
Varun Sharma 2013-05-09, 16:39
Varun Sharma 2013-05-09, 16:41
Ted Yu 2013-05-09, 16:51
lars hofhansl 2013-05-09, 17:03
Stack 2013-05-09, 17:34
lars hofhansl 2013-05-09, 18:13
lars hofhansl 2013-05-09, 18:28
Enis Söztutar 2013-05-10, 01:10
lars hofhansl 2013-05-10, 04:25
Enis Söztutar 2013-05-10, 05:01
Re: All region server died due to "Parent directory doesn't exist"
lars hofhansl 2013-05-10, 05:47
Nope. That does not appear to be the problem.
________________________________
 From: Enis Söztutar <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
Sent: Thursday, May 9, 2013 10:01 PM
Subject: Re: All region server died due to "Parent directory doesn't exist"
 

But you see the ZooKeeper session timeout events in the RS logs, and the master
says that the ZK sessions for the RSs have expired, right?
On Thu, May 9, 2013 at 9:25 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Still looking. Stack and Himanshu are looking too (thanks again!).
>
> What I do know is that it has to do with the fencing mechanism during log
> splitting.
> Until I bounced HDFS and ZK (ZK probably being the culprit) each started
> RegionServer would immediately be fenced off (its log directory renamed).
> Probably by the SSH (ServerShutdownHandler).
>
> It is not clear what caused the first RS to die. While there is no direct
> evidence, from the logs it looks like the log directory was just suddenly
> renamed.
>
> I'll spend more time in the logs and also watch for this happening again.
>
> We did find another misconfigured cluster that had some services pointed
> at this cluster. It does not look like that was actually a problem - there
> is no evidence in the logs that this actually caused a problem, but it made
> this deploy somewhat "special".
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Enis Söztutar <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
> Sent: Thursday, May 9, 2013 6:10 PM
> Subject: Re: All region server died due to "Parent directory doesn't exist"
>
>
>
> Were we able to find the root cause?
>
>
>
> On Thu, May 9, 2013 at 11:28 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> >The good news is that, as far as I can tell, no data was lost.
> >Eventually all logs were split and replayed.
> >
> >
> >
> >-- Lars
> >
> >
> >
> >----- Original Message -----
> >
> >From: lars hofhansl <[EMAIL PROTECTED]>
> >To: HBase Dev List <[EMAIL PROTECTED]>
> >
> >Cc:
> >Sent: Thursday, May 9, 2013 11:13 AM
> >Subject: Re: All region server died due to "Parent directory doesn't exist"
> >
> >Thanks Stack.
> >
> >I sent the logs.
> >Also, I have since bounced HDFS and ZK and the problem is gone now (I can
> >start RSs again and they stay up). Something got into a weird state.
> >
> >
> >-- Lars
> >
> >
> >
> >________________________________
> >From: Stack <[EMAIL PROTECTED]>
> >To: HBase Dev List <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
> >Sent: Thursday, May 9, 2013 10:34 AM
> >Subject: Re: All region server died due to "Parent directory doesn't exist"
> >
> >
> >
> >Want to send me a regionserver log Lars? (off-list)
> >St.Ack
> >
> >
> >
> >On Thu, May 9, 2013 at 10:03 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >
> >>Thanks Ted and Varun.
> >>
> >>
> >>Let me check on the .META. server.
> >>
> >>
> >>The majority (13) of the RSs died within 2 minutes. The remaining 3 died
> >>over the following 10 minutes.
> >>So that would point to a general issue. I did not see any ZK issues but
> >>I'll double check.
> >>
> >>
> >>It is just interesting that even now, if I start an RS it aborts within
> >>a minute or two, because of this issue.
> >>
> >>
> >>-- Lars
> >>
> >>
> >>----- Original Message -----
> >>From: Ted Yu <[EMAIL PROTECTED]>
> >>To: [EMAIL PROTECTED]
> >>
> >>Cc:
> >>Sent: Thursday, May 9, 2013 9:51 AM
> >>Subject: Re: All region server died due to "Parent directory doesn't exist"
> >>
> >>Thanks Varun for sharing your experience.
> >>
> >>Lars:
> >>Was the server carrying .META. functioning properly around the time when
> >>you observed the problem ?
> >>
> >>Cheers
> >>
> >>On Thu, May 9, 2013 at 9:41 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> >>
> >>> I meant no NTP/clock synchronization between the ZooKeeper quorum and the
> >>> HBase cluster. I am not sure if you are seeing the exact same issue though.
> >>> We did not have mass failures at the same time due to this.
lars hofhansl 2013-05-09, 16:38
takeshi 2014-02-19, 03:18