NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase dev mailing list: All region server died due to "Parent directory doesn't exist"


lars hofhansl 2013-05-09, 06:39
lars hofhansl 2013-05-09, 07:23
lars hofhansl 2013-05-09, 07:41
Ted Yu 2013-05-09, 08:33
Andrew Purtell 2013-05-09, 08:59
Ted Yu 2013-05-09, 09:04
Andrew Purtell 2013-05-09, 09:06
lars hofhansl 2013-05-09, 15:48
Ted Yu 2013-05-09, 16:07
lars hofhansl 2013-05-09, 16:16
Varun Sharma 2013-05-09, 16:39
Varun Sharma 2013-05-09, 16:41
Ted Yu 2013-05-09, 16:51
lars hofhansl 2013-05-09, 17:03
Stack 2013-05-09, 17:34
lars hofhansl 2013-05-09, 18:13
Re: All region server died due to "Parent directory doesn't exist"
Good news is that as far as I can tell no data was lost.
Eventually all logs were split and replayed.
-- Lars

----- Original Message -----
From: lars hofhansl <[EMAIL PROTECTED]>
To: HBase Dev List <[EMAIL PROTECTED]>
Cc:
Sent: Thursday, May 9, 2013 11:13 AM
Subject: Re: All region server died due to "Parent directory doesn't exist"

Thanks Stack.

I sent the logs.
Also, I have since bounced HDFS and ZK and the problem is gone now (I can start RSs again and they stay up). Something got into a weird state.
-- Lars

________________________________
From: Stack <[EMAIL PROTECTED]>
To: HBase Dev List <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
Sent: Thursday, May 9, 2013 10:34 AM
Subject: Re: All region server died due to "Parent directory doesn't exist"

Want to send me a regionserver log, Lars? (off-list)
St.Ack

On Thu, May 9, 2013 at 10:03 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:

>Thanks Ted and Varun.
>
>
>Let me check on the .META. server.
>
>
>The majority (13) of the RSs died within 2 minutes. The remaining 3 died over the following 10 minutes.
>So that would point to a general issue. I did not see any ZK issues, but I'll double check.
>
>
>It is just interesting that even now, if I start an RS, it aborts within a minute or two because of this issue.
>
>
>-- Lars
>
>
>----- Original Message -----
>From: Ted Yu <[EMAIL PROTECTED]>
>To: [EMAIL PROTECTED]
>
>Cc:
>Sent: Thursday, May 9, 2013 9:51 AM
>Subject: Re: All region server died due to "Parent directory doesn't exist"
>
>Thanks Varun for sharing your experience.
>
>Lars:
>Was the server carrying .META. functioning properly around the time when
>you observed the problem?
>
>Cheers
>
>On Thu, May 9, 2013 at 9:41 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>
>> I meant no NTP/clock synchronization b/w the ZooKeeper quorum and the HBase
>> cluster. I am not sure if you are seeing the exact same issue, though. We
>> did not have mass failures at the same time due to this.
>>
>> Thanks
>> Varun
>>
>>
>> On Thu, May 9, 2013 at 9:39 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>>
>> > Btw, I am not 100% sure, but I have seen something like this before:
>> >
>> > 1) ZK connection flakiness causes ephemeral nodes to expire
>> > 2) Master detects the failure and renames the logs into a splitting
>> > directory; this is intentional, so that if that region server comes back
>> > up, it cannot write to the logs being split
>> > 3) Region server dies because the log is renamed
>> >
>> > So, the yanking away of files is done by the HBase master and is expected
>> > if the master believes the server is dead. We found that the region server
>> > logs DFS exceptions like crazy (1000s of them) in that case, and we always
>> > suspected some kind of DFS error, but when we traced it back to the point
>> > where it started, we found ZooKeeper session issues.
>> >
>> > We saw two causes of this: either super high load or missing NTP/clock
>> > synchronization b/w the clusters.
>> >
>> > Thanks
>> > Varun
>> >
>> >
>> > On Thu, May 9, 2013 at 9:16 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>> >
>> >> Thanks Ted. I'll do the same.
>> >>
>> >>
>> >> ----- Original Message -----
>> >> From: Ted Yu <[EMAIL PROTECTED]>
>> >> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
>> >> Cc:
>> >> Sent: Thursday, May 9, 2013 9:07 AM
>> >> Subject: Re: All region server died due to "Parent directory doesn't exist"
>> >>
>> >> I went through the patch for HBASE-7824 one more time and didn't find a
>> >> direct correlation to the issue Lars reported.
>> >>
>> >> I am going over the other JIRAs in Lars' list.
>> >>
>> >> Cheers
>> >>
>> >> On Thu, May 9, 2013 at 8:48 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>> >>
>> >> > I will try. I do not think this is the issue, though.
>> >> >
>> >> > The master is up in my case.
>> >> > Right now the cluster is in a state where each region server aborts itself
>> >> > shortly after being started (which coincides with having its log
Enis Söztutar 2013-05-10, 01:10
lars hofhansl 2013-05-10, 04:25
Enis Söztutar 2013-05-10, 05:01
lars hofhansl 2013-05-10, 05:47
lars hofhansl 2013-05-09, 16:38
takeshi 2014-02-19, 03:18
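Varun's three-step explanation in the thread above (ZK session expiry, the master renaming the logs into a splitting directory, the region server then failing to write) can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not HBase code. It uses the local filesystem through java.nio.file as a stand-in for HDFS and calls no HBase or Hadoop API. The class and method names (`WalFencingSketch`, `rollLog`, `renameForSplitting`), the directory name `rs1,60020,1`, and the `-splitting` suffix are all hypothetical illustrations. Step 1 (ZooKeeper session expiry) is assumed rather than modeled.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical sketch of rename-based log fencing on the local filesystem
 * (a stand-in for HDFS). Names and directory layout are illustrative only.
 */
public class WalFencingSketch {

    /** Observations from one run of the simulated failure sequence. */
    public static final class Result {
        public final boolean writeBeforeRenameOk;
        public final boolean writeAfterRenameFailed;
        public final boolean oldLogInSplittingDir;

        Result(boolean writeBeforeRenameOk, boolean writeAfterRenameFailed,
               boolean oldLogInSplittingDir) {
            this.writeBeforeRenameOk = writeBeforeRenameOk;
            this.writeAfterRenameFailed = writeAfterRenameFailed;
            this.oldLogInSplittingDir = oldLogInSplittingDir;
        }
    }

    /** Region server side (hypothetical): roll to a new log file inside its log directory. */
    static void rollLog(Path walDir, String fileName, String edit) throws IOException {
        // Files.write never creates missing parent directories, so once walDir is
        // gone this throws NoSuchFileException: the local analogue of
        // "Parent directory doesn't exist".
        Files.write(walDir.resolve(fileName), edit.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
    }

    /** Master side (hypothetical): rename the dead server's log directory for splitting. */
    static Path renameForSplitting(Path walDir) throws IOException {
        Path splittingDir = walDir.resolveSibling(walDir.getFileName() + "-splitting");
        return Files.move(walDir, splittingDir);
    }

    /** Runs steps 2 and 3 of the sequence; step 1 (ZK expiry) is assumed. */
    public static Result simulate() {
        try {
            Path root = Files.createTempDirectory("wal-fencing-sketch");
            Path walDir = Files.createDirectory(root.resolve("rs1,60020,1"));

            // Normal operation: the region server writes its first log.
            boolean beforeOk;
            try {
                rollLog(walDir, "wal.1", "put row1");
                beforeOk = true;
            } catch (IOException e) {
                beforeOk = false;
            }

            // Step 2: the master believes the server is dead and renames its logs.
            Path splittingDir = renameForSplitting(walDir);

            // Step 3: the still-running region server tries to roll to a new log.
            boolean afterFailed;
            try {
                rollLog(walDir, "wal.2", "put row2");
                afterFailed = false;
            } catch (NoSuchFileException e) {
                afterFailed = true;
            }

            // The already-written log is now where the splitter expects it.
            boolean oldLogMoved = Files.exists(splittingDir.resolve("wal.1"));
            return new Result(beforeOk, afterFailed, oldLogMoved);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Result r = simulate();
        System.out.println("write before rename ok:    " + r.writeBeforeRenameOk);
        System.out.println("write after rename failed: " + r.writeAfterRenameFailed);
        System.out.println("old log in -splitting dir: " + r.oldLogInSplittingDir);
    }
}
```

The sketch illustrates why the rename works as a fence. A single metadata operation moves every existing log out from under the server. Any later attempt to create a log under the old path then fails. As Varun notes, this means a server that was wrongly declared dead cannot write to the logs being split. It then aborts, which matches the symptom Lars reported.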