HBase >> mail # user >> HMaster not failing over dead RegionServers


Bryan Beaudreault 2012-06-30, 05:04
Stack 2012-06-30, 15:09
Re: HMaster not failing over dead RegionServers
Bryan,

The master could not detect that the region server was dead.
How did you set the ZooKeeper session timeout?

Thanks,
Jimmy
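
For reference, a minimal sketch of how one might check the setting asked about above. It assumes the timeout is configured through the `zookeeper.session.timeout` property (a value in milliseconds) in `hbase-site.xml`. The helper name `read_property` and the sample value 60000 are illustrative only and are not taken from this thread.

```python
import xml.etree.ElementTree as ET


def read_property(xml_text, name):
    """Return the value of a Hadoop-style <property> entry, or None if unset."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Hypothetical hbase-site.xml content; the value is an example, not a recommendation.
SAMPLE = """<configuration>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
  </property>
</configuration>"""

timeout_ms = read_property(SAMPLE, "zookeeper.session.timeout")
print("zookeeper.session.timeout =", timeout_ms)
```

If the helper returns None, the property is not overridden in the site file and the value shipped in `hbase-default.xml` applies.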

On Sat, Jun 30, 2012 at 8:09 AM, Stack <[EMAIL PROTECTED]> wrote:
> On Sat, Jun 30, 2012 at 7:04 AM, Bryan Beaudreault
> <[EMAIL PROTECTED]> wrote:
>> 12/06/30 00:07:22 INFO ipc.Client: Retrying connect to server: /10.125.18.129:50020. Already tried 14 time(s).
>>
>
> This was one of the servers that went down?
>
>> It was not following through the splitting of HLog files and didn't appear
>> to be moving regions off failed hosts.  After giving it about 20 minutes to
>> try to right itself, I tried restarting the service.  The restart script
>> just hung for a while printing dots and nothing apparent was happening in
>> the logs at the time.
>
> Can we see the log, Bryan?
>
> You might thread dump when it's hung up the next time, Bryan (would be
> something for us to do a looksee on).
>
>> Finally I ran kill -9 on the process, so that another
>> master could take over.  The new master seemed to start splitting logs, but
>> eventually got into the same state of printing the above message.
>>
>
> Do you think it is a particular log?
>
>
>> Eventually it all worked out, but it took WAY too long (almost an hour, all
>> said).  Is this something that is tunable?
>
> Have each RS carry fewer WALs?  It's a configuration setting.
>
>> They should have instantly been
>> removed from the list instead of retrying so many times.  Each server was
>> retried upwards of 30-40 times.
>>
>
> Yeah, that's a bit silly.
>
> We're working on the MTTR in general.  Your logs would be of interest
> to a few of us if it's OK for someone else to take a look.
>
> St.Ack
>
>> I am running cdh3u2 (0.90.4).
>>
>> Thanks,
>>
>> Bryan
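
To quantify the retry behavior described in the thread, a small sketch along these lines could tally the highest retry count logged per server. It assumes each `ipc.Client` message sits on a single line in the master log, in the same shape as the line quoted above. The function name `max_retries_per_server` and the second sample address are hypothetical.

```python
import re
from collections import defaultdict

# Matches the "Retrying connect to server" message; the address may be
# printed as "host/ip:port" or "/ip:port".
RETRY_RE = re.compile(
    r"Retrying connect to server: (?P<server>\S+)\. "
    r"Already tried (?P<tries>\d+) time\(s\)"
)


def max_retries_per_server(lines):
    """Map each server address to the highest retry count seen for it."""
    worst = defaultdict(int)
    for line in lines:
        match = RETRY_RE.search(line)
        if match:
            server = match.group("server")
            worst[server] = max(worst[server], int(match.group("tries")))
    return dict(worst)


# The first line is from the thread; the others are made-up examples.
SAMPLE_LOG = [
    "12/06/30 00:07:22 INFO ipc.Client: Retrying connect to server: "
    "/10.125.18.129:50020. Already tried 14 time(s).",
    "12/06/30 00:07:23 INFO ipc.Client: Retrying connect to server: "
    "/192.0.2.7:50020. Already tried 3 time(s).",
    "12/06/30 00:07:24 INFO some.Other: unrelated message",
]

for server, tries in sorted(max_retries_per_server(SAMPLE_LOG).items()):
    print(server, tries)
```

Run against a real master log, the per-server maximum gives a rough picture of how long the master kept retrying each dead host.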
Suraj Varma 2012-07-02, 23:56
Bryan Beaudreault 2012-07-03, 00:17