HBase >> mail # user >> Region server crashes when using replication


Re: Region server crashes when using replication
Thanks, J-D.

As for the first issue, why does this behavior make sense? What
happens when the connection between the two clusters fails? Will the
region servers of the primary cluster fail as well, or at least be
unable to start? That seems very radical.

Regarding the second issue, I didn't see anything else in the logs; it
just seemed to decide to shut down, but maybe I missed something. I will
try to reproduce it and let you know if I succeed.

Regarding the timeout to detect a failed server, 3 minutes sounds like
a very long time for a region server to be down. Obviously, during
that time the data owned by that server is inaccessible. Is there a
reason for this long timeout? Can it be configured?

-eran

On Tue, Mar 22, 2011 at 20:22, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
>
> First issue: UnknownHostException is unforgiving. Your machines need
> to be able to talk to haddop2-zk3 (is that a typo?), and it seems that
> at least that one can't. The reason the machine dies is that we
> usually try to "fail fast" in HBase.
>
> Second issue: There's not enough information. All I see is a region
> server shutting down, and the reason why is probably before that.
>
> Third issue: https://issues.apache.org/jira/browse/HBASE-3664
>
> Fourth issue: in 0.90 it now takes 3 minutes for the timeout to happen.
>
> J-D
>