HBase, mail # user - Region server crashes when using replication

Re: Region server crashes when using replication
Eran Kutner 2011-03-24, 08:13
Here's what I found. I started with 2 RSs running in the cluster (#1 and #4).
This is how ZK looked at that point:
[zk: hadoop1-zk3:2181(CONNECTED) 25] ls /hbase/rs
[hadoop1-s01,60020,1300952215842, hadoop1-s04,60020,1300881354710]
[zk: hadoop1-zk3:2181(CONNECTED) 26] ls /hbase/replication/rs

I then started RS #2. It seems to be looking for the replication
log file, which it can't find (see attached log).
Immediately after that, ZK looks like this:
[zk: hadoop1-zk3:2181(CONNECTED) 27] ls /hbase/replication/rs

This is how the .log directory on HDFS looks at this point (the .oldlogs directory is empty):
Found 5 items
drwxr-xr-x   - hbase supergroup          0 2011-03-24 03:36
drwxr-xr-x   - hbase supergroup          0 2011-03-24 03:50
drwxr-xr-x   - hbase supergroup          0 2011-03-24 03:36
drwxr-xr-x   - hbase supergroup          0 2011-03-24 03:56
drwxr-xr-x   - hbase supergroup          0 2011-03-23 16:57

After some time, while I was writing this, RS #1 crashed (see attached
log), and ZK now looks like this:
[zk: hadoop1-zk3:2181(CONNECTED) 29] ls /hbase/replication/rs

One strange thing I'm noticing is that in the /hbase/rs node the
servers are listed with their host name only, while in
/hbase/replication/rs they are listed with their fully qualified DNS
names. Is this intentional?
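
To make the naming difference concrete, here is a minimal sketch of how the two znode listings could be compared. It assumes each child name has the form host,port,startcode, as in the /hbase/rs listing above. The fully qualified names and the example.com domain are hypothetical placeholders, since the /hbase/replication/rs output is not reproduced in this message, and the function names are illustrative only, not part of HBase.

```python
def parse_server_name(znode):
    """Split a 'host,port,startcode' znode name into its three parts."""
    host, port, startcode = znode.rsplit(",", 2)
    return host, int(port), int(startcode)


def short_host(host):
    """Return the first DNS label of a host name (text before the first dot)."""
    return host.split(".", 1)[0]


def find_name_mismatches(rs_znodes, replication_znodes):
    """Pair up entries that refer to the same server but spell the host differently.

    Two entries are treated as the same server when their short host name,
    port and start code all match. This matching rule is an assumption made
    for illustration.
    """
    by_key = {}
    for znode in rs_znodes:
        host, port, startcode = parse_server_name(znode)
        by_key[(short_host(host), port, startcode)] = host

    mismatches = []
    for znode in replication_znodes:
        host, port, startcode = parse_server_name(znode)
        other = by_key.get((short_host(host), port, startcode))
        if other is not None and other != host:
            mismatches.append((other, host))
    return mismatches


# Children of /hbase/rs, copied from the listing above.
rs = [
    "hadoop1-s01,60020,1300952215842",
    "hadoop1-s04,60020,1300881354710",
]

# Hypothetical children of /hbase/replication/rs (placeholder domain).
replication_rs = [
    "hadoop1-s01.example.com,60020,1300952215842",
    "hadoop1-s04.example.com,60020,1300881354710",
]

mismatches = find_name_mismatches(rs, replication_rs)
for short_name, full_name in mismatches:
    print(f"{short_name} is listed as {full_name} under replication")
```

If the two parents really do use different spellings for the same server, any code that builds a path under one parent from a name read under the other would end up pointing at a znode that does not exist. Whether that is what happens here is only a guess on my part.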

Note: I changed the domain name in this email because I think the
mailing list's spam filter doesn't like it. The attached logs still
show the full name.

On Thu, Mar 24, 2011 at 00:28, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> On Wed, Mar 23, 2011 at 3:22 PM, Eran Kutner <[EMAIL PROTECTED]> wrote:
>> They are using two separate ensembles, 3 servers in each. I'm trying to
>> create total independence for each cluster.
> Can you find out who deleted the znode that the region server failed
> on? If it reported the status a few times before that, it means it
> existed whereas the other bug was that the znode never existed.
> J-D