HBase >> mail # user >> Region server crashes when using replication


Re: Region server crashes when using replication
Here's what I found. I started with 2 RSs running in the cluster (#1 and #4).
This is how ZK looked at that point:
[zk: hadoop1-zk3:2181(CONNECTED) 25] ls /hbase/rs
[hadoop1-s01,60020,1300952215842, hadoop1-s04,60020,1300881354710]
[zk: hadoop1-zk3:2181(CONNECTED) 26] ls /hbase/replication/rs
[hadoop1-s01.farm-ny.gig.com,60020,1300952215842]

I then started RS #2. It seems to be looking for a replication
log file that it can't find (see attached log).
Immediately after that, ZK looks like this:
[zk: hadoop1-zk3:2181(CONNECTED) 27] ls /hbase/replication/rs
[hadoop1-s02.farm-ny.gig.com,60020,1300953027434]

This is how .logs on HDFS looks at this time (the .oldlogs directory is empty):
Found 5 items
drwxr-xr-x   - hbase supergroup          0 2011-03-24 03:36 /hbase/.logs/hadoop1-s01.farm-ny.gig.com,60020,1300952215842
drwxr-xr-x   - hbase supergroup          0 2011-03-24 03:50 /hbase/.logs/hadoop1-s02.farm-ny.gig.com,60020,1300953027434
drwxr-xr-x   - hbase supergroup          0 2011-03-24 03:36 /hbase/.logs/hadoop1-s03.farm-ny.gig.com,60020,1300952197026
drwxr-xr-x   - hbase supergroup          0 2011-03-24 03:56 /hbase/.logs/hadoop1-s04.farm-ny.gig.com,60020,1300881354710
drwxr-xr-x   - hbase supergroup          0 2011-03-23 16:57 /hbase/.logs/hadoop1-s05.farm-ny.gig.com,60020,1300913823878

After some time, while writing this, I see that RS #1 has now
crashed (see attached log), and ZK looks like this:
[zk: hadoop1-zk3:2181(CONNECTED) 29] ls /hbase/replication/rs
[hadoop1-s02.farm-ny.gig.com,60020,1300953027434, hadoop1-s04.farm-ny.gig.com,60020,1300881354710]

One strange thing I'm noticing is that under the /hbase/rs znode the
servers are listed by host name only, while under
/hbase/replication/rs they are listed by their fully qualified DNS
names. Is this intentional?
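
For anyone who wants to check the same thing on their own cluster, here is
a rough sketch of how the two child lists could be compared once the host
names are normalized. It is plain Python working on the strings printed by
the ZK shell above; it does not talk to ZooKeeper, and the function names
(parse_server_name, short_key, compare_znodes) are made up for
illustration, not part of HBase. It assumes each entry has the form
host,port,startcode.

```python
def parse_server_name(name):
    """Split 'host,port,startcode' into its three parts."""
    host, port, startcode = name.rsplit(",", 2)
    return host, int(port), int(startcode)


def short_key(name):
    """Reduce the host to its first DNS label so that short names and
    fully qualified names compare equal (an assumption of this sketch)."""
    host, port, startcode = parse_server_name(name)
    return host.split(".", 1)[0], port, startcode


def compare_znodes(rs_children, replication_children):
    """Report servers present under one znode but not the other."""
    live = {short_key(n) for n in rs_children}
    repl = {short_key(n) for n in replication_children}
    return {
        "only_in_rs": sorted(live - repl),
        "only_in_replication": sorted(repl - live),
    }


# Child lists copied from the first two 'ls' outputs above.
rs_children = [
    "hadoop1-s01,60020,1300952215842",
    "hadoop1-s04,60020,1300881354710",
]
replication_children = [
    "hadoop1-s01.farm-ny.gig.com,60020,1300952215842",
]

result = compare_znodes(rs_children, replication_children)
print(result)
```

With the initial state shown above, this reports hadoop1-s04 as present
under /hbase/rs but missing from /hbase/replication/rs.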

Note: I changed the domain name in this email because I think the
mailing list's spam filter doesn't like it. The attached logs still
show the full name.

-eran
On Thu, Mar 24, 2011 at 00:28, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> On Wed, Mar 23, 2011 at 3:22 PM, Eran Kutner <[EMAIL PROTECTED]> wrote:
>> They are using two separate ensembles, 3 servers in each. I'm trying to
>> create total independence for each cluster.
>
> Can you find out who deleted the znode that the region server failed
> on? If it reported the status a few times before that, it means it
> existed whereas the other bug was that the znode never existed.
>
> J-D
>