My cluster got some troubles last night and at the end, all the
servers went down. Hadoop is still running, but HBase is not.
I have no clue what the root cause is. I looked at the logs on the
master side, and the fist line when it started to go down was:
2012-07-24 01:20:13,227 INFO
ephemeral node deleted, processing expiration
And then everything has started to die.
At the end, on the master side, I have this in the out file:
hbase@node3:~/hbase-0.94.0$ cat logs/hbase-hbase-master-node3.out
Exception in thread "master-node3,60000,1342789522486"
I think this one need to be addressed.
I looked at my zookeeper logs and I have one entry every 2 seconds. So
I think something is missconfigured and I will look at it. So the goal
of this post is just to report the error above and see if this should
be fixed by adding a null check on the related code.