Fully qualified path names in distributed log splitting.
lars hofhansl 2013-02-05, 07:32
We just found ourselves in an interesting pickle.
We were upgrading one of our clusters from HBase 0.94.0 on Hadoop 1.0.4 to HBase 0.94.4 on top of Hadoop 2.
The cluster was set up a while ago, and the old shutdown script had a bug that shut down HBase and HDFS uncleanly.
Assuming that the logs would be replayed, we upgraded Hadoop to 2.0.x and verified that, from a file system view, everything was OK.
The new HDFS runs with an HA NameNode, so the FS changed from hdfs://<old host name> to hdfs://<ha cluster name>.
Then we brought up HBase and found it stuck splitting logs forever.
In the log we see messages like these:
2013-02-05 06:22:31,045 ERROR org.apache.hadoop.hbase.regionserver.SplitLogWorker: unexpected error
hdfs://<old NN host>/.logs/<rs host>,60020,1358540589323-splitting/<rs host>%2C60020%2C1358540589323.1359962644861,
expected: hdfs://<ha cluster name>
So it looks like distributed log splitting stores the full HDFS path name including the host, which seems unnecessary.
This path is stored in ZK.
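The failure mode can be sketched in a few lines of Python. This mimics the kind of scheme/authority check a FileSystem does before touching a path (the function name and error text here are illustrative, not Hadoop's actual API): a path recorded in ZK under the old NameNode's authority no longer matches the new HA FS, so the split worker errors out.

```python
from urllib.parse import urlparse

def check_path(fs_uri, path):
    """Reject a fully qualified path whose scheme/authority does not
    match the file system we are talking to (sketch, not Hadoop code)."""
    fs = urlparse(fs_uri)
    p = urlparse(path)
    # Relative or scheme-less paths are fine; qualified ones must match.
    if p.scheme and (p.scheme, p.netloc) != (fs.scheme, fs.netloc):
        raise ValueError(f"Wrong FS: {path}, expected: {fs_uri}")
    return path

# A log path qualified with the current FS passes:
check_path("hdfs://ha-cluster", "hdfs://ha-cluster/.logs/rs,60020,1/edits")

# One recorded under the old NameNode's hostname raises "Wrong FS":
# check_path("hdfs://ha-cluster", "hdfs://old-nn/.logs/rs,60020,1/edits")
```

Had the split tasks been stored as paths relative to the HBase root dir, the same entries would have resolved fine against the new FS.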
So all in all, it seems this can only happen if all of the following are true: an unclean shutdown, keeping the same ZK ensemble, and a changed FS.
The data is not important, we can just blow it away, but we want to prove that we could recover the data if we had to.
It seems we have three options:
1. Blow away the data in ZK under "splitlog", and restart HBase. It should restart the split process with the correct pathnames.
2. Temporarily change the config for the region server to set the root dir to hdfs://<old NN host>, bounce HBase. The log splitting should now be able to succeed.
3. Downgrade back to the old Hadoop (we kept a copy of the image).
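For option #2, the change would be a temporary override of the root dir in hbase-site.xml, something like the fragment below (the hostname and port are placeholders for the actual old NameNode; hbase.rootdir is the standard property, but double-check the path suffix against your old config):

```xml
<!-- Temporarily point HBase back at the old NameNode so the
     fully qualified -splitting paths stored in ZK match again.
     "old-nn-host:8020" is a placeholder for the real old NN address. -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://old-nn-host:8020/hbase</value>
</property>
```

Once the log splitting completes, the setting can be reverted to the HA cluster name and HBase bounced again.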
We're trying option #2 to see whether that fixes it; #1 should work too.
Has anybody else experienced this?
It seems this would also limit our ability to take a snapshot of a filesystem and move it somewhere else, since the hostnames are hardcoded, at least in ZK for log splitting.