lars hofhansl 2013-02-05, 07:32
Elliott Clark 2013-02-05, 07:39
lars hofhansl 2013-02-05, 07:48
Re: Fully qualified path names in distributed log splitting.
Matteo Bertozzi 2013-02-05, 07:39
HBASE-7723 - Remove NN URI from ZK splitlogs
On Tue, Feb 5, 2013 at 7:32 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> We just found ourselves in an interesting pickle.
> We were upgrading one of our clusters from HBase 0.94.0 on Hadoop 1.0.4 to
> HBase 0.94.4 on top of Hadoop 2.
> The cluster was set up a while ago, and the old shutdown script had a
> bug and shut down HBase and HDFS uncleanly.
> Assuming that the log would be replayed, we upgraded Hadoop to 2.0.x and
> verified that from a file system view everything is OK.
> The new HDFS runs with an HA NameNode, so the FS changed from hdfs://<old
> host name> to hdfs://<ha cluster name>
> Then we brought up HBase and found it stuck in splitting logs forever.
> In the log we see messages like these:
> 2013-02-05 06:22:31,045 ERROR
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: unexpected error
> Wrong FS:
> hdfs://<old NN host>/.logs/<rs host>,60020,1358540589323-splitting/<rs
> expected: hdfs://<ha cluster name>
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:547)
> at java.lang.Thread.run(Thread.java:662)
> So it looks like distributed log splitting stores the full HDFS path name
> including the host, which seems unnecessary.
> This path is stored in ZK.
> So all in all it seems this can only happen if all of the following are true:
> unclean shutdown, keeping the same ZK ensemble, changed FS.
> The data is not important, we can just blow it away, but we want to prove
> that we could recover the data if we had to.
> It seems we have three options:
> 1. Blow away the data in ZK under "splitlog", and restart HBase. It should
> restart the split process with the correct pathnames.
> 2. Temporarily change the config for the region server to set the root dir
> to hdfs://<old NN host>, bounce HBase. The log splitting should now be able
> to succeed.
> 3. Downgrade back to the old Hadoop (we kept a copy of the image).
> We're trying option #2, to see whether that would fix it. #1 should work
> Has anybody else experienced this?
> It seems that would also limit our ability to take a snapshot of a
> filesystem and move it to somewhere else, as the hostnames are hardcoded,
> at least in ZK for log splitting.
> -- Lars
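For illustration, here is a rough sketch of the idea behind not keeping the NameNode URI in the stored split-log path: strip the scheme and authority from the stored name and re-qualify it against whatever filesystem is current. This is not the HBASE-7723 patch. The class and method names (SplitLogPathSketch, stripNameNode, requalify, sameFs) are hypothetical, the host names are placeholders, and it uses plain java.net.URI rather than the Hadoop Path/FileSystem classes.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, for illustration only; not part of HBase.
final class SplitLogPathSketch {

    // Drop the scheme and authority (e.g. "hdfs://old-nn:8020") and keep
    // only the path component, which is independent of the NameNode.
    static String stripNameNode(String storedPath) {
        return URI.create(storedPath).getPath();
    }

    // Re-qualify a stored path against the filesystem currently in use.
    static String requalify(String storedPath, String currentFs) {
        URI fs = URI.create(currentFs);
        try {
            return new URI(fs.getScheme(), fs.getAuthority(),
                           stripNameNode(storedPath), null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot re-qualify " + storedPath, e);
        }
    }

    // Rough analogue of the check that produced "Wrong FS" above:
    // true only if the stored path names the same scheme and authority.
    static boolean sameFs(String storedPath, String currentFs) {
        URI p = URI.create(storedPath);
        URI fs = URI.create(currentFs);
        if (p.getScheme() == null) {
            return true; // unqualified paths resolve against the current FS
        }
        return p.getScheme().equalsIgnoreCase(fs.getScheme())
            && String.valueOf(p.getAuthority())
                     .equalsIgnoreCase(String.valueOf(fs.getAuthority()));
    }

    public static void main(String[] args) {
        // Placeholder values standing in for <old NN host> and <ha cluster name>.
        String stored = "hdfs://old-nn:8020/.logs/rs1.example.com,60020,1358540589323-splitting/wal.1";
        String current = "hdfs://ha-cluster";
        System.out.println(sameFs(stored, current));
        System.out.println(requalify(stored, current));
    }
}
```

With the path stored unqualified, the same ZK entry would stay usable after the filesystem URI changes, which is the situation described in the quoted message.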