-
could not start HMaster
Yuling_C@... 2012-10-15, 19:26
Hi,
I set up a single-node HBase server on top of Hadoop, and it has been working fine in most of my testing scenarios, such as creating tables and inserting data. Over the weekend, I accidentally left a testing script running that inserted about 67 rows every minute for three days. When I looked at the environment today, I found that the HBase master could no longer be started. Digging into the logs, I could see that on the second day HBase first hit the following exception:
2012-10-13 13:05:07,367 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /tmp/hbase-root/hbase/.logs/sflow-linux02.santanet.dell.com,47137,1348606516541/sflow-linux02.santanet.dell.com%2C47137%2C1348606516541.1350155105992, entries=7981, filesize=3754556. for /tmp/hbase-root/hbase/.logs/sflow-linux02.santanet.dell.com,47137,1348606516541/sflow-linux02.santanet.dell.com%2C47137%2C1348606516541.1350158707364
2012-10-13 13:05:07,367 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: moving old hlog file /tmp/hbase-root/hbase/.logs/sflow-linux02.santanet.dell.com,47137,1348606516541/sflow-linux02.santanet.dell.com%2C47137%2C1348606516541.1348606520442 whose highest sequenceid is 4 to /tmp/hbase-root/hbase/.oldlogs/sflow-linux02.santanet.dell.com%2C47137%2C1348606516541.1348606520442
2012-10-13 13:05:07,379 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server sflow-linux02.santanet.dell.com,47137,1348606516541: IOE in log roller
java.io.FileNotFoundException: File file:/tmp/hbase-root/hbase/.logs/sflow-linux02.santanet.dell.com,47137,1348606516541/sflow-linux02.santanet.dell.com%2C47137%2C1348606516541.1348606520442 does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:213)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:163)
    at org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:287)
    at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:428)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.archiveLogFile(HLog.java:825)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.cleanOldLogs(HLog.java:708)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:603)
    at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
    at java.lang.Thread.run(Thread.java:662)
Then SplitLogManager kept splitting the logs for about two days:
2012-10-13 13:05:09,061 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x139ff3656b30003, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
    at java.lang.Thread.run(Thread.java:662)
2012-10-13 13:05:09,061 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:52573 which had sessionid 0x139ff3656b30003
2012-10-13 13:05:09,082 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2012-10-13 13:05:09,085 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for sflow-linux02.santanet.dell.com,47137,1348606516541
2012-10-13 13:05:09,086 INFO org.apache.hadoop.hbase.master.SplitLogManager: dead splitlog worker sflow-linux02.santanet.dell.com,47137,1348606516541
2012-10-13 13:05:09,101 INFO org.apache.hadoop.hbase.master.SplitLogManager: started splitting logs in [file:/tmp/hbase-root/hbase/.logs/sflow-linux02.santanet.dell.com,47137,1348606516541-splitting]
2012-10-13 13:05:14,545 INFO org.apache.hadoop.hbase.regionserver.Leases: RegionServer:0;sflow-linux02.santanet.dell.com,47137,1348606516541.leaseChecker closing leases
2012-10-13 13:05:14,545 INFO org.apache.hadoop.hbase.regionserver.Leases: RegionServer:0;sflow-linux02.santanet.dell.com,47137,1348606516541.leaseChecker closed leases
2012-10-13 13:08:09,275 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/RESCAN0000000028 entered state done sflow-linux02.santanet.dell.com,37015,1348606516151
2012-10-13 13:11:09,730 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/RESCAN0000000029 entered state done sflow-linux02.santanet.dell.com,37015,1348606516151
2012-10-13 13:14:10,171 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/RESCAN0000000030 entered state done sflow-linux02.santanet.dell.com,37015,1348606516151
When I tried to restart the HBase server today, the following exception occurred:
2012-10-15 11:54:10,122 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server localhost.localdomain/127.0.0.1:2181, sessionid = 0x13a65c6a8090002, negotiated timeout = 40000
2012-10-15 11:54:10,124 INFO org.apache.hadoop.hbase.master.SplitLogManager: found 0 orphan tasks and 0 rescan nodes
2012-10-15 11:54:10,238 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
org.apache.hadoop.hbase.util.FileSystemVersionException: File system needs to be upgraded. You have version null and I want version 7. Run the '${HBASE_HOME}/bin/hbase migrate' script.
    at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:245)
    at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:347)
    at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:127)
I am wondering what happened and whether there is any way to recover from this situation. Is reinstalling HBase my only choice at this point?
Thanks very much,
YuLing
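A note on reading the file names in the log excerpt above: the numeric suffix on each hlog file, like the last field of the server name, looks like a Unix timestamp in milliseconds. That interpretation is an assumption rather than something stated in the thread. Under it, the small Python sketch below converts the suffixes to UTC dates and computes how long the missing file had existed when the log roller tried to archive it. The helper names `from_millis` and `age_between` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_millis(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def age_between(created_ms, later_ms):
    """Return the elapsed time between two epoch-millisecond values."""
    return timedelta(milliseconds=later_ms - created_ms)


if __name__ == "__main__":
    # Suffix of the hlog file reported as missing (copied from the log).
    missing_hlog_ms = 1348606520442
    # Suffix of the new hlog file created by the roll that failed.
    new_hlog_ms = 1350158707364

    print("missing hlog created:", from_millis(missing_hlog_ms))
    print("failing roll started:", from_millis(new_hlog_ms))
    print("age at failure:", age_between(missing_hlog_ms, new_hlog_ms))
```

If the assumption holds, the missing file dates from the day the region server started and was close to 18 days old when the archive step failed, which is the kind of file a periodic temp-directory cleaner would target.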
-
Re: could not start HMaster
Jimmy Xiang 2012-10-15, 20:32
Is your /tmp folder cleaned up automatically, so that some files are gone?
Thanks, Jimmy
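One way to test this hypothesis is to inspect the root directory directly. The following is a minimal Python sketch, not an HBase tool: it assumes the root directory is the local path `/tmp/hbase-root/hbase` seen in the logs, that the "version null" message means the `hbase.version` file under that directory could not be read, and that a cleaner would pick files by access or modification time. The function name `inspect_root` and the 10-day threshold are illustrative assumptions.

```python
import os
import time


def inspect_root(root, max_age_days=10.0, now=None):
    """Report whether hbase.version exists and which files look stale.

    A file counts as stale when both its access time and its
    modification time are older than max_age_days.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
            if max(st.st_atime, st.st_mtime) < cutoff:
                stale.append(path)
    version_path = os.path.join(root, "hbase.version")
    return {
        "version_file_present": os.path.isfile(version_path),
        "stale_files": sorted(stale),
    }


if __name__ == "__main__":
    # Path taken from the log excerpt; adjust for other setups.
    report = inspect_root("/tmp/hbase-root/hbase")
    print("hbase.version present:", report["version_file_present"])
    for path in report["stale_files"]:
        print("stale:", path)
```

A missing `hbase.version` together with a set of surviving files that are all recent would be consistent with an age-based cleanup, though it would not prove it.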
-
RE: could not start HMaster
Yuling_C@... 2012-10-15, 20:35
No, I don't think so. This is a dedicated testing machine, and there is no automatic cleanup of the /tmp folder.
Thanks,
YuLing
-
Re: could not start HMaster
谢良 2012-10-18, 03:48
Are there any complaints in the HDFS log?