RE: Master aborts on start up - URGENT
Nope. This seems to be a very serious issue.

When I tried to recreate 'usertable' I got the same issue again:
2013-07-28 00:35:40,747 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x54022944d180000 Creating (or updating) unassigned node for a386becc8860c810e33bb9c9d81482bc with OFFLINE state
2013-07-28 00:35:40,747 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2013-07-28 00:35:40,747 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for usertable,user,1374971740436.cf772fe9e49bc911024b442914a15f67. destination server is sjc1-eng-perf-g1-grid04.carrieriq.com,60020,1374969681440
2013-07-28 00:35:40,748 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for usertable,user,1374971740436.cf772fe9e49bc911024b442914a15f67. so generated a random one; hri=usertable,user,1374971740436.cf772fe9e49bc911024b442914a15f67., src=, dest=sjc1-eng-perf-g1-grid19.carrieriq.com,60020,1374969681450; 20 (online=20, available=19) available servers
2013-07-28 00:35:40,748 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60010
2013-07-28 00:35:40,749 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 16938dcb9c3bb52a46ffb7b10fab3c57
2013-07-28 00:35:40,749 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2013-07-28 00:35:40,749 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=usertable,user7,1374971740436.16938dcb9c3bb52a46ffb7b10fab3c57. state=CLOSED, ts=1374971740713, server=sjc1-eng-perf-g1-grid01.carrieriq.com,60020,1374969681434
2013-07-28 00:35:40,749 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x54022944d180000 Creating (or updating) unassigned node for 16938dcb9c3bb52a46ffb7b10fab3c57 with OFFLINE state
2013-07-28 00:35:40,749 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state : usertable,user6,1374971740436.73e01b52a570febc16833d7cc4f7ca48. state=PENDING_OPEN, ts=1374971740749, server=sjc1-eng-perf-g1-grid03.carrieriq.com,60020,1374969681445 .. Cannot transit it to OFFLINE.
java.lang.IllegalStateException: Unexpected state : usertable,user6,1374971740436.73e01b52a570febc16833d7cc4f7ca48. state=PENDING_OPEN, ts=1374971740749, server=sjc1-eng-perf-g1-grid03.carrieriq.com,60020,1374969681445 .. Cannot transit it to OFFLINE.
        at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1820)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1659)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
        at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2013-07-28 00:35:40,749 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
Master aborted.
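
To pick the offending region out of a long master log, the FATAL "Unexpected state" lines can be filtered mechanically. The following is only a minimal sketch: the function name `find_unexpected_states` is hypothetical, and the regular expression assumes the exact message layout shown in the log above.

```python
import re

# Assumption: the message layout matches the FATAL lines quoted above, i.e.
# "... FATAL <logger>: Unexpected state : <region> state=<STATE>, ts=<ms>, server=<server> ..."
UNEXPECTED_STATE = re.compile(
    r"FATAL\s+\S+:\s+Unexpected state : (?P<region>\S+)\s+"
    r"state=(?P<state>\w+), ts=(?P<ts>\d+), server=(?P<server>\S+)"
)


def find_unexpected_states(lines):
    """Return one dict (region, state, ts, server) per matching FATAL line."""
    hits = []
    for line in lines:
        match = UNEXPECTED_STATE.search(line)
        if match:
            hits.append(match.groupdict())
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_states.py < hbase-master.log
    for hit in find_unexpected_states(sys.stdin):
        print(hit["region"], hit["state"], hit["server"])
```

Lines that are truncated before the `server=` field will not match this pattern, so it would need loosening for partial logs.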

This is what I ran:

create 'usertable', { NAME=>'cf', VERSIONS=> 1, COMPRESSION => 'SNAPPY', BLOCKCACHE => true}, { SPLITS => ['user', 'user05', 'user1','user15','user2','user25','user3','user35','user4','user45','user5','user55','user6','user65','user7','user75','user8','user85','user9','user95' ]}
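
The split list above follows a regular pattern (the bare prefix, then every half-step up to 'user95'), so it can be generated rather than typed by hand. This is a rough sketch only: `ycsb_split_keys` and `create_statement` are hypothetical helper names, and the output is simply the text of a shell statement equivalent to the one above.

```python
def ycsb_split_keys(prefix="user"):
    """Build the 20 split keys used above: user, user05, user1, ..., user95."""
    keys = []
    for digit in range(10):
        # The list above starts at the bare prefix rather than 'user0'.
        keys.append(prefix if digit == 0 else f"{prefix}{digit}")
        keys.append(f"{prefix}{digit}5")
    return keys


def create_statement(table="usertable", family="cf"):
    """Format an HBase shell 'create' statement with the generated splits."""
    splits = ", ".join(f"'{key}'" for key in ycsb_split_keys())
    return (
        f"create '{table}', "
        f"{{NAME => '{family}', VERSIONS => 1, COMPRESSION => 'SNAPPY', BLOCKCACHE => true}}, "
        f"{{SPLITS => [{splits}]}}"
    )


if __name__ == "__main__":
    print(create_statement())
```

Twenty split keys produce twenty-one regions, and the keys come out already in the lexicographic order that region boundaries use.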

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [EMAIL PROTECTED]

________________________________________
From: Vladimir Rodionov
Sent: Saturday, July 27, 2013 5:08 PM
To: [EMAIL PROTECTED]
Subject: RE: Master aborts on start up - URGENT

OK, I managed to fix the issue and minimize the damage.

OfflineMetaRepair failed to fix .META. because there were inconsistencies in one of the tables,
and the tool refused to do the META repair. I had to physically remove this table in HDFS; I then re-ran the tool
and successfully repaired META.

table and
Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [EMAIL PROTECTED]

________________________________________
From: Vladimir Rodionov
Sent: Saturday, July 27, 2013 4:21 PM
To: [EMAIL PROTECTED]
Subject: Master aborts on start up - URGENT

This may be related to :

https://issues.apache.org/jira/browse/HBASE-8912
It started when I tried to install and run YCSB. I created 'usertable' and then tried to modify it a couple of times (added COMPRESSION).
HBase (0.94.6) then stopped working (Master could not finish initialization).

I stopped the cluster and physically removed /hbase/usertable directory as well as all ZK local stores. Restarted. No success.

I manually ran OfflineMetaRepair. Restarted. No success. The FATAL error from the Master's log file is below.

For some reason, OfflineMetaRepair did not fix the missing 'usertable'.

Please advise. This is a development cluster with a large volume of data.

2013-07-27 23:08:56,504 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region TMO_NOV_INDEX-UPLOADS,38,1360181215845.2553b53773e3cb9030c3248768a3b0ca. has been deleted.
2013-07-27 23:08:56,504 INFO org.apache.hadoop.hbase.master.AssignmentManager: The master has opened the region TMO_NOV_INDEX-UPLOADS,38,1360181215845.2553b53773e3cb9030c3248768a3b0ca. that was online on sjc1-eng-perf-g1-grid06.carrieriq.com,60020,1374966494222
2013-07-27 23:08:56,504 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state : usertable,,1374962208806.249881162b6ad6d084b30507283f98b8. state=PENDING_OPEN, ts=137496653