HBase >> mail # dev >> Master aborts on start up - URGENT


RE: Master aborts on start up - URGENT
OK, that was my issue.

All region servers (RS) failed to create the table because we do not have SNAPPY support.

The RS fail to create the table, but the Master should not abort in this case.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [EMAIL PROTECTED]

________________________________________
From: Vladimir Rodionov
Sent: Saturday, July 27, 2013 5:47 PM
To: [EMAIL PROTECTED]
Subject: RE: Master aborts on start up - URGENT

Nope. This seems to be a very serious issue.

When I tried to recreate 'usertable' I got the same issue again:
2013-07-28 00:35:40,747 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x54022944d180000 Creating (or updating) unassigned node for a386becc8860c810e33bb9c9d81482bc with OFFLINE state
2013-07-28 00:35:40,747 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2013-07-28 00:35:40,747 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for usertable,user,1374971740436.cf772fe9e49bc911024b442914a15f67. destination server is sjc1-eng-perf-g1-grid04.carrieriq.com,60020,1374969681440
2013-07-28 00:35:40,748 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for usertable,user,1374971740436.cf772fe9e49bc911024b442914a15f67. so generated a random one; hri=usertable,user,1374971740436.cf772fe9e49bc911024b442914a15f67., src=, dest=sjc1-eng-perf-g1-grid19.carrieriq.com,60020,1374969681450; 20 (online=20, available=19) available servers
2013-07-28 00:35:40,748 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60010
2013-07-28 00:35:40,749 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 16938dcb9c3bb52a46ffb7b10fab3c57
2013-07-28 00:35:40,749 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2013-07-28 00:35:40,749 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=usertable,user7,1374971740436.16938dcb9c3bb52a46ffb7b10fab3c57. state=CLOSED, ts=1374971740713, server=sjc1-eng-perf-g1-grid01.carrieriq.com,60020,1374969681434
2013-07-28 00:35:40,749 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x54022944d180000 Creating (or updating) unassigned node for 16938dcb9c3bb52a46ffb7b10fab3c57 with OFFLINE state
2013-07-28 00:35:40,749 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state : usertable,user6,1374971740436.73e01b52a570febc16833d7cc4f7ca48. state=PENDING_OPEN, ts=1374971740749, server=sjc1-eng-perf-g1-grid03.carrieriq.com,60020,1374969681445 .. Cannot transit it to OFFLINE.
java.lang.IllegalStateException: Unexpected state : usertable,user6,1374971740436.73e01b52a570febc16833d7cc4f7ca48. state=PENDING_OPEN, ts=1374971740749, server=sjc1-eng-perf-g1-grid03.carrieriq.com,60020,1374969681445 .. Cannot transit it to OFFLINE.
        at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1820)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1659)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
        at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2013-07-28 00:35:40,749 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
Master aborted.
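
When a Master log is long, a small filter can help isolate the entries that matter here: the FATAL lines and the region whose state blocked the transition to OFFLINE. The following is a minimal sketch, assuming the log line layout shown above (timestamp, level, logger, message); the function names `fatal_entries` and `unexpected_states` are hypothetical helpers, not part of HBase.

```python
import re

# Assumed layout: "<date> <time>,<ms> <LEVEL> <logger>: <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<logger>[\w.$]+): (?P<msg>.*)$"
)

# Matches the "Unexpected state" message seen in the FATAL entry above.
UNEXPECTED = re.compile(
    r"Unexpected state : (?P<region>\S+) state=(?P<state>\w+),"
)


def fatal_entries(lines):
    """Return a dict (ts, level, logger, msg) for every FATAL log line."""
    entries = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("level") == "FATAL":
            entries.append(match.groupdict())
    return entries


def unexpected_states(lines):
    """Return (region, state) pairs reported in FATAL 'Unexpected state' messages."""
    pairs = []
    for entry in fatal_entries(lines):
        match = UNEXPECTED.search(entry["msg"])
        if match:
            pairs.append((match.group("region"), match.group("state")))
    return pairs


if __name__ == "__main__":
    import sys

    # Usage: python filter_master_log.py < hbase-master.log
    for region, state in unexpected_states(sys.stdin):
        print(f"{region} stuck in {state}")
```

Stack-trace lines and the final "Master aborted." line do not match the assumed layout and are skipped, so only timestamped FATAL entries are reported.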

This is what I ran:

create 'usertable', {NAME => 'cf', VERSIONS => 1, COMPRESSION => 'SNAPPY', BLOCKCACHE => true}, {SPLITS => ['user', 'user05', 'user1', 'user15', 'user2', 'user25', 'user3', 'user35', 'user4', 'user45', 'user5', 'user55', 'user6', 'user65', 'user7', 'user75', 'user8', 'user85', 'user9', 'user95']}
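
The 20 split points in the command above follow a regular pattern ('user', 'user05', then 'userN' and 'userN5' for N from 1 to 9), so the statement can be generated rather than typed by hand. This is only a sketch: `ycsb_split_keys` and `create_statement` are hypothetical helper names, and the output is meant to be pasted into the HBase shell. Passing `compression="NONE"` produces a statement that does not require the Snappy codec, which may be useful on a cluster without SNAPPY support.

```python
def ycsb_split_keys(prefix="user"):
    """Return the 20 split points used in the create command above."""
    keys = [prefix, prefix + "05"]
    for digit in range(1, 10):
        keys.append(f"{prefix}{digit}")    # e.g. 'user1'
        keys.append(f"{prefix}{digit}5")   # e.g. 'user15'
    return keys


def create_statement(table="usertable", family="cf", compression="SNAPPY"):
    """Build an HBase shell 'create' statement with pre-split regions."""
    splits = ", ".join(f"'{key}'" for key in ycsb_split_keys())
    return (
        f"create '{table}', "
        f"{{NAME => '{family}', VERSIONS => 1, "
        f"COMPRESSION => '{compression}', BLOCKCACHE => true}}, "
        f"{{SPLITS => [{splits}]}}"
    )


if __name__ == "__main__":
    # Print the statement; swap in compression="NONE" to avoid the Snappy codec.
    print(create_statement())
```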

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [EMAIL PROTECTED]

________________________________________
From: Vladimir Rodionov
Sent: Saturday, July 27, 2013 5:08 PM
To: [EMAIL PROTECTED]
Subject: RE: Master aborts on start up - URGENT

OK, I managed to fix the issue and minimize the damage.

OfflineMetaRepair failed to fix .META. because there were inconsistencies in one of the tables
and the tool refused to repair META. I had to physically remove this table from HDFS; I then re-ran the tool
and successfully repaired META.

table and
Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [EMAIL PROTECTED]

________________________________________
From: Vladimir Rodionov
Sent: Saturday, July 27, 2013 4:21 PM
To: [EMAIL PROTECTED]
Subject: Master aborts on start up - URGENT

This may be related to:

https://issues.apache.org/jira/browse/HBASE-8912
It started when I tried to install and run YCSB. I created 'usertable' and then tried to modify it a couple of times (added COMPRESSION),
after which HBase (0.94.6) stopped working (the Master could not finish initialization).

I stopped the cluster and physically removed /hbase/usertable directory as well as all ZK local stores. Restarted. No success.

I manually ran OfflineMetaRepair. Restarted. No success. The FATAL error from the Master's log file is below.

For some reason, OfflineMetaRepair did not fix the missing 'usertable'.

Please advise. This is a development cluster with a large volume of data.

2013-07-27 23:08:56,504 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region TMO_NOV_INDEX-UPLOADS,38,1360181215845.2553b53773e3cb9030c3248768a3b0c