Re: Follow-up to regionservers not being online - more logs included
ramkrishna vasudevan 2012-10-19, 15:53
Can you attach the Master logs as well? It looks like the ROOT region
assignment failed. That seems to be the first problem.

Regards
Ram

On Fri, Oct 19, 2012 at 7:11 PM, Dan Brodsky <[EMAIL PROTECTED]> wrote:
> I'm still having several issues with my cluster. This used to all
> work, and there have been no recent configuration changes.
>
> To recap, Master and regionservers all appear to start successfully,
> but several regionservers do not show as online on Hbase master status
> page. Moreover, there appear to be a bunch of regions stuck in
> transition that never open. Of the 5 regions currently on the status
> page, only two have a numberOfOnlineRegions > 0.
>
> Log file snippets:
>
> First, the ZooKeeper Dump from off the master status web page shows
> that some of the regionservers have connected to ZK, but they still
> don't show as being online. Note that the IP ending in 217 is the
> Hbase master, the ones ending in 31-40 are RS's 1-10 respectively:
> http://paste.ee/p/JAUfJ
>
> This is the log file for one of the regionservers that did not come
> online, showing not much of anything, I'm afraid:
> http://paste.ee/p/KHgOP
>
> In one of the RegionServers that did come online, I'm seeing this
> error repeat over and over (several of the RS_ZK_REGION_OPENING debug
> statements precede the error): http://paste.ee/p/lbiTN
>
> ZooKeeper log for one of the ZK nodes. Not much remarkable here; the
> nodes connect successfully, and there's a repeat opening/closing of a
> session with the Hbase master (which is also a ZK quorum peer):
> http://paste.ee/p/zjSCO
>
> The master log doesn't show much. A lot of this:
>
> CatalogTracker: Failed verification of .META.,,1 at
> address=dn-4,60020,1350563250999;
> org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException: Region is not
> online: .META.,,1
>
> But then it does find .META. and open it on a different RS:
>
> 2012-10-19 12:59:21,480 INFO
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling
> OPENED event for .META.,,1.1028785192 from dn-3,60020,1350651496690;
> deleting unassigned node
> 2012-10-19 12:59:21,482 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: The master has
> opened the region .META.,,1.1028785192 that was online on
> dn-3,60020,1350651496690
> 2012-10-19 12:59:21,497 INFO org.apache.hadoop.hbase.master.HMaster:
> .META. assigned=2, rit=false, location=dn-3,60020,1350651496690
>
> The master log file goes on to show that 71 regions come online, which
> is consistent with the master status page.
>
> Thoughts?