HBase, mail # user - RE: Follow-up to regionservers not being online - more logs included


Re: Follow-up to regionservers not being online - more logs included
ramkrishna vasudevan 2012-10-19, 15:53
Can you attach the Master logs as well? It looks like the ROOT region
assignment failed. This seems to be the first problem.

Regards
Ram

On Fri, Oct 19, 2012 at 7:11 PM, Dan Brodsky <[EMAIL PROTECTED]> wrote:

> I'm still having several issues with my cluster. This used to all
> work, and there have been no recent configuration changes.
>
> To recap, Master and regionservers all appear to start successfully,
> but several regionservers do not show as online on the HBase master
> status page. Moreover, there appear to be a bunch of regions stuck in
> transition that never open. Of the 5 regionservers currently on the
> status page, only two have a numberOfOnlineRegions > 0.
>
> Log file snippets:
>
> First, the ZooKeeper Dump from the master status web page shows
> that some of the regionservers have connected to ZK, but they still
> don't show as being online. Note that the IP ending in 217 is the
> HBase master; the ones ending in 31-40 are RSs 1-10 respectively:
> http://paste.ee/p/JAUfJ
>
> This is the log file for one of the regionservers that did not come
> online, showing not much of anything, I'm afraid:
> http://paste.ee/p/KHgOP
>
> In one of the RegionServers that did come online, I'm seeing this
> error repeat over and over (several of the RS_ZK_REGION_OPENING debug
> statements precede the error): http://paste.ee/p/lbiTN
>
> ZooKeeper log for one of the ZK nodes. Not much remarkable here; the
> nodes connect successfully, and there's a repeated opening/closing of
> a session with the HBase master (which is also a ZK quorum peer):
> http://paste.ee/p/zjSCO
>
> The master log doesn't show much. A lot of this:
>
> CatalogTracker: Failed verification of .META.,,1 at
> address=dn-4,60020,1350563250999;
> org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException: Region is not
> online: .META.,,1
>
> But then it does find .META. and open it on a different RS:
>
> 2012-10-19 12:59:21,480 INFO
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling
> OPENED event for .META.,,1.1028785192 from dn-3,60020,1350651496690;
> deleting unassigned node
> 2012-10-19 12:59:21,482 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: The master has
> opened the region .META.,,1.1028785192 that was online on
> dn-3,60020,1350651496690
> 2012-10-19 12:59:21,497 INFO org.apache.hadoop.hbase.master.HMaster:
> .META. assigned=2, rit=false, location=dn-3,60020,1350651496690
>
> The master log file goes on to show that 71 regions come online, which
> is consistent with the master status page.
>
> Thoughts?
>
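
For anyone following along, the two master-log patterns quoted above (the repeated "Failed verification of .META." message and the later "OPENED event") can be tallied with a short script before attaching the full log. This is only a rough sketch: the function name `summarize_master_log` and both regular expressions are my own assumptions based on the excerpts in this message, not anything HBase provides, and other HBase versions may format these lines differently.

```python
import re
from collections import Counter

# Patterns inferred from the master log excerpts quoted in this thread
# (an assumption; they are not part of any HBase tooling).
FAILED_META = re.compile(r"Failed verification of \.META\.,,1 at\s+address=([^;\s]+)")
OPENED = re.compile(r"Handling\s+OPENED event for (\S+) from\s+([^;\s]+)")


def summarize_master_log(text):
    """Return (failed .META. verifications per server, list of OPENED events)."""
    # Log lines may be hard-wrapped, as they are in the email, so collapse
    # all runs of whitespace into single spaces before matching.
    flat = " ".join(text.split())
    failed = Counter(FAILED_META.findall(flat))
    opened = OPENED.findall(flat)  # list of (region, server) tuples
    return failed, opened


# Usage (the path is hypothetical):
#   with open("hbase-master.log") as f:
#       failed, opened = summarize_master_log(f.read())
#   print(failed.most_common(), opened)
```

If one server dominates the failure count while .META. is later opened elsewhere, that would match what the excerpts above suggest, but the full Master log is still needed to see why the ROOT assignment failed.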