Re: Follow-up to regionservers not being online - more logs included
Can you try this: stop your cluster, then start just one Master and one
RegionServer. For now, use a single ZK node that both the master and the
RS are able to connect to.
Once that is done, check the master UI to see whether the ROOT and META
tables got assigned. That is, the master UI should show the names of these
two tables along with the name of the RS that you started.
On Fri, Oct 19, 2012 at 9:23 PM, ramkrishna vasudevan <
[EMAIL PROTECTED]> wrote:
> Can you attach the Master logs as well? It looks like the ROOT region
> assignment failed. This seems to be the first problem.
> On Fri, Oct 19, 2012 at 7:11 PM, Dan Brodsky <[EMAIL PROTECTED]> wrote:
>> I'm still having several issues with my cluster. This used to all
>> work, and there have been no recent configuration changes.
>> To recap, Master and regionservers all appear to start successfully,
>> but several regionservers do not show as online on the HBase master
>> status page. Moreover, there appear to be a bunch of regions stuck in
>> transition that never open. Of the 5 regionservers currently on the
>> status page, only two have a numberOfOnlineRegions > 0.
>> Log file snippets:
>> First, the ZooKeeper dump from the master status web page shows
>> that some of the regionservers have connected to ZK, but they still
>> don't show as being online. Note that the IP ending in 217 is the
>> HBase master, and the ones ending in 31-40 are RSs 1-10 respectively:
>> This is the log file for one of the regionservers that did not come
>> online, showing not much of anything, I'm afraid:
>> In one of the RegionServers that did come online, I'm seeing this
>> error repeat over and over (several of the RS_ZK_REGION_OPENING debug
>> statements precede the error): http://paste.ee/p/lbiTN
>> ZooKeeper log for one of the ZK nodes. Not much remarkable here; the
>> nodes connect successfully, and there's a repeated opening/closing of a
>> session with the HBase master (which is also a ZK quorum peer):
>> The master log doesn't show much. A lot of this:
>> CatalogTracker: Failed verification of .META.,,1 at
>> org.apache.hadoop.hbase.NotServingRegionException: Region is not
>> online: .META.,,1
>> But then it does find .META. and open it on a different RS:
>> 2012-10-19 12:59:21,480 INFO
>> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling
>> OPENED event for .META.,,1.1028785192 from dn-3,60020,1350651496690;
>> deleting unassigned node
>> 2012-10-19 12:59:21,482 INFO
>> org.apache.hadoop.hbase.master.AssignmentManager: The master has
>> opened the region .META.,,1.1028785192 that was online on
>> 2012-10-19 12:59:21,497 INFO org.apache.hadoop.hbase.master.HMaster:
>> .META. assigned=2, rit=false, location=dn-3,60020,1350651496690
>> The master log file goes on to show that 71 regions come online, which
>> is consistent with the master status page.
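
To check the catalog assignment without the master UI, the HMaster summary line quoted above can be pulled out of the master log. The following is only a rough sketch: the regular expression is modeled on the single ".META. assigned=..." line in this thread, the assumption that -ROOT- is reported in the same layout is mine, and parse_assignment is a made-up helper name. It also assumes the location follows the host,port,startcode pattern, with the startcode being the regionserver start time in milliseconds since the epoch.

```python
import re
from datetime import datetime, timezone

# Modeled on the one HMaster line quoted in this thread; treating -ROOT-
# as having the same layout is an assumption, not something shown here.
ASSIGNED_RE = re.compile(
    r"(?P<table>-ROOT-|\.META\.) assigned=(?P<assigned>\d+), "
    r"rit=(?P<rit>true|false), location=(?P<location>\S+)"
)


def parse_assignment(text):
    """Return the catalog assignment details found in text, or None."""
    match = ASSIGNED_RE.search(text)
    if match is None:
        return None
    parts = match.group("location").split(",")
    if len(parts) != 3:
        # Location is not in the assumed host,port,startcode form.
        return None
    host, port, startcode_ms = parts
    return {
        "table": match.group("table"),
        "rit": match.group("rit") == "true",
        "host": host,
        "port": int(port),
        # Assumed: startcode is epoch milliseconds; truncated to seconds.
        "started": datetime.fromtimestamp(int(startcode_ms) // 1000,
                                          tz=timezone.utc),
    }


if __name__ == "__main__":
    line = ("2012-10-19 12:59:21,497 INFO org.apache.hadoop.hbase.master."
            "HMaster: .META. assigned=2, rit=false, "
            "location=dn-3,60020,1350651496690")
    print(parse_assignment(line))
```

A log message that wraps across several lines, as in the quoted mail, would need to be joined into one string before it is passed in. A result with rit set to False and a host filled in corresponds to what the master UI would show for an assigned catalog table.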