HBase, mail # dev - Cannot locate root region


RE: Cannot locate root region
Karthik Ranganathan 2010-01-29, 18:44
The master does its own lookup, independent of the region server, using the hostname the region server reports:

ServerManager.java, regionServerReport() does:
    HServerInfo storedInfo = serversToServerInfo.get(info.getServerName()); // info.getServerName() is hostname

Which eventually does:
HServerAddress.getHostname()

HServerAddress' constructor creates the InetSocketAddress from hostname:port, which maps the hostname to an IP address via a DNS lookup.
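
To illustrate the lookup described above, here is a minimal sketch using only the JDK. The class and method names are hypothetical and are not HBase code; it only shows that the two-argument InetSocketAddress constructor tries to resolve the host immediately, while createUnresolved does not.

```java
import java.net.InetSocketAddress;

// Hypothetical demo class; not part of HBase.
class AddressResolutionSketch {
    // The (host, port) constructor attempts to resolve the host right away.
    static InetSocketAddress resolving(String host, int port) {
        return new InetSocketAddress(host, port);
    }

    // createUnresolved stores the host string and never consults the resolver.
    static InetSocketAddress nonResolving(String host, int port) {
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        InetSocketAddress a = resolving("localhost", 60020);
        InetSocketAddress b = nonResolving("localhost", 60020);
        System.out.println("constructor left it unresolved: " + a.isUnresolved());
        System.out.println("createUnresolved left it unresolved: " + b.isUnresolved());
    }
}
```

If the resolver is slow or failing, the first form is where that cost or failure would show up.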

Thanks
Karthik
-----Original Message-----
From: Joydeep Sarma [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 29, 2010 9:46 AM
To: [EMAIL PROTECTED]
Subject: Re: Cannot locate root region

@Kannan - Karthik's mail said the reverse lookup happens in the RS
(not the master). the master simply tried to match the offered
hostname.

i don't know whose reading is right - but if it's the RS - i didn't
understand why the reverse lookup wasn't just done once at
bootstrap time (which wouldn't be affected by ongoing DNS badness).

On Fri, Jan 29, 2010 at 9:39 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> I just created https://issues.apache.org/jira/browse/HBASE-2174
>
> We handle addresses in different ways depending on which part of the
> code you're in. We should correct that everywhere by implementing a
> solution that also solves what you guys are seeing.
>
> J-D
>
> On Fri, Jan 29, 2010 at 8:33 AM, Kannan Muthukkaruppan
> <[EMAIL PROTECTED]> wrote:
>> @Joy: The info stored in .META. for the various regions and the ephemeral nodes for region servers in ZooKeeper are both already IP-address based. So it doesn't look like multi-homing or the other flexibilities you mention were a design goal, as far as I can tell.
>>
>> Regarding: <<< doesn't the reverse ip lookup just once at RS startup time?>>>, what seems to be happening is this:
>>
>> A regionServer periodically sends a regionServerReport (RPC call) to the master. An HServerInfo is passed as an argument, and it identifies the sending region server in IP address format.
>>
>> The master's ServerManager class maintains a serversToServerInfo map that is keyed by hostname. Every time the master receives a regionServerReport, it converts the IP-address-based name to a hostname via the info.getServerName() call. Normally this call returns the hostname, but we suspect that during the DNS flakiness it returned an IP-address-based string. This caused ServerManager.java to think it was hearing from a new server, which led to:
>>
>>  HServerInfo storedInfo = serversToServerInfo.get(info.getServerName());
>>    if (storedInfo == null) {
>>      if (LOG.isDebugEnabled()) {
>>        LOG.debug("Received report from unknown server -- telling it " +   <<===========
>>          "to " + CALL_SERVER_STARTUP + ": " + info.getServerName());  <<===========
>>      }
>>
>> and bad things down the road.
>>
>> The above error message in our logs (example below) indeed identified the host in IP address syntax, even though normally the getServerName call would return the info in hostname format.
>>
>> 2010-01-28 11:21:34,539 DEBUG org.apache.hadoop.hbase.master.ServerManager: Received report from unknown server -- telling it to MSG_CALL_SERVER_STARTUP: 10.129.68.203,60020,1263605543210
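
The suspected failure mode can be reproduced in miniature. This is a sketch, not the ServerManager code: the map value is simplified to a String, the class and method names are hypothetical, and the hostname rs1.example.com is made up. Only the "host,port,startcode" name format and the IP-based name are taken from the log line above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; the real map holds HServerInfo values.
class ServerMapMissSketch {
    // Builds a name in the "host,port,startcode" format seen in the log line.
    static String serverName(String hostOrIp, int port, long startCode) {
        return hostOrIp + "," + port + "," + startCode;
    }

    // True when a report under this name would be treated as an unknown server.
    static boolean looksUnknown(Map<String, String> known, String reportedName) {
        return known.get(reportedName) == null;
    }

    public static void main(String[] args) {
        Map<String, String> serversToServerInfo = new HashMap<>();
        // The server registered while DNS was healthy, so the key is hostname based.
        serversToServerInfo.put(serverName("rs1.example.com", 60020, 1263605543210L), "info");

        // Same server, same port and startcode, but the name came back IP based.
        String flaky = serverName("10.129.68.203", 60020, 1263605543210L);
        System.out.println("treated as unknown: " + looksUnknown(serversToServerInfo, flaky));
    }
}
```

The port and startcode match, yet the lookup misses because the first field of the key differs.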
>>
>> This affected three of our test clusters at the same time!
>>
>> Perhaps all we need to do is to change the ServerManager's internal maps to all be IP based? That way we avoid/bypass the master having to look up the hostname on every heartbeat.
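
One way the IP-based keying suggested above might look, as a sketch only. The class and method names are hypothetical and this is not the fix adopted in HBase; it relies on the JDK behaviour that InetAddress.getByName only validates the format when given an IP literal, and that getHostAddress() returns the numeric form without a lookup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical demo class; not part of HBase.
class IpKeySketch {
    // Builds a map key from the numeric address; getHostAddress() does not query DNS.
    static String ipKey(InetAddress addr, int port, long startCode) {
        return addr.getHostAddress() + "," + port + "," + startCode;
    }

    // Convenience overload for an IP literal, which is parsed rather than looked up.
    static String ipKey(String ipLiteral, int port, long startCode) {
        try {
            return ipKey(InetAddress.getByName(ipLiteral), port, startCode);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a valid IP literal: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(ipKey("10.129.68.203", 60020, 1263605543210L));
    }
}
```

With keys built this way, a heartbeat would match its existing entry regardless of whether reverse DNS is working at that moment.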
>>
>> regards,
>> Kannan
>> ________________________________________
>> From: Joydeep Sarma [[EMAIL PROTECTED]]
>> Sent: Friday, January 29, 2010 1:20 AM
>> To: [EMAIL PROTECTED]
>> Subject: Re: Cannot locate root region
>>
>> hadoop also uses the hostnames. if a host is multi-homed - its
>> hostname is a better identifier (which still allows it to use
>> different nics/ips for actual traffic). it can help in the case the
>> cluster is migrated for example (all the ips change). one could have