HBase >> mail # user >> Region server crashes when using replication


Re: Region server crashes when using replication
Thanks, J-D, that solved part of the problem. The servers have stopped
crashing and the master now properly detects when a RS goes down. By
the way, since the RS does detect this, it may be a good idea to stop
the server on this event, as it indicates a significant configuration
issue.
However, replication now just doesn't seem to work. I didn't change
anything in the configuration, which already managed to push 2 rows
before crashing yesterday. I still see the peer properly configured in
ZK and replication is enabled, but nothing is happening. All I see in
the log of the RS which holds the table I'm writing into is:

2011-03-25 15:16:56,504 DEBUG
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: No
log to process, sleeping 1000 times 10
2011-03-25 15:17:07,509 DEBUG
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: No
log to process, sleeping 1000 times 10
2011-03-25 15:17:18,515 DEBUG
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: No
log to process, sleeping 1000 times 10
2011-03-25 15:17:29,520 DEBUG
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: No
log to process, sleeping 1000 times 10
2011-03-25 15:17:40,526 DEBUG
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: No
log to process, sleeping 1000 times 10

Needless to say, nothing gets into the peer cluster.

-eran
On Fri, Mar 25, 2011 at 02:02, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> Ok so this is the same old DNS issue...
>
> This is the important message in the log:
>
> Master passed us address to use. Was=hadoop1-s02:60020,
> Now=hadoop1-s02.farm-ny.not-a-spammer.com:60020
>
> This means that when the RS tries to resolve itself it gets its
> hostname, but when the master resolves the RS it gets the FQDN. This
> is a bug in HBase that we rely on those strings as "true machine
> identification" but that's how it is at the moment. It happens to be
> that replication is set up later in the process so it uses the FQDN.
> The only way you can fix it is to change your DNS settings. Here we
> resolve everything with their hostnames.
>
> Hope that helps and sorry about all the trouble,
>
> J-D
>
>>> You make it sound like it's a bad thing :)
>>> But seriously, SpamAssassin is really not the brightest anti-spam software on the planet. You should check out what we're doing, we're actually in the same field as you guys, except our product is B2B.
>>>
>>> Thanks for looking into the bug.
>>>
>>> -eran
>
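
The mismatch J-D describes (the RS resolving itself to a short hostname
while the master resolves it to the FQDN) can be checked for outside of
HBase. The following is only a rough sketch, not HBase code: the helper
names split_host and classify are made up for illustration, and the
script at the bottom assumes the machine's own resolver is the one the
cluster uses.

```python
import socket


def split_host(addr: str) -> str:
    """Drop an optional ':port' suffix (hypothetical helper, IPv4/hostnames only)."""
    host, _sep, _port = addr.partition(":")
    return host


def classify(was: str, now: str) -> str:
    """Compare two address strings the way a strict string match would.

    Returns "match", "short-vs-fqdn" (one name is the other plus a
    domain suffix), or "different".
    """
    a, b = split_host(was), split_host(now)
    if a == b:
        return "match"
    short, long_ = sorted((a, b), key=len)
    if long_.startswith(short + "."):
        return "short-vs-fqdn"
    return "different"


if __name__ == "__main__":
    # What this machine calls itself vs. what the resolver reports as its FQDN.
    local = socket.gethostname()
    fqdn = socket.getfqdn()
    print(local, fqdn, classify(local, fqdn))
```

Run on each region server, a result other than "match" suggests the
node's own name and its resolved name differ, which is the situation
reported in the "Master passed us address to use" message above.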