NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

HBase >> mail # user >> Region server crashes when using replication


Re: Region server crashes when using replication
I had more time to look into it and verified that the data is indeed not
replicated because the server doesn't see it in the log. So I tried
restarting the RS, and sure enough, when the table (which has only one
region) transitioned to another RS, replication started working
(for new data only).
So I tried with another table, with the same result: replication doesn't
work and the log says "No log to process", but after restarting the RS
and a table transition, replication started working for that table
too. Is there something that gets initialized during a transition that
could be missing before?

-eran
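The "No log to process" DEBUG lines in the log excerpt quoted further down recur about every 11 seconds. One reading of "sleeping 1000 times 10", which is an assumption and not something confirmed in this thread, is a 1000 ms base interval multiplied by 10, i.e. 10 s of sleep per iteration. The Java sketch below (the class and method names are hypothetical) only measures the spacing between the quoted timestamps so that reading can be checked against the log.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class ReplicationLogGaps {
    // Timestamp format used by the quoted RegionServer log lines.
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Returns the gap in milliseconds between each pair of consecutive timestamps.
    public static List<Long> gapsMillis(String... timestamps) {
        List<Long> gaps = new ArrayList<>();
        for (int i = 1; i < timestamps.length; i++) {
            LocalDateTime prev = LocalDateTime.parse(timestamps[i - 1], FMT);
            LocalDateTime curr = LocalDateTime.parse(timestamps[i], FMT);
            gaps.add(Duration.between(prev, curr).toMillis());
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Timestamps copied from the "No log to process" DEBUG lines.
        List<Long> gaps = gapsMillis(
                "2011-03-25 15:16:56,504",
                "2011-03-25 15:17:07,509",
                "2011-03-25 15:17:18,515",
                "2011-03-25 15:17:29,520",
                "2011-03-25 15:17:40,526");
        // Assumed reading of "sleeping 1000 times 10": 1000 ms base x 10.
        long assumedSleepMillis = 1000L * 10;
        for (long gap : gaps) {
            System.out.println(gap + " ms total, "
                    + (gap - assumedSleepMillis) + " ms beyond the assumed sleep");
        }
    }
}
```

For the five quoted lines the gaps come out at 11005 and 11006 ms, so about 1 s per iteration is not covered by the assumed 10 s sleep alone; the thread doesn't say where that second is spent.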

On Fri, Mar 25, 2011 at 21:26, Eran Kutner <[EMAIL PROTECTED]> wrote:
>
> Thanks, J-D, that solved part of the problem. The servers
> have stopped crashing and the master now properly detects when an RS
> goes down. By the way, since the RS does detect the address mismatch,
> it may be a good idea to stop the server when that happens, because it
> indicates a significant configuration issue.
> However, replication now just doesn't seem to work. I didn't change
> anything in the configuration, which had already managed to push 2 rows
> before crashing yesterday. I still see the peer properly configured in
> ZK and replication is enabled, but nothing is happening. All I see in
> the log of the RS that holds the table I'm writing into is:
>
> 2011-03-25 15:16:56,504 DEBUG
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: No
> log to process, sleeping 1000 times 10
> 2011-03-25 15:17:07,509 DEBUG
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: No
> log to process, sleeping 1000 times 10
> 2011-03-25 15:17:18,515 DEBUG
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: No
> log to process, sleeping 1000 times 10
> 2011-03-25 15:17:29,520 DEBUG
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: No
> log to process, sleeping 1000 times 10
> 2011-03-25 15:17:40,526 DEBUG
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: No
> log to process, sleeping 1000 times 10
>
> Needless to say, nothing gets into the peer cluster.
>
> -eran
>
>
>
>
> On Fri, Mar 25, 2011 at 02:02, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> > OK, so this is the same old DNS issue...
> >
> > This is the important message in the log:
> >
> > Master passed us address to use. Was=hadoop1-s02:60020,
> > Now=hadoop1-s02.farm-ny.not-a-spammer.com:60020
> >
> > This means that when the RS resolves itself it gets its
> > hostname, but when the master resolves the RS it gets the FQDN. It
> > is a bug in HBase that we rely on those strings as the "true machine
> > identification", but that's how it is at the moment. It just happens
> > that replication is set up later in the process, so it uses the FQDN.
> > The only way to fix it is to change your DNS settings. Here, we
> > resolve everything by hostname.
> >
> > Hope that helps and sorry about all the trouble,
> >
> > J-D
> >
> >>> You make it sound like it's a bad thing :)
> >>> But seriously, SpamAssassin is really not the brightest anti-spam software on the planet. You should check out what we're doing; we're actually in the same field as you guys, except our product is B2B.
> >>>
> >>> Thanks for looking into the bug.
> >>>
> >>> -eran
> >
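J-D's diagnosis is that the RegionServer identifies itself by its short hostname (e.g. `hadoop1-s02`) while the master resolves it to the FQDN. A quick way to check whether a machine might be affected is to compare the name the JVM reports for a host with the name a reverse lookup of its IP address returns. The Java sketch below does that with `InetAddress.getHostName()` and `InetAddress.getCanonicalHostName()`. It is only a diagnostic, assuming the mismatch shows up in these two JDK calls; it does not reproduce HBase's own resolution code, and the class and method names are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostnameCheck {
    // True when the two names differ (case-insensitively), i.e. the
    // hostname-vs-FQDN situation described in the thread.
    public static boolean namesDiffer(String hostName, String canonicalName) {
        return !hostName.equalsIgnoreCase(canonicalName);
    }

    public static void main(String[] args) throws UnknownHostException {
        // Check the local machine by default, or a name passed as the first argument.
        InetAddress addr = args.length > 0
                ? InetAddress.getByName(args[0])
                : InetAddress.getLocalHost();
        String hostName = addr.getHostName();
        // Best-effort reverse lookup; may return the IP literal if it fails.
        String canonical = addr.getCanonicalHostName();
        System.out.println("getHostName():          " + hostName);
        System.out.println("getCanonicalHostName(): " + canonical);
        if (namesDiffer(hostName, canonical)) {
            System.out.println("Mismatch: other machines may identify this host as " + canonical);
        }
    }
}
```

Running it on each RegionServer (and on the master, passing a RegionServer's name as the argument) shows whether the two names disagree; if they do, the DNS settings are the place to look, as J-D suggests.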