Re: replicating to oneself?
I injected more debug code into ReplicationPeer.

 public ReplicationPeer(Configuration conf, String key,
      String id) throws IOException {
    this.conf = conf;
    this.clusterKey = key;
    this.id = id;
    this.reloadZkWatcher();

    LOG.info("Demai @ReplicationPeer : clusterkey=" + key + ",id=" + id);
    // The quorum reported here is incorrect on the problematic cluster.
    LOG.info("Demai @ReplicationPeer : this.zkw.quom =" + this.zkw.getQuorum());
    LOG.info("Demai @ReplicationPeer : this.zkw=" + this.zkw.toString());
  }
On the problematic cluster, ReplicationPeer.zkw reports the wrong quorum: its own host (bdvm134) instead of the peer (hdtest014).

2013-11-01 12:40:33,351 INFO org.apache.hadoop.hbase.replication.ReplicationPeer: Demai @ReplicationPeer : clusterkey=6,id=hdtest014.svl.ibm.com:2181:/hbase
2013-11-01 12:40:33,351 INFO org.apache.hadoop.hbase.replication.ReplicationPeer: Demai @ReplicationPeer : this.zkw.quom =*bdvm134.svl.ibm.com:2181*
2013-11-01 12:40:33,351 INFO org.apache.hadoop.hbase.replication.ReplicationPeer: Demai @ReplicationPeer : this.zkw=connection to cluster: hdtest014.svl.ibm.com:2181:/hbase
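
The comparison I am doing by eye in the log lines above can be sketched as a small standalone check. This is only a rough sketch, assuming the cluster key always has the form quorum:port:/znode; ClusterKeyCheck, quorumOf and matches are hypothetical names I made up for illustration and are not part of HBase.

```java
/** Hypothetical helper (not an HBase class): compares a peer's cluster key with the quorum a watcher reports. */
public class ClusterKeyCheck {

    /** Returns the "quorum:port" part of a key assumed to look like "quorum:port:/znode". */
    public static String quorumOf(String clusterKey) {
        String[] parts = clusterKey.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected quorum:port:/znode but got: " + clusterKey);
        }
        return parts[0] + ":" + parts[1];
    }

    /** True when the watcher's quorum string equals the quorum named in the cluster key. */
    public static boolean matches(String clusterKey, String watcherQuorum) {
        return quorumOf(clusterKey).equals(watcherQuorum);
    }

    public static void main(String[] args) {
        // Values copied from the log lines above.
        String clusterKey = "hdtest014.svl.ibm.com:2181:/hbase";
        String watcherQuorum = "bdvm134.svl.ibm.com:2181";

        System.out.println("expected quorum = " + quorumOf(clusterKey));
        System.out.println("matches = " + matches(clusterKey, watcherQuorum));
    }
}
```

With the values from the problematic cluster this prints "matches = false", which is the mismatch I am trying to track down.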

On Fri, Nov 1, 2013 at 11:12 AM, Demai Ni <[EMAIL PROTECTED]> wrote:

> Himanshu and Nick,
>
> Many thanks for your help. I don't have all the answers to Nick's
> questions, since the deployment was built by another team and is combined
> with a lot of other components like zookeeper, hadoop, hbase, hive, oozie, etc.
>
> I followed Himanshu's suggestion and checked the hbase.id on two
> different problematic clusters; they are different, so that seems normal to me.
> About the deployment: I did a clean install (well, at least that was my
> intention) and am not re-using existing znodes. The installation stops
> everything (zookeeper, hadoop, hbase, etc.), removes all the files and data,
> and then installs everything, so there should be nothing left over.
>
> Let me describe the current setup and my investigation so far. Rows can be
> replicated from the correct cluster to the problematic cluster, but can't be
> replicated from the problematic one, EVEN though both have the same hbase.jar.
>
> *Problematic Cluster:*
> name = bdvm134
> /hbase/hbase.id =  $b13a0e3a-2bec-4e13-8b1d-043aa1a66443
> > list_peers  (I put two there just for debug purpose)
>  PEER_ID  CLUSTER_KEY                        STATE
>  6        hdtest014.svl.ibm.com:2181:/hbase  ENABLED
>  7        hdtest014.svl.ibm.com:2181:/hbase  ENABLED
>
>
> *Correct Cluster:*
> name = hdtest014
> /hbase/hbase.id = ce41a00d-5b0c-44b2-8bf7-bfd35bda1d42
> > list_peers
>  PEER_ID  CLUSTER_KEY                      STATE
>  1        bdvm134.svl.ibm.com:2181:/hbase  ENABLED
>
>
> I injected some debugging code into ReplicationSource.run()
> public void run() {
>   ....
>
>     LOG.info("Replicating "+clusterId + " -> " + peerClusterId);
>
>     Map<String, ReplicationPeer> peerList = zkHelper.getPeerClusters();
>
>     for (Map.Entry<String, ReplicationPeer> peer : peerList.entrySet()) {
>       LOG.info("Demai ---------------begin");
>       String peerId_A = peer.getKey();
>       ReplicationPeer rPeer = peer.getValue();
>       try {
>         LOG.info("clusterUUId = " + zkHelper.getUUIDForCluster(zkHelper.getZookeeperWatcher()));
>         LOG.info("peerUUID = " + zkHelper.getPeerUUID(peerId_A));
>       } catch (KeeperException e) {
>         LOG.info("exception = " + e);
>       }
>
>       LOG.info("peerID = " + peerId_A);
>       LOG.info("peer Value=" + rPeer.toString());
>
>       List<ServerName> sList = zkHelper.getSlavesAddresses(peerId_A);
>       for (ServerName sName : sList) {
>         // The value logged here is incorrect on the problematic cluster.
>         LOG.info("sName = " + sName.getHostname());
>       }
>       LOG.info("Peer Cluster=" + rPeer.getClusterKey() + ",Peer ID = " + rPeer.getId());
>       LOG.info("Demai ---------------end");
>     }
> ...
> }
>
>
>
> On the bdvm134 regionserver:
> 2013-11-01 10:20:44,757 DEBUG
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening
> log for replication bdvm134.svl.ibm.com%2C60020%2C1383324585548.1383324589592
> at 3073
> 2013-11-01 10:20:44,761 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: