HBase, mail # dev - Replication hosed after simple cluster restart


Re: Replication hosed after simple cluster restart
lars hofhansl 2013-03-14, 01:22
I suppose the problem could be in zkHelper.copyQueuesFromRSUsingMulti(rsZnode) as called from ReplicationSourceManager.NodeFailoverWorker.run().
copyQueuesFromRSUsingMulti will return the queues it read even when the multi operation failed (because another RS managed to execute it first).
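To make the suspected failure mode concrete, here is a minimal sketch of the race. It is not HBase code: `FakeZk`, `multi_move`, `claim_buggy`, `claim_checked` and the server and queue names are all hypothetical stand-ins, and the in-memory dictionary only imitates an all-or-nothing ZooKeeper multi. Under those assumptions, it shows how a caller that ignores the multi result ends up with two servers both holding the same queues, while a caller that checks the result does not.

```python
class FakeZk:
    """Hypothetical in-memory stand-in for ZooKeeper (not the real client API)."""

    def __init__(self, queues_by_rs):
        # {region server name: {queue id: [log file names]}}
        self.queues_by_rs = queues_by_rs

    def read_queues(self, rs):
        # Plain read: returns a copy of the queues under the server's znode.
        return {q: list(logs) for q, logs in self.queues_by_rs.get(rs, {}).items()}

    def multi_move(self, dead_rs, new_rs, expected):
        # Imitates an atomic multi: move everything or nothing.
        # Fails if another server already moved (deleted) the dead server's queues.
        if dead_rs not in self.queues_by_rs or self.queues_by_rs[dead_rs] != expected:
            return False
        moved = self.queues_by_rs.pop(dead_rs)
        target = self.queues_by_rs.setdefault(new_rs, {})
        for queue_id, logs in moved.items():
            target[f"{queue_id}-{dead_rs}"] = logs  # illustrative naming only
        return True


def claim_buggy(zk, dead_rs, me, snapshot):
    # Suspected behaviour: the multi result is ignored and the
    # previously read queues are returned even if the move failed.
    zk.multi_move(dead_rs, me, snapshot)
    return snapshot


def claim_checked(zk, dead_rs, me, snapshot):
    # Possible fix: return the queues only if this server won the race.
    return snapshot if zk.multi_move(dead_rs, me, snapshot) else {}


def simulate(claim):
    zk = FakeZk({"rs-dead": {"peer1": ["hlog.1", "hlog.2"]}})
    # Both survivors read the dead server's queues before either runs its multi.
    snap_a = zk.read_queues("rs-dead")
    snap_b = zk.read_queues("rs-dead")
    got_a = claim(zk, "rs-dead", "rs-a", snap_a)  # wins the race
    got_b = claim(zk, "rs-dead", "rs-b", snap_b)  # multi fails
    return got_a, got_b


if __name__ == "__main__":
    print("buggy:  ", simulate(claim_buggy))
    print("checked:", simulate(claim_checked))
```

In the buggy variant both `rs-a` and `rs-b` come away with `peer1`, although only `rs-a` actually owns it in the fake store; in the checked variant the loser gets an empty result and has nothing to start replicating.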

-- Lars

________________________________
From: lars hofhansl <[EMAIL PROTECTED]>
To: hbase-dev <[EMAIL PROTECTED]>
Sent: Wednesday, March 13, 2013 6:12 PM
Subject: Replication hosed after simple cluster restart

We just ran into an interesting scenario. We restarted a cluster that was set up as a replication source.
The stop went cleanly.

Upon restart, *all* regionservers aborted within a few seconds with variations of these errors:
http://pastebin.com/3iQVuBqS

This is scary!

-- Lars