HBase, mail # dev - HBASE-2312 discussion


RE: HBASE-2312 discussion
Karthik Ranganathan 2010-03-17, 17:21
Loved the "Juliet" terminology as well :).

@Todd: I agree we will need something like #2 or especially #3 in other places.

Looks like we have a consensus - I will update the JIRA.

Thanks
Karthik
-----Original Message-----
From: Todd Lipcon [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, March 16, 2010 10:09 PM
To: [EMAIL PROTECTED]
Subject: Re: HBASE-2312 discussion

On Tue, Mar 16, 2010 at 8:59 PM, Stack <[EMAIL PROTECTED]> wrote:

> On Tue, Mar 16, 2010 at 5:08 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
> >
> > What do you think about the trick of making the RS do a ZK sync before any
> > meta op? This forces it to take at most one action after it's been
> > terminated.
> >
>
> ... where meta op is open of new WAL log?
>
> How would this work?  RS would note in ZK the name of the WAL it's
> about to open before it did it?  If the RS then does a "Juliet" --
>
[haha, love this terminology!]

> i.e. goes into a GC pause death-like coma -- on revival, it'll go to
> open the WAL but master will have already done so, and so it'll fail?
>
>
I was actually referring to the explicit sync call in ZK:
http://hadoop.apache.org/zookeeper/docs/r3.2.1/api/org/apache/zookeeper/ZooKeeper.html#sync%28java.lang.String,%20org.apache.zookeeper.AsyncCallback.VoidCallback,%20java.lang.Object%29

The javadoc isn't that clear, but the way I understand this call is that it
makes sure the client's view of the world is up-to-date with respect to the
ZK leader at the beginning of the sync call.

The "note" box at the bottom of this section also explains it pretty well:
http://hadoop.apache.org/zookeeper/docs/r3.2.2/zookeeperProgrammers.html#ch_zkGuarantees

If we insert this between any transitions, I think we can ensure that the
region server will only do at most one operation after losing its lease.
This means that the whole "chasing the log" thing is unnecessary.
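
A minimal sketch of the "sync before every meta op" idea follows. It does not use the real ZooKeeper client: FakeLeader, FakeClient, RegionServerSketch and the pauseAfterSync hook are hypothetical names introduced only for illustration, with FakeClient.sync() standing in for the ZooKeeper sync call linked above. The sketch assumes the lease state can be modelled as a single flag that the client only refreshes on sync.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the ZK leader: holds the authoritative lease state. */
class FakeLeader {
    private boolean leaseValid = true;
    void expireLease() { leaseValid = false; }
    boolean isLeaseValid() { return leaseValid; }
}

/** Hypothetical client whose local view lags the leader until sync() is called. */
class FakeClient {
    private final FakeLeader leader;
    private boolean cachedLeaseValid = true;
    FakeClient(FakeLeader leader) { this.leader = leader; }
    /** Bring the local view up to date with the leader (stands in for ZK sync). */
    void sync() { cachedLeaseValid = leader.isLeaseValid(); }
    boolean believesLeaseValid() { return cachedLeaseValid; }
}

/** Hypothetical region server that syncs before every meta operation. */
class RegionServerSketch {
    private final FakeClient zk;
    final List<String> performed = new ArrayList<>();
    RegionServerSketch(FakeClient zk) { this.zk = zk; }

    /**
     * Sync, then act only if the refreshed view says the lease is still held.
     * pauseAfterSync simulates a stall (e.g. a GC pause) between sync and action.
     */
    boolean metaOp(String name, Runnable pauseAfterSync) {
        zk.sync();
        if (!zk.believesLeaseValid()) {
            return false;          // lease is gone: refuse to act
        }
        pauseAfterSync.run();      // the lease may expire here, unnoticed
        performed.add(name);       // this is the "at most one" stale action
        return true;
    }
}

class SyncBeforeMetaOpDemo {
    public static void main(String[] args) {
        FakeLeader leader = new FakeLeader();
        RegionServerSketch rs = new RegionServerSketch(new FakeClient(leader));

        rs.metaOp("roll-wal-1", () -> {});           // lease held, op proceeds
        rs.metaOp("roll-wal-2", leader::expireLease); // lease lost during the pause
        rs.metaOp("roll-wal-3", () -> {});           // next sync sees the loss

        System.out.println(rs.performed);            // prints [roll-wal-1, roll-wal-2]
    }
}
```

In this model the second operation still goes through because the lease expires after the sync, but the third is refused, which is the "at most one operation after losing its lease" bound described above.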

> @Karthik "I am a little nervous about the master backing off on
> detecting the RS's progress - because the RS has already lost its zk
> lease."
>
> Yes.  The RS will have had its 'shut-yourself-down' flag set on
> loss-of-lease so is on its way out.  It's not going to revive, so its
> logs need recovering.
>
> @Kannan "Option #1 seems easy to reason about and simple to implement.
> Can we go ahead with that if there is no major objection?"
>
> Fine by me.
>

Fine by me as well. I think we'll need solutions like #2 or #3 in other places,
but for this one #1 seems to work (I'll continue to think about whether there
are any holes in our logic).

-Todd
--
Todd Lipcon
Software Engineer, Cloudera