HBase >> mail # dev >> HBASE-2312 discussion


RE: HBASE-2312 discussion
Loved the "Juliet" terminology as well :).

@Todd: I agree we will need something like #2 or especially #3 in other places.

Looks like we have a consensus - I will update the JIRA.

Thanks
Karthik
-----Original Message-----
From: Todd Lipcon [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, March 16, 2010 10:09 PM
To: [EMAIL PROTECTED]
Subject: Re: HBASE-2312 discussion

On Tue, Mar 16, 2010 at 8:59 PM, Stack <[EMAIL PROTECTED]> wrote:

> On Tue, Mar 16, 2010 at 5:08 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
> >
> > What do you think about the trick of making the RS do a ZK sync before
> > any meta op? This forces it to take at most one action after it's been
> > terminated.
> >
>
> ... where meta op is open of new WAL log?
>
> How would this work?  RS would note in ZK the name of the WAL it's
> about to open before it did it?  If the RS then does a "Juliet" --
>
[haha, love this terminology!]

> i.e. goes into a GC pause death-like coma -- on revival, it'll go to
> open the WAL but master will have already done so, and so it'll fail?
>
>
I was actually referring to the explicit sync call in ZK:
http://hadoop.apache.org/zookeeper/docs/r3.2.1/api/org/apache/zookeeper/ZooKeeper.html#sync%28java.lang.String,%20org.apache.zookeeper.AsyncCallback.VoidCallback,%20java.lang.Object%29

The javadoc isn't that clear, but the way I understand this call is that it
makes sure the client's view of the world is up-to-date with respect to the
ZK leader at the beginning of the sync call.

The "note" box at the bottom of this section also explains it pretty well:
http://hadoop.apache.org/zookeeper/docs/r3.2.2/zookeeperProgrammers.html#ch_zkGuarantees

If we insert this between any transitions, I think we can ensure that the
region server will only do at most one operation after losing its lease.
This means that the whole "chasing the log" thing is unnecessary.
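
To make the "at most one operation" argument concrete, here is a minimal toy simulation. It is only a sketch under stated assumptions: it does not use the real ZooKeeper client, and the names Leader, RegionServer, meta_op, and run are hypothetical, invented for illustration. The model assumes the lease can expire during a pause that falls between the sync and the operation itself, which is why one operation can still slip through. It also exaggerates the no-sync case by assuming the RS never learns of the expiry on its own.

```python
class LeaseLost(Exception):
    """Raised when the RS learns its lease is gone and refuses to act."""


class Leader:
    """Toy stand-in for the ZK leader: the authoritative set of live leases."""

    def __init__(self):
        self.live = set()

    def grant(self, name):
        self.live.add(name)

    def expire(self, name):
        self.live.discard(name)


class RegionServer:
    """Toy RS holding a cached, possibly stale, belief about its own lease."""

    def __init__(self, name, leader, sync_before_op):
        self.name = name
        self.leader = leader
        self.sync_before_op = sync_before_op
        self.cached_live = True  # assumption: starts out with a valid lease
        self.log = []            # (op, was the lease live at the leader?)

    def sync(self):
        # Models the effect of a ZK sync: catch up with the leader's state.
        self.cached_live = self.name in self.leader.live

    def meta_op(self, op, pause=None):
        if self.sync_before_op:
            self.sync()
        if not self.cached_live:
            raise LeaseLost(op)
        if pause is not None:
            pause()  # the "Juliet" pause, after the check but before the op
        self.log.append((op, self.name in self.leader.live))


def run(sync_before_op):
    """Return how many ops the RS performed after its lease had expired."""
    leader = Leader()
    leader.grant("rs1")
    rs = RegionServer("rs1", leader, sync_before_op)
    rs.meta_op("open-wal-1")
    try:
        # The lease expires while the RS is paused inside the second op.
        rs.meta_op("open-wal-2", pause=lambda: leader.expire("rs1"))
        rs.meta_op("open-wal-3")
        rs.meta_op("open-wal-4")
    except LeaseLost:
        pass
    return sum(1 for _, live in rs.log if not live)


print(run(True), run(False))  # prints "1 3"
```

With the sync in place, only the operation already in flight during the pause lands after the lease is lost; the next sync reveals the expiry and the RS stops. Without it, every later operation in this model goes ahead on stale state.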

> @Karthik "I am a little nervous about the master backing off on
> detecting the RS's progress - because the RS has already lost its zk
> lease."
>
> Yes.  The RS will have had its 'shut-yourself-down' flag set on
> loss-of-lease so is on its way out.  It's not going to revive, so its
> logs need recovering.
>
> @Kannan "Option #1 seems easy to reason about and simple to implement.
> Can we go ahead with that if there is no major objection?"
>
> Fine by me.
>

Fine by me as well. I think we'll need solutions like 2 or 3 in other places,
but for this one #1 seems to work (I'll continue to think about whether there
are any holes in our logic).

-Todd
--
Todd Lipcon
Software Engineer, Cloudera