Zookeeper, mail # dev - Mounting a remote Zookeeper


Alexander Shraer 2011-06-09, 18:40
RE: Mounting a remote Zookeeper
Alexander Shraer 2011-06-10, 00:10
This is a preliminary proposal, so everything is still open. Still, I think it has many advantages over the previous namespace partitioning proposal (http://wiki.apache.org/hadoop/ZooKeeper/PartitionedZookeeper), which AFAIK was never implemented. The idea here is to make much smaller and more intuitive changes.
For example, the previous proposal did not offer any ordering guarantees across partitions. Also, with a Linux mount you don't need to specify for each new file which mount point it belongs to - we can exploit the tree structure to infer that, instead of creating and maintaining an additional hierarchy as in the previous proposal.
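To illustrate the path-inference point, here is a rough sketch in Java (purely hypothetical - MountTable and everything in it is made up for illustration, not part of any existing or proposed API):

import java.util.TreeMap;

// Hypothetical sketch: a mount table keyed by znode path. Because mounts
// attach at paths in the tree, the cluster owning any znode can be inferred
// by longest-prefix match on the path itself - no separate hierarchy has to
// be created or maintained.
public class MountTable {
    // mount-point path -> connect string of the ensemble serving it
    private final TreeMap<String, String> mounts = new TreeMap<>();

    public MountTable(String localEnsemble) {
        mounts.put("/", localEnsemble); // the local cluster serves the root
    }

    public void mount(String mountPoint, String remoteEnsemble) {
        mounts.put(mountPoint, remoteEnsemble);
    }

    // The longest mount point that is a prefix of the path wins.
    public String resolve(String path) {
        String best = "/";
        for (String mp : mounts.keySet()) {
            boolean matches = mp.equals("/") || path.equals(mp)
                    || path.startsWith(mp + "/");
            if (matches && mp.length() >= best.length()) {
                best = mp;
            }
        }
        return mounts.get(best);
    }
}

So after mount("/remote", "clusterB:2181"), creating "/remote/x" would be routed to cluster B purely because of its path prefix - the client never has to say which mount point the node belongs to.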

> what happens when a client does a read on the remote ZK cluster. does the read always get
> forwarded to the remote cluster?

No. The idea is to identify when inter-cluster communication is necessary to maintain sequential consistency, and to avoid it otherwise. In the twiki we propose one possible rule: for example, if you read from a remote partition that didn't mount any part of your local namespace, it's OK to return an old value. In any case, the read is never forwarded to the remote cluster - even when inter-cluster communication is necessary, we sync the observer with the remote leader and then read from the observer.
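To make that concrete, the sync-then-read step looks roughly like this when expressed with today's client API (only an analogy - in the proposal the sync would happen server-side between the local observer and the remote leader):

import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

// Analogy using the existing client API: sync() brings the server we read
// from up to date with the leader, and the read that follows is still served
// locally - it is never forwarded to the remote cluster.
public class SyncThenRead {
    public static byte[] readUpToDate(ZooKeeper zk, String path)
            throws KeeperException, InterruptedException {
        CountDownLatch synced = new CountDownLatch(1);
        zk.sync(path, (rc, p, ctx) -> synced.countDown(), null);
        synced.await(); // the server has now caught up with the leader
        return zk.getData(path, false, null); // local, up-to-date read
    }
}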

> in your proposal, what happens if an a client creates an ephemeral
> node on the remote ZK cluster. who does the failure detection and clean up?

You're right, we should definitely address that in the twiki. I think that in any case a cluster should only monitor the clients connected to it, not clients connected to remote clusters. So if we support creating remote ephemeral nodes, I think failure detection should be done locally, and the remote cluster should subscribe to the relevant local failure events so it gets notified.
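Sketching the division of labor I have in mind (all hypothetical - none of these interfaces exist in ZooKeeper):

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch: the cluster a client is connected to does the failure
// detection; a remote cluster holding ephemerals for that session subscribes
// to its expiry events and cleans up when notified.
interface SessionEventListener {
    void sessionExpired(long sessionId); // remote side deletes that session's ephemerals
}

public class LocalFailureDetector {
    private final Map<Long, List<SessionEventListener>> subscribers =
            new ConcurrentHashMap<>();

    // Called on behalf of a remote cluster when it creates an ephemeral node
    // for a session that is connected locally.
    public void subscribe(long sessionId, SessionEventListener remote) {
        subscribers.computeIfAbsent(sessionId, id -> new CopyOnWriteArrayList<>())
                   .add(remote);
    }

    // Invoked by the local session tracker when the client's session expires.
    public void onSessionExpired(long sessionId) {
        for (SessionEventListener l :
                subscribers.getOrDefault(sessionId, List.of())) {
            l.sessionExpired(sessionId);
        }
        subscribers.remove(sessionId);
    }
}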

> what happens if the request to the remote cluster hangs?

The user can decide what happens in this case. If they want all subsequent requests to fail, a hung remote request will block every request that follows it. Otherwise, the remote request can fail while their subsequent local requests still succeed.
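For example (again hypothetical, just to show the two modes a user could choose between):

// Hypothetical sketch of the per-user choice: what a hung or failed remote
// request does to the requests queued behind it.
enum RemoteFailureMode {
    BLOCK_SUBSEQUENT,  // a hung remote request stalls every request after it
    FAIL_INDEPENDENTLY // the remote request may fail; later local requests proceed
}

public class RemoteRequestPolicy {
    private final RemoteFailureMode mode;

    public RemoteRequestPolicy(RemoteFailureMode mode) {
        this.mode = mode;
    }

    // Consulted when a remote request hangs past its timeout.
    public boolean blockPipeline() {
        return mode == RemoteFailureMode.BLOCK_SUBSEQUENT;
    }
}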

Thanks,
Alex

> -----Original Message-----
> From: Benjamin Reed [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, June 09, 2011 4:05 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Mounting a remote Zookeeper
>
> this is a small nit, but i think the partition proposal works a bit
> more like a mount point than your proposal. when you mount a file
> system, the mount isn't transparent. two mounted file systems can have
> files with the same inode number, for example. you also can't do some
> things like a rename across file system boundaries.
>
> in your proposal, what happens if a client creates an ephemeral
> node on the remote ZK cluster. who does the failure detection and
> clean up? it also wasn't clear what happens when a client does a read
> on the remote ZK cluster. does the read always get forwarded to the
> remote cluster? also what happens if the request to the remote cluster
> hangs?
>
> thanx
> ben
>
> On Thu, Jun 9, 2011 at 11:41 AM, Alexander Shraer <shralex@yahoo-inc.com> wrote:
> > Hi,
> >
> > We're considering working on a new feature that will allow "mounting" part of the namespace of one ZK cluster into another ZK cluster. The goal is essentially to be able to partition a ZK namespace while preserving current ZK semantics as much as possible.
> > More details are here: http://wiki.apache.org/hadoop/ZooKeeper/MountRemoteZookeeper
> >
> > It would be great to get your feedback, and especially, please let us know if you think your application can benefit from this feature.
> >
> > Thanks,
> > Alex Shraer and Eddie Bortnikov
> >
> >
> >
Benjamin Reed 2011-06-10, 01:24