HDFS >> mail # dev >> [Proposal] Pluggable Namespace

Re: [Proposal] Pluggable Namespace

On Oct 3, 2013, at 12:17 PM, Milind Bhandarkar wrote:

> Exec Summary: For the last couple of months, we, at Pivotal, along with a
> couple of folks in the community have been working on making Namespace
> implementation in the namenode pluggable. We have demonstrated that it can
> be done without major surgery on the namenode, and does not have noticeable
> performance impact. We would like to contribute it back to Apache if there
> is sufficient interest. Please let us know if you are interested, and we
> will create a Jira and update the patch for in-progress work.
> ……
A reasonable idea, but it is best to discuss the actual details in a jira. Some initial thoughts, to clear up some of the confusion (and accusations) in this thread:

HDFS pluggability (and relation to pluggability added as part of Federation)
 - Pluggability and federation are orthogonal, although we did improve the pluggability of HDFS as part of the federation implementation. As Vinod has noted, the *block layer* was separated out as part of the federation work, which makes the general development of new HDFS namespace implementations easier. Federation's pluggability was targeted at someone writing a new NN and reusing the block storage layer via a library, optionally living side-by-side with different implementations of the NN within the same cluster. Hence we added the notion of block pools and separated out the block management layer.
 - So your proposed work is clearly not in conflict with Federation, or even with the pluggability that Federation added; philosophically, your proposal is complementary.

Considerations: A Public API?
The FileSystem/AbstractFileSystem APIs and the newly proposed AbstractFSNamesystem target very different kinds of pluggability in Hadoop. The former take a thin application API (FileSystem and FileContext) and make it easy for users to plug in different filesystems (S3, LocalFS, etc.) as Hadoop-compatible filesystems. In contrast, the latter (the proposed AbstractFSNamesystem) is a fatter interface deep inside the HDFS implementation, and makes parts of that implementation pluggable.
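
To make the distinction concrete, here is a minimal Java sketch of the two kinds of plug points. Every name in it (SimpleFileSystem, AbstractNamespace, HeapNamespace, NamespaceBackedFs) is hypothetical and chosen for illustration only; none of them is a real Hadoop class, and the abstract class is not the proposed AbstractFSNamesystem, whose shape has not been posted yet.

```java
import java.util.HashMap;
import java.util.Map;

public class PluggabilitySketch {

    /** Hypothetical thin, application-facing API (the role FileSystem/FileContext play). */
    interface SimpleFileSystem {
        void write(String path, String data);
        String read(String path); // returns null if the path does not exist
    }

    /** One pluggable backend behind the thin API: everything lives in a local map. */
    static class InMemoryFs implements SimpleFileSystem {
        private final Map<String, String> files = new HashMap<>();
        public void write(String path, String data) { files.put(path, data); }
        public String read(String path) { return files.get(path); }
    }

    /** Hypothetical internal extension point (the role a pluggable namespace would play). */
    abstract static class AbstractNamespace {
        abstract void putFile(String path, long blockId);
        abstract Long lookupBlock(String path); // null if the path is unknown
    }

    /** One namespace implementation: a plain in-heap map from path to block id. */
    static class HeapNamespace extends AbstractNamespace {
        private final Map<String, Long> pathToBlock = new HashMap<>();
        void putFile(String path, long blockId) { pathToBlock.put(path, blockId); }
        Long lookupBlock(String path) { return pathToBlock.get(path); }
    }

    /** A filesystem whose namespace is swappable internally; callers still see only the thin API. */
    static class NamespaceBackedFs implements SimpleFileSystem {
        private final AbstractNamespace namespace;
        private final Map<Long, String> blockStore = new HashMap<>();
        private long nextBlockId = 1;

        NamespaceBackedFs(AbstractNamespace namespace) { this.namespace = namespace; }

        public void write(String path, String data) {
            long id = nextBlockId++;      // allocate a block for the data
            blockStore.put(id, data);
            namespace.putFile(path, id);  // record path -> block in the plugged-in namespace
        }

        public String read(String path) {
            Long id = namespace.lookupBlock(path);
            return id == null ? null : blockStore.get(id);
        }
    }

    public static void main(String[] args) {
        SimpleFileSystem[] backends = {
            new InMemoryFs(),
            new NamespaceBackedFs(new HeapNamespace())
        };
        for (SimpleFileSystem fs : backends) {
            fs.write("/tmp/a", "hello");
            System.out.println(fs.getClass().getSimpleName() + ": " + fs.read("/tmp/a"));
        }
    }
}
```

The application code in main() depends only on the thin interface, so swapping a backend is invisible to it. The abstract namespace class, by contrast, is seen only by the one filesystem implementation that hosts it, which is why its stability guarantees can be a separate question.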

I would not make your proposed AbstractFSNamesystem a public, stable Hadoop API, but would instead direct it towards HDFS developers who want to extend the implementation of HDFS more easily. Were you envisioning the AbstractFSNamesystem as a stable public Hadoop API? If someone has their own private implementation of this new abstract class, would the HDFS community have the freedom to modify the abstract class in incompatible ways? These are discussions for the jira.

A somewhat related piece of work:
Since Milind motivated his pluggability by a new NN implementation (that happens to use HBase), I will briefly mention an experiment in building a new NN that stores only a partial namespace in memory. The goal of this experiment was *not* to make the NN code more pluggable, but instead to provide an alternate implementation of the NN; hence it is orthogonal. A PhD student who worked as an intern at Hortonworks implemented a NN that stores only a partial namespace in RAM. She presented this to a HUG in Aug 2013 in Sunnyvale. I have encouraged her to file a jira, but she wants to finish some more experiments before filing; I will file a jira on her behalf and refer to her work in the next day or so. It is a prototype that helps us understand how well the particular implementation choice for this alternate NN works. It would be interesting to see if her code changes fit into Milind's newly proposed AbstractFSNamesystem. My initial view is that they may not, but I will wait till Milind posts an initial strawman of the AbstractFSNamesystem before commenting. (While subclassing interfaces can work very well, subclassing implementations can be very tricky to get right.)
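
As a generic illustration of why subclassing an implementation is harder than implementing an interface, here is a small Java sketch. It is not taken from the NN code or from any posted patch; Namespace, BaseNamespace, CountingSubclass and CountingWrapper are hypothetical names, and the internal self-call in BaseNamespace is an assumption made to show the problem.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubclassingSketch {

    /** Hypothetical interface: a pure contract with no hidden behavior. */
    interface Namespace {
        void create(String path);
        void createAll(List<String> paths);
        int size();
    }

    /** Hypothetical concrete base. Note that createAll() happens to call create() internally. */
    static class BaseNamespace implements Namespace {
        private final List<String> paths = new ArrayList<>();
        public void create(String path) { paths.add(path); }
        public void createAll(List<String> ps) { for (String p : ps) create(p); }
        public int size() { return paths.size(); }
    }

    /** Subclassing the implementation: counts creations, unaware of the base class's self-call. */
    static class CountingSubclass extends BaseNamespace {
        int created = 0;
        @Override public void create(String path) { created++; super.create(path); }
        @Override public void createAll(List<String> ps) {
            created += ps.size();
            super.createAll(ps); // dispatches back into the overridden create(): counted twice
        }
    }

    /** Implementing the interface and delegating: the delegate's internals cannot leak in. */
    static class CountingWrapper implements Namespace {
        private final Namespace delegate;
        int created = 0;
        CountingWrapper(Namespace delegate) { this.delegate = delegate; }
        public void create(String path) { created++; delegate.create(path); }
        public void createAll(List<String> ps) { created += ps.size(); delegate.createAll(ps); }
        public int size() { return delegate.size(); }
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("/a", "/b", "/c");

        CountingSubclass sub = new CountingSubclass();
        sub.createAll(batch);

        CountingWrapper wrap = new CountingWrapper(new BaseNamespace());
        wrap.createAll(batch);

        System.out.println("subclass counted " + sub.created + " for " + sub.size() + " paths");
        System.out.println("wrapper counted " + wrap.created + " for " + wrap.size() + " paths");
    }
}
```

The subclass reports 6 creations for 3 paths because its correctness depends on an undocumented detail of the base class, while the interface-based wrapper reports 3. An abstract class with substantial built-in behavior invites the first kind of coupling.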

Milind, please file the jira for further discussions.

