Hadoop >> mail # dev >> Towards Hadoop 1.0:  Stronger API Compatibility from 0.21 onwards

Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
HBase and similar HDFS clients could benefit from a high-performance,
stable datacenter network protocol built into the namenode and
datanodes. That would let us decouple from the Hadoop versioning and
release cycle; HDFS could likewise decouple from core, etc.
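To make the decoupling argument concrete, here is a minimal sketch of version negotiation over a frozen wire format. Everything in it is invented for illustration (none of these names come from Hadoop); the point is only that a server which keeps speaking old revisions of a stable format lets clients release on their own schedule.

```python
# Hypothetical illustration of version-tolerant wire-protocol negotiation.
# A stable wire format means the server supports a *window* of protocol
# revisions, so a client frozen at an old revision keeps working.

class Server:
    """Speaks every protocol revision from min_version up to version."""
    def __init__(self, version, min_version):
        self.version = version
        self.min_version = min_version

    def handshake(self, client_version):
        # Accept any client inside the supported window, and answer in
        # the highest revision both sides understand.
        if client_version < self.min_version:
            raise ValueError("client too old: %d" % client_version)
        return min(client_version, self.version)

# A client frozen at wire revision 1 ...
old_client_version = 1
# ... still talks to a server released much later, because the server
# continues to speak revision 1 of the stable format.
newer_server = Server(version=3, min_version=1)
negotiated = newer_server.handshake(old_client_version)
```

An unstable protocol is the degenerate case `min_version == version`: every upgrade on one side forces an upgrade on the other, which is exactly the coupling HBase would like to escape.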

Whatever stable network protocol is devised, if any, should of course
perform as well as, if not better than, the current one. A stable but
lower-performing option, unfortunately, would be excluded from
consideration right away.

HBase is perhaps a bit of a special case currently in that its access
pattern is random read/write, and there may be only a handful of clients
like that. However, if HDFS is positioned as a product in its own right,
which I believe has been the case since the split, there may be many
other potential users of it -- for all of its benefits -- given a stable
wire format that enables decoupled development.

API compatibility  +1
Data compatibility +1
Wire compatibility +1

Best regards,

Andrew Purtell
Committing Member, HBase Project: hbase.org

From: Steve Loughran <[EMAIL PROTECTED]>
Sent: Monday, September 28, 2009 3:15:09 AM
Subject: Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

Dhruba Borthakur wrote:
> It is really nice to have wire-compatibility between clients and servers
> running different versions of hadoop. The reason we would like this is
> because we can allow the same client (Hive, etc) submit jobs to two
> different clusters running different versions of hadoop. But I am not stuck
> up on the name of the release that supports wire-compatibility, it can be
> either 1.0  or something later than that.
> API compatibility  +1
> Data compatibility +1
> Job Q compatibility -1
> Wire compatibility +0
What you are looking for there is stability of the job-submission network protocol.
* We need a job submission API that is designed to work over long-haul links and versions
* It does not have to be the same as anything used in-cluster
* It does not actually need to run in the JobTracker. An independent service bridging the stable long-haul API to an unstable datacentre protocol does work, though authentication and user rights are a trouble spot
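The bridging idea in the list above can be sketched as a thin front-end that validates a frozen long-haul submission format and forwards it to whatever the cluster-internal call happens to be this release. All names here are hypothetical, not any real Hadoop API:

```python
# Hypothetical sketch of a job-submission bridge: a stable external
# contract in front of an unstable internal one. Invented names only.

STABLE_FIELDS = {"jar", "main_class", "args"}  # frozen long-haul contract

def submit_internal(jar, main_class, args):
    # Stand-in for the cluster-internal, version-specific submission
    # call; free to change between releases without breaking remote
    # clients, because only the bridge ever talks to it.
    return "job_%s" % main_class.lower()

def bridge_submit(request):
    """Validate a stable-format request, then forward it internally."""
    unknown = set(request) - STABLE_FIELDS
    if unknown:
        # Reject fields outside the frozen contract rather than guessing.
        raise ValueError("unknown fields: %s" % sorted(unknown))
    return submit_internal(request["jar"], request["main_class"],
                           request.get("args", []))

job_id = bridge_submit({"jar": "wordcount.jar",
                        "main_class": "WordCount",
                        "args": ["in/", "out/"]})
```

Only the bridge's external contract needs to stay stable; the `submit_internal` side can track each release. Authentication is the hard part this sketch omits: the bridge must act on the remote user's behalf against the cluster.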

Similarly, it would be good to have a stable long-haul HDFS protocol, such as FTP or WebDAV. Again, there is no need to build it into the namenode.
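As a sketch of what that buys a remote client: with a WebDAV-style HTTP gateway in front of the cluster, reading a file needs nothing but plain HTTP. The gateway host and URL layout below are assumptions for illustration, not a real deployment:

```python
# Hypothetical long-haul HDFS access through an HTTP/WebDAV gateway.
# A remote client needs only an HTTP stack, not a version-matched
# Hadoop client library.

from urllib.parse import quote

def gateway_url(gateway, hdfs_path):
    """Map an absolute HDFS path to a plain-HTTP URL on the gateway."""
    if not hdfs_path.startswith("/"):
        raise ValueError("HDFS paths are absolute")
    # Percent-encode the path, keeping the '/' separators intact.
    return "http://%s%s" % (gateway, quote(hdfs_path))

url = gateway_url("dfs-gateway.example.org:8080",
                  "/user/andrew/part-00000")
```

The client would then issue an ordinary GET against that URL; the gateway, not the namenode, absorbs any churn in the cluster-internal protocol.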

see http://www.slideshare.net/steve_l/long-haul-hadoop
and commentary under http://wiki.apache.org/hadoop/BristolHadoopWorkshop