Hadoop >> mail # general >> [VOTE] Direction for Hadoop development


Re: [VOTE] Direction for Hadoop development
This sounds like an important issue, but I personally don't understand what
exactly the controversy is, and therefore what this vote is about and what
the choices are, if any.
What I understand is that the issue spans at least two (long) issues and
several discussion threads. Could somebody knowledgeable write an
independent digest of what is going on and where it stands? I am probably
not alone in struggling with this.
Is there a simple answer, like
"you vetoed yesterday - now it's my turn" or
"Avro should/not hold the monopoly for Hadoop serialization"?
These were just humorous examples.

Thanks,
--Konstantin

On Mon, Nov 29, 2010 at 2:30 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:

> All,
>   Based on the discussion on HADOOP-6685, there is a pretty fundamental
> difference of opinion about how Hadoop should evolve. We need to figure out
> how the majority of the PMC wants the project to evolve to understand which
> patches move us forward. Please vote whether you approve of the following
> direction. Clearly as the author, I'm +1.
>
> -- Owen
>
> Hadoop has always included library code so that users had a strong
> foundation to build their applications on without needing to continually
> reinvent the wheel. This combination of framework and powerful library code
> is a common pattern for successful projects, such as Java, Lucene, etc.
> Toward that end, we need to continue to extend the Hadoop library code and
> actively maintain it as the framework evolves. Continuing support for
> SequenceFile and TFile, which are both widely used, is mandatory. The
> opposite pattern of implementing the framework and letting each distribution
> add the required libraries will lead to increased community fragmentation
> and vendor lock in.
>
> Hadoop's generic serialization framework had a lot of promise when it was
> introduced, but has been hampered by a lack of plugins other than Writables
> and Java serialization. Supporting a wide range of serializations natively
> in Hadoop will give users new capabilities. Currently, to support Avro
> or ProtoBuf objects, mutually incompatible third-party solutions are
> required. It would benefit Hadoop to support all of them through a common
> framework. In particular, having easy, out-of-the-box support
> for Thrift, ProtoBufs, Avro, and our legacy serializations is a desired
> state.
>
> As a distributed system, there are many instances where Hadoop needs to
> serialize data. Many of those applications need a lightweight, versioned
> serialization framework like ProtocolBuffers or Thrift and using them is
> appropriate. Adding dependencies on Thrift and ProtocolBuffers to the
> previous dependence on Avro is acceptable.
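For readers unfamiliar with the Writables that the proposal contrasts with Avro, Thrift, and ProtoBuf: a Hadoop Writable serializes itself through the `write(DataOutput)` / `readFields(DataInput)` contract. Below is a minimal sketch of that pattern; `TextPair` and `roundTrip` are hypothetical names for illustration, and no Hadoop classes are used, so the snippet runs with only the JDK.

```java
import java.io.*;

// Hypothetical TextPair class illustrating the Writable pattern:
// the write/readFields signatures mirror org.apache.hadoop.io.Writable,
// but this sketch depends only on java.io.
public class TextPair {
    private String first;
    private String second;

    // Writables need a no-arg constructor so the framework can
    // instantiate them reflectively before calling readFields.
    public TextPair() {}

    public TextPair(String first, String second) {
        this.first = first;
        this.second = second;
    }

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(first);
        out.writeUTF(second);
    }

    // Deserialize the fields in the same order they were written.
    public void readFields(DataInput in) throws IOException {
        first = in.readUTF();
        second = in.readUTF();
    }

    public String getFirst()  { return first; }
    public String getSecond() { return second; }

    // Round-trip an instance through a byte buffer, as the framework
    // does when shuffling records or writing them to a file.
    public static TextPair roundTrip(TextPair p) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        p.write(new DataOutputStream(buf));
        TextPair copy = new TextPair();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        TextPair copy = roundTrip(new TextPair("hadoop", "avro"));
        System.out.println(copy.getFirst() + "/" + copy.getSecond()); // prints "hadoop/avro"
    }
}
```

A pluggable serialization framework abstracts exactly this step, so that Avro, Thrift, or ProtoBuf records can be written and read in place of hand-coded Writables.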