Pig >> mail # dev >> Begin a discussion about Pig as a top level project

Re: Begin a discussion about Pig as a top level project
I concur with Santhosh here. I think the main question we need to answer
here is how close our ties with Hadoop are today and how close they will
be in the future. When Pig was originally designed, the intent was to keep
it backend neutral, so much so that there was a reference backend
implementation (also known as the local engine) which had nothing to do
with Hadoop. But things have changed since then. Hadoop's local mode
has been adopted in favor of Pig's own local mode. We have moved from being
backend agnostic to favoring Hadoop. And while this was happening, it
seems we tried to keep the Pig Latin language independent of the Hadoop
backend while the Pig runtime started to make use of Hadoop concepts.

Apart from design decisions, this move also has a practical impact on
our codebase. Since we adopted Hadoop more closely, we got rid of an
extra layer of abstraction and instead started using similar
abstractions already existing in Hadoop. This has had the positive effect
of simplifying the codebase and providing tighter integration with
Hadoop.

So, if we are continuing in a direction where Hadoop is our only
backend (or at least a favored one), close ties to Hadoop are useful
for the reasons Alan and Dmitriy pointed out. If not, then I
think moving out to a TLP makes sense. Since there is no effort that
I am aware of to plug in a different backend for Pig, I
think maintaining close ties with Hadoop is useful for Pig. If in the
future a different distributed computing platform comes up
that we want to use as a backend, we can revisit our decision. So, as
things stand today, I am -1 on moving out of Hadoop.

And I would also like to reiterate my point that though the Pig runtime
may continue to get closer to Hadoop, we should keep Pig Latin
completely backend agnostic.


On Sat, Apr 3, 2010 at 12:43, Santhosh Srinivasan <[EMAIL PROTECTED]> wrote:
> I see this as a multi-part question. Looking back at some of the
> significant roadmap/existential questions asked in the last 12 months, I
> see the following:
> 1. With the introduction of SQL, what is the philosophy of Pig (I sent
> an email about this approximately 9 months ago)
> 2. What is the approach to support backward compatibility in Pig (Alan
> had sent an email about this 3 months ago)
> 3. Should Pig be a TLP (the current email thread).
> Here is my take on answering the aforementioned questions.
> The initial philosophy of Pig was to be backend agnostic. It was
> designed as a data flow language. Whenever a new language is designed,
> the syntax and semantics of the language have to be laid out. The syntax
> is usually captured in the form of a BNF grammar. The semantics are
> defined by the language creators. Backward compatibility is then a
> question of holding true to the syntax and semantics. With Pig, in
> addition to the language, the Java APIs were exposed to customers to
> implement UDFs (load/store/filter/grouping/row transformation etc),
> provision looping since the language does not support looping constructs
> and also support a programmatic mode of access. Backward compatibility
> in this context is to support API versioning.
> Do we still intend to position Pig as a data flow language that is backend
> agnostic? If the answer is yes, then there is a strong case for making
> Pig a TLP.
> Are we influenced by Hadoop? A big YES! The reason Pig chose to become a
> Hadoop sub-project was to ride the Hadoop popularity wave. As a
> consequence, we chose to be heavily influenced by the Hadoop roadmap.
> Like a good lawyer, I also have rebuttals to Alan's questions :)
> 1. Search engine popularity - We can discuss this with the Hadoop team
> and still retain links to TLPs that are coupled (loosely or tightly).
> 2. Explicit connection to Hadoop - I see this as logical connection vs.
> physical connection. Today, we are physically connected as a
> sub-project. Becoming a TLP will not increase/decrease our influence on
> the Hadoop community (think Logical, Physical and MR Layers :)