Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig >> mail # dev >> Begin a discussion about Pig as a top level project


Copy link to this message
-
RE: Begin a discussion about Pig as a top level project
I see this as a multi-part question. Looking back at some of the
significant roadmap/existential questions asked in the last 12 months, I
see the following:

1. With the introduction of SQL, what is the philosophy of Pig (I sent
an email about this approximately 9 months ago)
2. What is the approach to support backward compatibility in Pig (Alan
had sent an email about this 3 months ago)
3. Should Pig be a TLP (the current email thread).

Here is my take on answering the aforementioned questions.

The initial philosophy of Pig was to be backend agnostic. It was
designed as a data flow language. Whenever a new language is designed,
the syntax and semantics of the language have to be laid out. The syntax
is usually captured in the form of a BNF grammar. The semantics are
defined by the language creators. Backward compatibility is then a
question of holding true to the syntax and semantics. With Pig, in
addition to the language, the Java APIs were exposed to customers to
implement UDFs (load/store/filter/grouping/row transformation etc),
provision looping since the language does not support looping constructs
and also support a programmatic mode of access. Backward compatibility
in this context is to support API versioning.

Do we still intend to position as a data flow language that is backend
agnostic? If the answer is yes, then there is a strong case for making
Pig a TLP.

Are we influenced by Hadoop? A big YES! The reason Pig chose to become a
Hadoop sub-project was to ride the Hadoop popularity wave. As a
consequence, we chose to be heavily influenced by the Hadoop roadmap.

Like a good lawyer, I also have rebuttals to Alan's questions :)

1. Search engine popularity - We can discuss this with the Hadoop team
and still retain links to TLP's that are coupled (loosely or tightly).
2. Explicit connection to Hadoop - I see this as logical connection v/s
physical connection. Today, we are physically connected as a
sub-project. Becoming a TLP, will not increase/decrease our influence on
the Hadoop community (think Logical, Physical and MR Layers :)
3. Philosophy - I have already talked about this. The tight coupling is
by choice. If Pig continues to be a data flow language with clear syntax
and semantics then someone can implement Pig on top of a different
backend. Do we intend to take this approach?

I just wanted to offer a different opinion to this thread. I strongly
believe that we should think about the original philosophy. Will we have
a Pig standards committee that will decide on the changes to the
language (think C/C++) if there are multiple backend implementations?

I will reserve my vote based on the outcome of the philosophy and
backward compatibility discussions. If we decide that Pig will be
treated and maintained like a true language with clear syntax and
semantics then we have a strong case to make it into a TLP. If not, we
should retain our existing ties to Hadoop and make Pig into a data flow
language for Hadoop.

Santhosh

-----Original Message-----
From: Thejas Nair [mailto:[EMAIL PROTECTED]]
Sent: Friday, April 02, 2010 4:08 PM
To: [EMAIL PROTECTED]; Dmitriy Ryaboy
Subject: Re: Begin a discussion about Pig as a top level project

I agree with Alan and Dmitriy - Pig is tightly coupled with hadoop, and
heavily influenced by its roadmap. I think it makes sense to continue as
a sub-project of hadoop.

-Thejas

On 3/31/10 4:04 PM, "Dmitriy Ryaboy" <[EMAIL PROTECTED]> wrote:

> Over time, Pig is increasing its coupling to Hadoop (for good
> reasons), rather than decreasing it. If and when Pig becomes a viable
> entity without hadoop around, it might make sense as a TLP. As is, I
> think becoming a TLP will only introduce unnecessary administrative
and bureaucratic headaches.
> So my vote is also -1.
>
> -Dmitriy
>
>
>
> On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates <[EMAIL PROTECTED]>
wrote:
>
>> So far I haven't seen any feedback on this.  Apache has asked the
>> Hadoop PMC to submit input in April on whether some subprojects
your voice heard.
projects more aware of us.
Hadoop.
on search statistics.
Hadoop's website.
user base and usability of Hadoop.
a community have.

etc.