Pig, mail # dev - Begin a discussion about Pig as a top level project


RE: Begin a discussion about Pig as a top level project
Santhosh Srinivasan 2010-04-05, 19:22
"Given that, do you think it makes  
sense to say that Pig stays a subproject for now, but if it someday  
grows beyond Hadoop only it becomes a TLP?  I could agree to that  
stance."

Bingo!

Santhosh

-----Original Message-----
From: Alan Gates [mailto:[EMAIL PROTECTED]]
Sent: Monday, April 05, 2010 11:37 AM
To: [EMAIL PROTECTED]
Subject: Re: Begin a discussion about Pig as a top level project

Prognostication is a difficult business.  Of course I'd love it if  
someday there is an ISO Pig Latin committee (with meetings in cool  
exotic places) deciding the official standard for Pig Latin.  But that  
seems like saying in your startup's business plan, "When we reach  
Google's size, then we'll do x".  If there ever is an ISO Pig Latin  
standard it will be years off.

As others have noted, staying tight to Hadoop now has many advantages,  
both in technical and adoption terms.  Hence my advocacy of keeping  
Pig Latin Hadoop agnostic while tightly integrating the backend.  
Which is to say that in my view, Pig is Hadoop specific now, but there  
may come a day when that is no longer true.   Whether Pig will ever  
move past just running on Hadoop to running in other parallel systems  
won't be known for years to come.  Given that, do you think it makes  
sense to say that Pig stays a subproject for now, but if it someday  
grows beyond Hadoop only it becomes a TLP?  I could agree to that  
stance.

Alan.

On Apr 3, 2010, at 12:43 PM, Santhosh Srinivasan wrote:

> I see this as a multi-part question. Looking back at some of the
> significant roadmap/existential questions asked in the last 12 months, I
> see the following:
>
> 1. With the introduction of SQL, what is the philosophy of Pig? (I sent
> an email about this approximately 9 months ago.)
> 2. What is the approach to supporting backward compatibility in Pig? (Alan
> sent an email about this 3 months ago.)
> 3. Should Pig be a TLP? (The current email thread.)
>
> Here is my take on answering the aforementioned questions.
>
> The initial philosophy of Pig was to be backend agnostic. It was
> designed as a data flow language. Whenever a new language is designed,
> the syntax and semantics of the language have to be laid out. The syntax
> is usually captured in the form of a BNF grammar. The semantics are
> defined by the language creators. Backward compatibility is then a
> question of holding true to the syntax and semantics. With Pig, in
> addition to the language, the Java APIs were exposed to customers to
> implement UDFs (load/store/filter/grouping/row transformation etc.),
> to provide looping, since the language does not support looping constructs,
> and to support a programmatic mode of access. Backward compatibility
> in this context means supporting API versioning.
>
> Do we still intend to position Pig as a data flow language that is backend
> agnostic? If the answer is yes, then there is a strong case for making
> Pig a TLP.
>
> Are we influenced by Hadoop? A big YES! The reason Pig chose to become a
> Hadoop sub-project was to ride the Hadoop popularity wave. As a
> consequence, we chose to be heavily influenced by the Hadoop roadmap.
>
> Like a good lawyer, I also have rebuttals to Alan's questions :)
>
> 1. Search engine popularity - We can discuss this with the Hadoop team
> and still retain links to TLPs that are coupled (loosely or tightly).
> 2. Explicit connection to Hadoop - I see this as a logical connection vs.
> a physical connection. Today, we are physically connected as a
> sub-project. Becoming a TLP will not increase or decrease our influence on
> the Hadoop community (think Logical, Physical and MR Layers :)
> 3. Philosophy - I have already talked about this. The tight coupling is
> by choice. If Pig continues to be a data flow language with clear syntax
> and semantics, then someone can implement Pig on top of a different
> backend. Do we intend to take this approach?
>
> I just wanted to offer a different opinion to this thread. I strongly