NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Pig >> mail # user >> [DISCUSSION] Pig.next


Olga Natkovich 2011-03-03, 00:52
Dmitriy Ryaboy 2011-03-03, 02:31
Santhosh Srinivasan 2011-03-03, 02:44
Dmitriy Ryaboy 2011-03-03, 02:57
Santhosh Srinivasan 2011-03-03, 02:58
Alan Gates 2011-03-03, 18:43
Santhosh Srinivasan 2011-03-03, 19:51

Re: [DISCUSSION] Pig.next
The interfaces that Pig has are at different levels of maturity, and most of them have been marked as stable or evolving to indicate that.
Most of the core interfaces, including the language and UDFs, belong to the stable category. I think this is sufficient for 1.0. There will always be some new interfaces in the evolving category.

The Hadoop classes used by the load/store functions probably belong to the 'slowly evolving' category, but I don't think any change is anticipated soon. By the time they change, we might be ready for Pig 2.0!

Regarding the concern that the big changes in 0.8 and 0.9 have not had time to settle in: I think that by the time 1.0/0.10 is ready, those changes will have been well tested in all sorts of setups and configurations.

-Thejas

On 3/3/11 11:51 AM, "Santhosh Srinivasan" <[EMAIL PROTECTED]> wrote:

Hilarious.

Getting to the serious points.

What are the user-facing items? I have listed a few below. Please feel free to add anything I have missed.

1. The language syntax
2. The language semantics
3. UDFs (EvalFunc, Load, Store, Algebraic, Accumulator, etc.)
4. Java APIs (PigServer, etc.)

In the past, we have agreed that Pig will support Hadoop APIs. I think it's very important to understand when Hadoop will stabilize its APIs. That will have an impact on the APIs that we expose to our users (e.g., input and output formats).

I strongly believe that this is an important input to the decision-making process, especially with respect to backward compatibility.

Santhosh

-----Original Message-----
From: Alan Gates [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 03, 2011 10:44 AM
To: [EMAIL PROTECTED]
Subject: Re: [DISCUSSION] Pig.next

I agree that there will probably need to be several 0.9.x releases as the new optimization and parser work matures. As a consequence, it may be longer between 0.9 and Pig.next than it has been between the last few releases. That only delays the question of what we call Pig.next; it does not answer it.

To me, declaring 1.0 would mean the following things:

1) Pig is ready for production use, at least by the brave.
2) It is still rough around the edges; you do not get a smooth product until 2.0 or later.
3) We will not make non-backward compatible changes to interfaces we have declared stable.

Pig is in production use in multiple places. I do not think anyone will argue that it is not rough around the edges. And because we have users who run tens of thousands of Pig jobs daily, non-backward compatible changes are impossible anyway.

As for waiting for Hadoop to go 1.0, that is like waiting for Congress to fix Social Security. I am sure they will get there, but I may be retired first. In all seriousness, the Hadoop project has not been moving with speed or agility over the last few years, and I do not think waiting for them to do something is a good idea. Nor do I see it as necessary. Before we could go 1.0, would we insist that every jar we import is >= 1.0? Yes, we are bound more tightly to Hadoop than we are to log4j, but we are still our own project. 1.0 is a claim we are making about ourselves, not about the platform we run on. We should choose our release numbering in a way that sends a clear message to our users, and let those same users evaluate Hadoop separately.

Also, the argument that we should not go 1.0 because we are changing a lot of things is bogus. We are always changing a lot of things. If 1.0 means we will not make any major changes, then we will not get there until we go into some kind of maintenance mode where we deem the majority of the work to have been done. I hope I have retired before we reach that state.

My perspective on what 1.0 means obviously comes from the point of view of a developer inside the project. I would be interested in hearing from users, and from anyone with a more marketing-oriented perspective, on what message 1.0 would send to (potential) Pig users.

Alan.

On Mar 2, 2011, at 6:31 PM, Dmitriy Ryaboy wrote:

> I am worried that the new optimization plan work has not had a

Santhosh Srinivasan 2011-03-04, 01:50
Dmitriy Ryaboy 2011-03-04, 01:53
Eric Lubow 2011-03-03, 20:03
Corbin Hoenes 2011-03-04, 12:45
Kaluskar, Sanjay 2011-03-04, 00:53
Jai Krishna 2011-03-03, 04:14
Mridul Muralidharan 2011-03-04, 13:48