Pig >> mail # dev >> requirements for Pig 1.0?


RE: requirements for Pig 1.0?
To add to Alan's list:

1. Ability to handle unknown types in Pig's schema model.
2. Load/Store interfaces are not yet set in stone and need to be finalized.
3. Nice to have: make PigServer thread-safe.

Thanks,
Santhosh

-----Original Message-----
From: Alan Gates [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 23, 2009 1:40 PM
To: [EMAIL PROTECTED]
Subject: Re: requirements for Pig 1.0?

I don't believe there's a solid list of want-to-haves for 1.0.  The
big issue I see is that there are too many interfaces that are still  
shifting, such as:

1) Data input/output formats.  The way we do slicing (that is, user-
provided InputFormats) and the equivalent outputs aren't yet solid.
They are still too tied to load and store functions.  We need to break  
those out and understand how they will be expressed in the language.  
Related to this are the semantics of how Pig interacts with non-file-
based inputs and outputs.  We have a suggestion of moving to URLs, but
we haven't finished test driving this to see if it will really be what  
we want.
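To make the URL proposal a little more concrete, here is a minimal Java sketch of the idea that every load/store location could be a URI whose scheme tells Pig whether it is dealing with a file-based or a non-file-based source. It is an illustration under stated assumptions, not Pig's design: `LocationKind` and `LocationResolver` are hypothetical names, and the rule that "no scheme, `file`, or `hdfs` means file-based, anything else means non-file" (including the `hbase://` example) is an assumption made only for this sketch.

```java
import java.net.URI;

// Hypothetical: not part of Pig. Two broad kinds of location a loader might face.
enum LocationKind { FILE, NON_FILE }

// Hypothetical helper illustrating URL-based locations.
final class LocationResolver {
    // Classify a load/store location by its URI scheme.
    // URI.create throws IllegalArgumentException for malformed input.
    static LocationKind classify(String location) {
        String scheme = URI.create(location).getScheme();
        // Assumption for this sketch: a bare path (no scheme) lives on the
        // default filesystem, and "file"/"hdfs" are filesystem schemes.
        if (scheme == null || "file".equals(scheme) || "hdfs".equals(scheme)) {
            return LocationKind.FILE;
        }
        // Any other scheme is routed to a non-file loader.
        return LocationKind.NON_FILE;
    }
}
```

With this shape, `classify("/data/logs")` and `classify("hdfs://nn:8020/user/x")` both yield `FILE`, while `classify("hbase://mytable")` yields `NON_FILE`, so the decision about how to read a source no longer has to live inside each load function.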

2) The memory model.  While technically the choices we make on how to  
represent things in memory are internal, the reality is that these  
changes may affect the way we read and write tuples and bags, which in  
turn may affect our load, store, eval, and filter functions.

3) SQL.  We're working on introducing SQL soon, and it will take it a  
few releases to be fully baked.

4) Much better error messages.  In 0.2 our error messages made a leap  
forward, but before we can claim to be 1.0 I think they need to make 2  
more leaps:  1) they need to be written in a way end users can  
understand them instead of in a way engineers can understand them,  
including having sufficient error documentation with suggested courses  
of action, etc.; 2) they need to be much better at tying errors back
to where they happened in the script.  Right now, if one of the MR jobs
associated with a Pig Latin script fails, there is no way to know what
part of the script it is associated with.
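One way to picture the second leap is a table, filled in at compile time, that maps each generated MapReduce job back to the script lines it came from, plus a message builder that phrases failures for end users and suggests a course of action. The Java below is a minimal sketch under that assumption only; `JobOrigin`, `ScriptErrorReporter`, and the job ID format used in the usage note are hypothetical and do not come from Pig or Hadoop.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical: the span of Pig Latin script lines that produced one MR job.
final class JobOrigin {
    final int firstLine;
    final int lastLine;

    JobOrigin(int firstLine, int lastLine) {
        this.firstLine = firstLine;
        this.lastLine = lastLine;
    }
}

// Hypothetical reporter: remembers job-to-script mappings and turns
// a job failure into a message phrased in terms of the script.
final class ScriptErrorReporter {
    private final Map<String, JobOrigin> origins = new HashMap<>();

    // Would be called by the (hypothetical) compiler each time it emits a job.
    void recordJob(String jobId, int firstLine, int lastLine) {
        origins.put(jobId, new JobOrigin(firstLine, lastLine));
    }

    // Build an end-user message: where in the script, what went wrong,
    // and a suggested next step.
    String describeFailure(String jobId, String cause, String suggestion) {
        JobOrigin origin = origins.get(jobId);
        String where = (origin == null)
                ? "an unknown part of the script"
                : "script lines " + origin.firstLine + "-" + origin.lastLine;
        return "Error in " + where + ": " + cause + ". Suggestion: " + suggestion;
    }
}
```

For example, after `recordJob("job_001", 3, 7)`, a failure of that job is reported as `Error in script lines 3-7: input path does not exist. Suggestion: check the path given to LOAD`, which points the user at the script rather than at an opaque job.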

There are probably others, but those are the ones I can think of off  
the top of my head.  The summary from my viewpoint is we still have  
several 0.x releases before we're ready to consider 1.0.  It would be  
nice to be 1.0 not too long after Hadoop is, which still gives us at  
least 6-9 months.

Alan.
On Jun 22, 2009, at 10:58 AM, Dmitriy Ryaboy wrote:

> I know there was some discussion of making the types release (0.2) a
> "Pig 1" release, but that got nixed. There wasn't a similar discussion
> on 0.3.
> Has the list of want-to-haves for Pig 1.0 been discussed since?