Pig, mail # dev - switching to different parser in Pig


RE: switching to different parser in Pig
Santhosh Srinivasan 2009-08-25, 17:58
It's been 6 months since this topic was discussed, but we don't have
closure on it.

For SQL on top of Pig, we are using JFlex and CUP
(https://issues.apache.org/jira/browse/PIG-824). If we have decided on
the right parser, can we have a plan to move the other parsers in Pig to
the same technology?

Thanks,
Santhosh

PS: I am assuming we are not moving to ANTLR.
-----Original Message-----
From: Alan Gates [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 24, 2009 10:17 AM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: switching to different parser in Pig

Sorry, after I sent that email yesterday I realized I was not very  
clear.  I did not mean to imply that antlr didn't have good  
documentation or good error handling.  What I wanted to say was we  
want all three of those things, and it didn't appear that antlr  
provided all three, since it doesn't separate out scanner and parser.  
Also, from my viewpoint, I prefer bottom up LALR(1) parsers like yacc  
to top down parsers like javacc.  My understanding is that antlr is  
top down like javacc.  My reasoning for this preference is that parser  
books and classes have used those for decades, so there are a large  
number of engineers out there (including me :) ) who know how to work  
with them.  But maybe antlr is close enough to what we need.  I'll  
take a deeper look at it before I vote officially on which way we  
should go.
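
To make the scanner/parser separation concrete, here is a minimal sketch in Python. It is not Pig code and not how javacc, ANTLR, JFlex or CUP work internally; the toy arithmetic grammar and the names `scan`, `Parser`, `ScanError` and `ParseError` are all assumptions made for illustration. The parser is a hand-written top-down one purely for brevity, not an LALR(1) parser. The point is only that the token stream is a separate, testable boundary and that each stage can report its own errors.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str   # "NUM", "OP" or "EOF"
    text: str
    pos: int    # offset in the source, used in error messages


class ScanError(Exception):
    """Raised by the scanner for characters that form no token."""


class ParseError(Exception):
    """Raised by the parser for token sequences that break the grammar."""


TOKEN_RE = re.compile(r"(?P<NUM>\d+)|(?P<OP>[-+*()])")


def scan(source: str) -> List[Token]:
    """Stage 1: turn characters into tokens, knowing nothing about grammar."""
    tokens = []
    pos = 0
    while True:
        while pos < len(source) and source[pos].isspace():
            pos += 1
        if pos == len(source):
            break
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ScanError(f"unexpected character {source[pos]!r} at offset {pos}")
        tokens.append(Token(match.lastgroup, match.group(), pos))
        pos = match.end()
    tokens.append(Token("EOF", "", len(source)))
    return tokens


class Parser:
    """Stage 2: consume tokens only; never looks at raw characters."""

    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.i = 0

    def peek(self) -> Token:
        return self.tokens[self.i]

    def advance(self) -> Token:
        tok = self.tokens[self.i]
        self.i += 1
        return tok

    def parse(self) -> int:
        value = self.expr()
        if self.peek().kind != "EOF":
            raise ParseError(f"unexpected {self.peek().text!r} at offset {self.peek().pos}")
        return value

    def expr(self) -> int:          # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek().text in ("+", "-"):
            op = self.advance().text
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self) -> int:          # term := factor ('*' factor)*
        value = self.factor()
        while self.peek().text == "*":
            self.advance()
            value *= self.factor()
        return value

    def factor(self) -> int:        # factor := NUM | '(' expr ')'
        tok = self.advance()
        if tok.kind == "NUM":
            return int(tok.text)
        if tok.text == "(":
            value = self.expr()
            closing = self.advance()
            if closing.text != ")":
                raise ParseError(f"expected ')' at offset {closing.pos}")
            return value
        found = tok.text or "end of input"
        raise ParseError(f"expected a number or '(' at offset {tok.pos}, found {found!r}")
```

With this split, `scan("2 * (3 + 4)")` can be unit tested on its own, and `Parser(scan(...)).parse()` can be fed hand-built token lists without going through the scanner at all.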

As for loops and branches, I'm not saying we need those in Pig Latin.  
We need them somehow.  Whether it's better to put them in Pig Latin or  
embed Pig in an existing scripting language is an ongoing debate.  I don't  
want to make a decision now that effectively ends that debate without  
buy in from those who feel strongly that Pig Latin should include  
those constructs.

I agree with you that we should modify the logical plan to support  
this rather than add another layer.  As for active development, the  
only thing I'm aware of is we hope to start working on a more robust  
optimizer for pig soon, and that will require some additional  
functionality out of the logical operators, but it shouldn't cause any  
fundamental architectural changes.

Alan.
On Feb 24, 2009, at 1:27 AM, pi song wrote:

> (1) Lack of good documentation, which makes it hard and time-consuming
> to learn javacc and make changes to the Pig grammar
> <== ANTLR is very very well documented.
> http://www.pragprog.com/titles/tpantlr/the-definitive-antlr-reference
> http://media.pragprog.com/titles/tpantlr/toc.pdf
> http://www.antlr.org/wiki/display/ANTLR3/ANTLR+3+Wiki+Home
>
> (2) No easy way to customize error handling and error messages
> <== ANTLR has very extensive error handling support
> http://media.pragprog.com/titles/tpantlr/errors.pdf
>
> (3) Single pass that performs both tokenizing and parsing
> <== What is the advantage of decoupling the tokenizer and the parser?
>
> In addition, "Composite Grammar" is very useful for keeping the parser
> modular. Things that can be treated as sub-languages, such as the bag
> schema definition, can be done and unit tested separately.
>
> ANTLRWorks (http://www.antlr.org/works/index.html) also makes grammar
> development very efficient. Think of an IDE that helps you debug your
> code (which here is the grammar).
>
> One question: is there any use case for branching and loops? The current
> Pig is more like a query (declarative) language. I don't really see how
> loop constructs would fit. I think what Ted mentioned is more about
> embedding Pig in other languages and using those languages to do loops.
>
> We should think about how the logical plan layer can be made simpler for
> external use so we don't have to introduce a new layer. Is there any major
> active development on it? Currently I have more spare time and should be
> able to help out. (BTW, I'm slow because this is just my hobby. I don't
> want to slow you guys down.)
>
> Pi Song
>
> On Tue, Feb 24, 2009 at 6:23 AM, nitesh bhatia
<[EMAIL PROTECTED]