Pig >> mail # dev >> Improving macros

Jonathan Packer 2013-06-19, 17:59
Jonathan Coveney 2013-06-19, 20:03
Re: Improving macros
Jon is right. In https://issues.apache.org/jira/browse/PIG-3204 I am trying
to ensure that each line is parsed only once in most cases. I still have a
few issues with other commands in a Pig script, such as fs, shell, cd, and
illustrate, and with error messages not showing line numbers properly,
which I have not gotten around to solving yet. With my patch, localization
of files for register and import commands should happen only once, because
parsing happens only once (unless the script has fs or sh commands). I will
check it out to be doubly sure.
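
To illustrate the "localize only once" idea, and the "warn but do not fail on
a redundant import" behavior proposed in the quoted message below, here is a
minimal sketch in Python. It is not the code from PIG-3204 or from Pig
itself: the class `FetchOnceCache`, its methods, the S3 URIs, and the
`fake_fetch` stand-in for a real download are all hypothetical names chosen
for illustration.

```python
from typing import Callable, Dict, List


class FetchOnceCache:
    """Hypothetical helper: remembers which remote files were localized."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        # Assumption: fetch downloads a URI and returns the local path.
        self._fetch = fetch
        self._local: Dict[str, str] = {}
        self.warnings: List[str] = []

    def localize(self, uri: str) -> str:
        """Return the local path for uri, downloading only on first use."""
        if uri not in self._local:
            self._local[uri] = self._fetch(uri)
        return self._local[uri]

    def import_macro(self, uri: str) -> bool:
        """Return True for a new import; warn and skip a repeated one."""
        if uri in self._local:
            self.warnings.append(f"macro file already imported, ignoring: {uri}")
            return False
        self.localize(uri)
        return True


downloads: List[str] = []


def fake_fetch(uri: str) -> str:
    """Stand-in for a real download; records each call."""
    downloads.append(uri)
    return "/tmp/" + uri.rsplit("/", 1)[-1]


cache = FetchOnceCache(fake_fetch)
cache.import_macro("s3://bucket/macros/b.pig")   # first import: downloaded
cache.import_macro("s3://bucket/macros/b.pig")   # repeat: warning, no download
cache.localize("s3://bucket/jars/udfs.jar")      # first register: downloaded
cache.localize("s3://bucket/jars/udfs.jar")      # reparse: served from cache
```

Keying the cache on the URI means a statement that gets reparsed costs a
dictionary lookup instead of another remote fetch.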

On Wed, Jun 19, 2013 at 1:03 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> This sounds excellent! Would love to see this in trunk.
> As for #3, this is probably because pig essentially reparses
> everything with each new line. I know there is a ticket where Rohini dealt
> with this in some cases where HDFS was being hit multiple times because of
> load statements getting reparsed, but I'm not sure if remote imports are
> fixed by that patch as well.
>
> 2013/6/19 Jonathan Packer <[EMAIL PROTECTED]>
> > Hi, I'm an engineer at Mortar Data. I was working on some features to
> > improve macros that I'd like to contribute (we're hoping to build a
> > library of reusable pig macros implementing common algorithms), but I
> > wanted to check in here first to see if anyone has concerns about the
> > changes I'd be making.
> >
> > The changes I've implemented are:
> >
> >    1. Macro files can register jars and udfs (avoiding namespace
> >    conflicts is the user's responsibility).
> >    2. Macro files can be redundantly imported (the extra import
> >    statements will be ignored). The use case is a pigscript that imports
> >    macro files A and B, where A also imports B. Pig will emit a warning,
> >    but not fail as it currently does.
> >    3. Registers and imports from S3 aren't repeatedly downloaded as a
> >    pigscript is parsed. I'm not sure why it was doing this in the first
> >    place, but it looked like a query was being assembled line-by-line
> >    and every time it would re-download jars etc.
> >
> > I was working on our fork of 0.9.2 with modifications, so please let me
> > know if any of these have already been fixed in the latest version.
> >
> > Thanks,
> > Jonathan Packer
> >
Jonathan Packer 2013-06-20, 21:57