Jonathan Packer 2013-06-19, 17:59
Jonathan Coveney 2013-06-19, 20:03
Rohini Palaniswamy 2013-06-19, 21:36
Submitted a patch for this: https://issues.apache.org/jira/browse/PIG-3359
On Wed, Jun 19, 2013 at 5:36 PM, Rohini Palaniswamy <[EMAIL PROTECTED]
> Jon is right. I am trying to ensure that each line is mostly parsed only
> once in https://issues.apache.org/jira/browse/PIG-3204. I still have a few
> issues with other commands in Pig scripts (fs, shell, cd, illustrate), with
> error messages not showing line numbers properly, and so on, which I have
> not gotten to solving yet. With my patch, register and import file
> localization should be done only once, since parsing happens only once
> (unless you have fs or sh commands). I will check it to be doubly sure.
> On Wed, Jun 19, 2013 at 1:03 PM, Jonathan Coveney <[EMAIL PROTECTED]
> > This sounds excellent! Would love to see this in trunk.
> > As far as #3, this is probably because Pig essentially reparses
> > everything with each new line. I know there is a ticket where Rohini dealt
> > with this in some cases where HDFS was being hit multiple times because
> > load statements were getting reparsed, but I'm not sure if remote imports
> > are fixed by that patch as well.
> > 2013/6/19 Jonathan Packer <[EMAIL PROTECTED]>
> > > Hi, I'm an engineer at Mortar Data. I was working on some features to
> > > improve macros that I'd like to contribute (we're hoping to build a
> > > library of reusable pig macros implementing common algorithms), but I
> > > wanted to check in here first to see if anyone has concerns about the
> > > changes I'd be making.
> > >
> > > The changes I've implemented are:
> > >
> > > 1. Macro files can register jars and udfs (avoiding namespace
> > >    conflicts is the user's responsibility)
> > > 2. Macro files can be redundantly imported (the extra import
> > >    statements will be ignored). The use case is a pigscript that
> > >    imports files A and B, where A also imports B. Pig will emit a
> > >    warning, but not fail as it currently does.
> > > 3. Registers and imports from S3 aren't repeatedly downloaded as a
> > >    pigscript is parsed. I'm not sure why it was doing this in the first
> > >    place, but it looked like a query was being assembled line-by-line
> > >    and each time it would re-download jars etc.
> > >
> > > I was working on our fork of 0.9.2 with modifications, so please let me
> > > know if any of these have already been fixed in the latest version.
> > >
> > > Thanks,
> > > Jonathan Packer
> > >
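
The re-download behaviour described in item 3 of the original message, and the re-parsing explanation given in the replies, can be illustrated with a minimal sketch. This is not Pig's code and not the PIG-3204 or PIG-3359 patch; `ResourceCache`, `parse_incrementally`, and the `fetch` callback are hypothetical names, and the parsing of REGISTER/IMPORT lines is deliberately simplified. It only shows the general idea: if the whole script is re-parsed every time a line is added, memoizing the localized path per URI keeps each remote file from being fetched more than once.

```python
from typing import Callable, Dict, List


class ResourceCache:
    """Fetch each remote resource at most once, however often it is requested."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch                # hypothetical downloader: URI -> local path
        self._local: Dict[str, str] = {}   # URI -> path of the already-localized copy

    def localize(self, uri: str) -> str:
        # Only call the downloader the first time a URI is seen.
        if uri not in self._local:
            self._local[uri] = self._fetch(uri)
        return self._local[uri]


def parse_incrementally(lines: List[str], cache: ResourceCache) -> None:
    """Mimic re-parsing the whole script each time one more line is appended."""
    for end in range(1, len(lines) + 1):
        for line in lines[:end]:
            # Simplified assumption: the statement keyword starts the line
            # and the path is a single-quoted string ending in ';'.
            if line.upper().startswith(("REGISTER ", "IMPORT ")):
                uri = line.split(None, 1)[1].rstrip(";").strip("'")
                cache.localize(uri)
```

Without the cache, the first statement of an N-line script would be localized N times under this parsing model; with it, the downloader runs once per distinct URI.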