Pig, mail # dev - Improving macros


Jonathan Packer 2013-06-19, 17:59
Jonathan Coveney 2013-06-19, 20:03
Rohini Palaniswamy 2013-06-19, 21:36
Re: Improving macros
Jonathan Packer 2013-06-20, 21:57
Submitted a patch for this: https://issues.apache.org/jira/browse/PIG-3359
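
For readers skimming the thread, the second change in the proposal quoted below (a redundant import is ignored with a warning instead of failing) can be sketched roughly as follows. This is a minimal illustration in Python, not the code in the patch and not Pig's actual implementation; the class name `MacroImporter` and the method `should_import` are hypothetical.

```python
import warnings


class MacroImporter:
    """Hypothetical tracker of macro files already imported in one parse."""

    def __init__(self):
        # Paths of macro files that have been imported so far.
        self._seen = set()

    def should_import(self, path):
        """Return True the first time a path is seen.

        On a repeated path, emit a warning and return False so the
        caller can skip the import rather than fail.
        """
        if path in self._seen:
            warnings.warn(
                f"Macro file {path!r} already imported; ignoring duplicate import"
            )
            return False
        self._seen.add(path)
        return True
```

A caller would check `should_import(path)` before expanding each import statement, so a script that imports A and B, where A also imports B, expands B only once.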
On Wed, Jun 19, 2013 at 5:36 PM, Rohini Palaniswamy <[EMAIL PROTECTED]> wrote:

> Jon is right. I am trying to ensure that each line is mostly parsed only
> once in https://issues.apache.org/jira/browse/PIG-3204. I still have a few
> issues with other commands in Pig scripts, such as fs, shell, cd, and
> illustrate, and with error messages not showing line numbers properly,
> which I have not gotten to solving yet. With my patch, file localization
> for register and import commands should be done only once, as the parsing
> happens only once (unless you have fs or sh commands). Will check it out
> to be doubly sure.
>
> Regards,
> Rohini
>
>
> On Wed, Jun 19, 2013 at 1:03 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>
> > This sounds excellent! Would love to see this in trunk.
> >
> > As for #3, this is probably because Pig essentially reparses
> > everything with each new line. I know there is a ticket where Rohini
> > dealt with this in some cases where HDFS was being hit multiple times
> > because of load statements getting reparsed, but I'm not sure if remote
> > imports are fixed by that patch as well.
> >
> >
> > 2013/6/19 Jonathan Packer <[EMAIL PROTECTED]>
> >
> > > Hi, I'm an engineer at Mortar Data. I was working on some features to
> > > improve macros that I'd like to contribute (we're hoping to build a
> > > library of reusable Pig macros implementing common algorithms), but I
> > > wanted to check in here first to see if anyone has concerns about the
> > > changes I'd be making.
> > >
> > > The changes I've implemented are:
> > >
> > >    1. Macro files can register jars and UDFs (avoiding namespace
> > >    conflicts is the user's responsibility).
> > >    2. Macro files can be redundantly imported (the extra import
> > >    statements will be ignored). The use case is a Pig script that
> > >    imports macro files A and B, where A also imports B. Pig will emit
> > >    a warning, but not fail as it currently does.
> > >    3. Registers and imports from S3 aren't repeatedly downloaded as a
> > >    Pig script is parsed. I'm not sure why it was doing this in the
> > >    first place, but it looked like a query was being assembled
> > >    line-by-line, and every time it would re-download jars etc.
> > >
> > > I was working on our fork of 0.9.2 with modifications, so please let me
> > > know if any of these have already been fixed in the latest version.
> > >
> > > Thanks,
> > > Jonathan Packer
> > >
> >
>
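
Item 3 in the original proposal, together with the reparsing behaviour described in the replies, comes down to localizing each remote resource at most once per script, however many times the referencing statement is parsed. A minimal sketch of that idea in Python follows. It is an illustration only, not Pig's implementation or the PIG-3359 patch; `ResourceCache`, `localize`, and the `fetch` callable are hypothetical names.

```python
class ResourceCache:
    """Hypothetical cache mapping a remote URI to its downloaded local path."""

    def __init__(self, fetch):
        # fetch is an assumed callable: it takes a URI string, downloads the
        # resource, and returns the local file path.
        self._fetch = fetch
        self._local = {}

    def localize(self, uri):
        """Download the resource on first request; reuse the path afterwards."""
        if uri not in self._local:
            self._local[uri] = self._fetch(uri)
        return self._local[uri]
```

With a cache like this in front of the download step, reparsing a register or import statement would reuse the local copy instead of fetching the jar or macro file again.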