Pig >> mail # dev >> Improving macros

Jonathan Packer 2013-06-19, 17:59
Re: Improving macros
This sounds excellent! Would love to see this in trunk.

As for #3, this is probably because Pig essentially reparses everything
with each new line. I know there is a ticket where Rohini dealt with this
in some cases where HDFS was being hit multiple times because load
statements were getting reparsed, but I'm not sure whether that patch also
fixes remote imports.
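
To illustrate the reparse behavior behind #3, here is a minimal Python sketch. It is not Pig's actual code: it assumes a hypothetical parser that re-processes the whole script each time a line is added, a hypothetical `fetch` callable standing in for an S3 or HDFS download, and a made-up jar URI. Under those assumptions, memoizing fetches by URI keeps each resource to a single download however often the script is reparsed.

```python
from typing import Callable, Dict, List


class FetchCache:
    """Memoizes remote fetches by URI (hypothetical helper, not part of Pig)."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, bytes] = {}

    def get(self, uri: str) -> bytes:
        # Download only on the first request for a given URI.
        if uri not in self._cache:
            self._cache[uri] = self._fetch(uri)
        return self._cache[uri]


def parse_incrementally(lines: List[str], get: Callable[[str], bytes]) -> None:
    """Simulates a parser that reparses everything seen so far on each new line."""
    seen: List[str] = []
    for line in lines:
        seen.append(line)
        for stmt in seen:  # full reparse of the accumulated script
            if stmt.startswith("REGISTER "):
                uri = stmt.split(" ", 1)[1].rstrip(";").strip("'")
                get(uri)


downloads: List[str] = []


def fake_fetch(uri: str) -> bytes:
    # Stand-in for a real remote download; records each call.
    downloads.append(uri)
    return b""


# The URI and statements below are illustrative only.
script = [
    "REGISTER 's3://bucket/udfs.jar';",
    "A = LOAD 'x';",
    "B = FILTER A BY $0 > 0;",
]

parse_incrementally(script, fake_fetch)
uncached_count = len(downloads)  # one download per reparse

downloads.clear()
parse_incrementally(script, FetchCache(fake_fetch).get)
cached_count = len(downloads)  # a single download in total
```

Without the cache the jar is fetched once per reparse, so the cost grows with the length of the script; with it, the number of downloads depends only on the number of distinct URIs.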
2013/6/19 Jonathan Packer <[EMAIL PROTECTED]>

> Hi, I'm an engineer at Mortar Data. I was working on some features to
> improve macros that I'd like to contribute (we're hoping to build a library
> of reusable pig macros implementing common algorithms), but I wanted to
> check in here first to see if anyone has concerns about the changes I'd be
> making.
> The changes I've implemented are:
>    1. Macro files can register jars and UDFs (avoiding namespace conflicts
>    is the user's responsibility).
>    2. Macro files can be redundantly imported (the extra import
>    statements will be ignored). The use case is pigscript A imports macro
>    files A and B, but A also imports B. Pig will emit a warning, but not
>    fail as it currently does.
>    3. Registers and imports from S3 aren't repeatedly downloaded as a
>    pigscript is parsed. I'm not sure why it was doing this in the first
>    place, but it looked like a query was being assembled line by line, and
>    every time it would re-download jars etc.
> I was working on our fork of 0.9.2 with modifications, so please let me
> know if any of these have already been fixed in the latest version.
> Thanks,
> Jonathan Packer
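
The redundant-import behavior described in change 2 of the quoted message can be sketched in Python as follows. This is an illustration under stated assumptions, not the patch itself: the file names are hypothetical, the import graph is modeled as a plain dictionary, and the use case is read as a script importing two macro files where the first macro file also imports the second. A repeated import emits a warning and is skipped instead of raising an error.

```python
import warnings
from typing import Dict, List, Set


def resolve_imports(entry: str, files: Dict[str, List[str]]) -> List[str]:
    """Returns file names in load order, loading each file only once.

    `files` maps a file name to the names it imports (hypothetical model).
    """
    loaded: Set[str] = set()
    order: List[str] = []

    def visit(name: str) -> None:
        if name in loaded:
            # Redundant import: warn and skip rather than fail.
            warnings.warn(f"ignoring redundant import of {name}")
            return
        loaded.add(name)
        for dep in files.get(name, []):
            visit(dep)
        order.append(name)  # a file is loaded after its own imports

    visit(entry)
    return order


# Hypothetical layout: the script imports both macro files,
# and a.macro itself imports b.macro.
files = {
    "script.pig": ["a.macro", "b.macro"],
    "a.macro": ["b.macro"],
    "b.macro": [],
}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    load_order = resolve_imports("script.pig", files)
```

Because a name is marked as loaded before its own imports are visited, a circular import in this sketch also ends in a warning rather than infinite recursion.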
Rohini Palaniswamy 2013-06-19, 21:36
Jonathan Packer 2013-06-20, 21:57