Pig >> mail # dev >> Improving macros

Improving macros
Hi, I'm an engineer at Mortar Data. I was working on some features to
improve macros that I'd like to contribute (we're hoping to build a library
of reusable pig macros implementing common algorithms), but I wanted to
check in here first to see if anyone has concerns about the changes I'd be
making.

The changes I've implemented are:

   1. Macro files can register jars and UDFs (avoiding namespace conflicts
   is the user's responsibility).
   2. Macro files can be redundantly imported (the extra import
   statements will be ignored). The use case is a pigscript that imports
   macro files A and B, where macro file A also imports B. Pig will emit a
   warning, but not fail as it currently does.
   3. Jars and macro files registered or imported from S3 aren't
   repeatedly downloaded as a pigscript is parsed. I'm not sure why it was
   doing this in the first place, but it looked like the query was being
   assembled line by line, and each new line triggered a re-download of
   the jars and imported files.
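
To make items 2 and 3 concrete, here is a rough sketch of the intended behavior. It is a minimal Python illustration under stated assumptions, not Pig's parser code: `ImportTracker`, `fetch_once`, `import_macro_file`, and the `fetch` callable are hypothetical names invented for this example.

```python
import warnings


class ImportTracker:
    """Tracks macro-file imports and remote fetches for one script parse.

    Hypothetical helper for illustration only; not part of Pig.
    """

    def __init__(self, fetch):
        self._fetch = fetch      # assumed callable: path -> file contents
        self._cache = {}         # path -> contents, so each path is fetched once
        self._imported = set()   # paths that have already been imported

    def fetch_once(self, path):
        """Return the contents of path, downloading only on first use."""
        if path not in self._cache:
            self._cache[path] = self._fetch(path)
        return self._cache[path]

    def import_macro_file(self, path):
        """Import a macro file; warn and skip if it was already imported."""
        if path in self._imported:
            warnings.warn(f"Ignoring duplicate import of {path}")
            return None
        self._imported.add(path)
        return self.fetch_once(path)
```

With this shape, a second import of the same file produces a warning instead of an error, and a file that has already been downloaded is served from the cache instead of being fetched again.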

I was working on our fork of 0.9.2 with modifications, so please let me
know if any of these have already been fixed in the latest version.

Thanks,
Jonathan Packer
Jonathan Coveney 2013-06-19, 20:03
Rohini Palaniswamy 2013-06-19, 21:36
Jonathan Packer 2013-06-20, 21:57