Re: Magic numbers in my pig scripts
Support for functions, planned as part of the Turing-complete Pig effort, should help here (that work is in the early design stages).

On 9/29/10 3:32 PM, "Eric Wadsworth" <[EMAIL PROTECTED]> wrote:


Parameter substitution isn't really what I'm needing. After some
discussion with my co-workers, it looks like the best feature would
really be sort of a pre-processor. Basically, insert a line in your pig
script that would "include" another pig script, right there. Then that
other pig script could contain defines, code, whatever. This would allow
us to build a hierarchy of scripts, where we could tweak some defines at
the top level, and the results would be consumed by the lower levels.

--- Eric Wadsworth
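
The "include" pre-processor described above could be prototyped outside Pig before any built-in support exists. The following is a minimal sketch in Python, not a Pig feature: the `%include 'file'` directive syntax, the `expand_includes` function, and the `read` callback are all hypothetical names invented for this illustration.

```python
import re
from pathlib import Path

# Hypothetical directive, invented for this sketch (not Pig syntax):
#   %include 'constants.pig'
INCLUDE_RE = re.compile(r"^\s*%include\s+'([^']+)'\s*;?\s*$")


def expand_includes(name, read=lambda p: Path(p).read_text(), _stack=()):
    """Return the text of script `name` with %include lines expanded recursively.

    `read` maps a script name to its text; by default it reads from disk.
    """
    if name in _stack:
        # A script that (indirectly) includes itself would recurse forever.
        raise ValueError("circular include: " + name)
    out = []
    for line in read(name).splitlines():
        m = INCLUDE_RE.match(line)
        if m:
            # Splice the included script in place of the directive line.
            out.append(expand_includes(m.group(1), read, _stack + (name,)))
        else:
            out.append(line)
    return "\n".join(out)
```

The expanded text would then be written to a temporary file and handed to Pig as an ordinary script. Included names are passed to `read` as written, so how relative paths resolve is left to the caller.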

On 09/29/2010 11:15 AM, Aniket Mokashi wrote:
> http://wiki.apache.org/pig/ParameterSubstitution
> http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html
> Also, in Pig 0.8, RECORD_TYPE_ALPHA can take its value at run time (from an
> alias such as filtered_stuff_threshold).
> https://issues.apache.org/jira/browse/PIG-1434
> Thanks,
> Aniket
> -----Original Message-----
> From: Saurav Datta [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, September 29, 2010 1:06 PM
> Subject: Re: Magic numbers in my pig scripts
> Hi Eric,
> As I understand it, you would like to define the value of the filter at
> run time, with that value taken from a file.
> Am I correct?
> Regards,
> Saurav
> On Sep 29, 2010, at 10:00 AM, Eric Wadsworth wrote:
>> Hi folks!
>> I'm brand new to this list, so apologies if this is an inappropriate
>> newbie question, or is otherwise incorrect, but here goes.
>> I'm working with a bunch of pig scripts, and we're adding new ones
>> almost daily. They are getting more and more complex. The problem is
>> exacerbated by the proliferation of magic numbers throughout them.
>> As a software engineer, these are driving me nuts! The code is quite
>> brittle. There seems to be no way to centralize logic or even values.
>> For a simple example:
>> filtered_stuff = FILTER stuff by record_type == 23;
>> I'd prefer:
>> filtered_stuff = FILTER stuff by record_type == RECORD_TYPE_ALPHA;
>> Where RECORD_TYPE_ALPHA is defined in some other file that the pig
>> script consumes.
>> Sounds rather like the old C-style header files would be in order...
>> Am I missing something obvious here? How do you guys handle this
>> problem? (We're using Pig 0.6 and are just starting to transition to
>> Pig 0.7.)
>> Thanks! --- Eric Wadsworth
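
To make the "value defined in some other file" idea concrete, here is a small Python sketch of substituting named values into a script. It only mimics the idea behind the parameter substitution page linked in the replies and is not Pig's implementation. The `NAME = value` file format, the `$NAME` placeholder pattern, and the function names `load_params` and `substitute` are assumptions made for this example.

```python
import re


def load_params(text):
    """Parse NAME=value lines; blank lines and '#' comment lines are skipped."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        params[name.strip()] = value.strip()
    return params


def substitute(script, params):
    """Replace each $NAME in the script text; fail loudly on undefined names."""
    def repl(match):
        name = match.group(1)
        if name not in params:
            raise KeyError("undefined parameter: " + name)
        return params[name]
    return re.sub(r"\$([A-Za-z_][A-Za-z0-9_]*)", repl, script)


# The magic number lives in one shared definitions file (inlined here).
params = load_params("# record types\nRECORD_TYPE_ALPHA = 23\n")
print(substitute(
    "filtered_stuff = FILTER stuff BY record_type == $RECORD_TYPE_ALPHA;",
    params,
))
```

Pig's own parameter substitution is driven from the command line with `-param` and `-param_file`; check the documentation for your Pig version before relying on it.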