-
Magic numbers in my pig scripts
Eric Wadsworth 2010-09-29, 17:00
Hi folks!
I'm brand new to this list, so apologies if this is an inappropriate newbie question, or is otherwise incorrect, but here goes.
I'm working with a bunch of pig scripts, and we're adding new ones almost daily. They are getting more and more complex. The problem is exacerbated by the proliferation of magic numbers throughout them. As a software engineer, these are driving me nuts! The code is quite brittle. There seems to be no way to centralize logic or even values.
For a simple example:
    filtered_stuff = FILTER stuff by record_type == 23;
I'd prefer:
    filtered_stuff = FILTER stuff by record_type == RECORD_TYPE_ALPHA;
Where RECORD_TYPE_ALPHA is defined in some other file that the pig script consumes.
Sounds rather like the old C-style header files would be in order...
Am I missing something obvious here? How do you guys handle this problem? (We're using Pig 0.6 and are just starting to transition to Pig 0.7.)
Thanks! --- Eric Wadsworth
-
Re: Magic numbers in my pig scripts
Saurav Datta 2010-09-29, 17:06
Hi Eric,
As I understand it, you would like to define the value of the filter at run time, with the value taken from a file. Am I correct?
Regards, Saurav
-
Re: Magic numbers in my pig scripts
Eric Wadsworth 2010-09-29, 17:14
Saurav,
Not that limited, but yes. Another example is in order. Say I have something like this:
    projected_data = FOREACH data GENERATE com.example.udfs.foo(7, 37, 'https', fields#'bar') as bat;
This sort of thing would be vastly better:
    projected_data = FOREACH data GENERATE com.example.udfs.foo(FOO_COMMAND_CODE, MAX_FIELD_LENGTH, SCHEME, fields#'bar') as bat;
I know Pig isn't a real programming language, so maybe I'm asking for too much. But it's so brittle, and as the number of Pig scripts grows, the odds of a change breaking a bunch of stuff increase exponentially.
--- Eric Wadsworth
-
RE: Magic numbers in my pig scripts
Aniket Mokashi 2010-09-29, 17:15
http://wiki.apache.org/pig/ParameterSubstitution
http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html
Also, Pig 0.8 lets RECORD_TYPE_ALPHA take its value at run time (from an alias such as filtered_stuff_threshold):
https://issues.apache.org/jira/browse/PIG-1434
Thanks, Aniket
-
Re: Magic numbers in my pig scripts
Saurav Datta 2010-09-29, 17:25
Same here, I was coming to parameter substitution by reading from a parameter file.
Here is how you reference the parameters year, month, and date in the script:
    A = load '/INPUTDIR/$year/$month/$date/input_test.dat' using PigStorage(' ') as (field1, field2, field3);
Here is how you invoke the Pig script (in local mode, though):
    pig -param_file param_file.cfg -x local testParamFile.pig
And below are the contents of param_file.cfg, in the same directory:
    year='2010'
    month='09'
    date='19'
We are using Pig 0.7.0. Let me know if this helps.
Regards, Saurav
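Applied to Eric's original example, the parameter-file approach above amounts to writing $RECORD_TYPE_ALPHA in the script and keeping RECORD_TYPE_ALPHA=23 in a shared file. The Java sketch below only imitates that textual substitution so the effect is easy to see. It is not Pig's implementation, and the class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: imitates the effect of "pig -param_file ...",
// it is NOT how Pig itself implements parameter substitution.
class ParamSubstitutionSketch {

    // Matches $name where name is a letter or underscore followed by word characters.
    static final Pattern PARAM = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    // Parses "name=value" lines; blank lines and lines starting with '#'
    // are skipped (a simplification chosen for this sketch).
    static Map<String, String> parseParams(String text) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String line : text.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            int eq = trimmed.indexOf('=');
            if (eq < 0) {
                continue; // ignore malformed lines in this sketch
            }
            params.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return params;
    }

    // Replaces every $name in the script with its value; fails on undefined names.
    static String substitute(String script, Map<String, String> params) {
        Matcher m = PARAM.matcher(script);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Undefined parameter: " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical shared constants file and script line.
        Map<String, String> params = parseParams("# shared constants\nRECORD_TYPE_ALPHA=23\n");
        String script = "filtered_stuff = FILTER stuff BY record_type == $RECORD_TYPE_ALPHA;";
        System.out.println(substitute(script, params));
    }
}
```

The point of the sketch is that the constant lives in one file shared by many scripts, which is the centralization Eric asked about, although only for values and not for logic.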
-
RE: Magic numbers in my pig scripts
Matthew Smith 2010-09-29, 18:11
Maybe this is off topic, but I did this in Java code with a parameter array.
In MAIN (or UI, Input, etc.):
    String[] params = new String[2];
    params[0] = "date";
    params[1] = "filter_regex";
    runScript(params, server, inputPath, outputPath);
In runScript(String[] params, PigServer server, String inputPath, String outputPath):
    // Build each Pig Latin statement by concatenating the values into the query text.
    server.registerQuery("data = LOAD '" + inputPath + "' USING PigStorage('|') AS (date:chararray, comment:chararray);");
    server.registerQuery("filtered = FILTER data BY date == '" + params[0] + "' AND comment == '" + params[1] + "';");
    ....
Just a thought...
Matt
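One way Matt's approach could address the magic-number complaint is to keep the values as named Java constants and build the Pig Latin text from them. The sketch below is only an assumption about how that might look. The class and method names are hypothetical, the value 23 comes from Eric's first example, and the call to PigServer.registerQuery is left as a comment so the code runs without Pig on the classpath.

```java
// Illustrative sketch only: named constants live in Java, and Pig Latin
// statements are assembled from them. All names here are hypothetical.
class PigQueryConstantsSketch {

    // The magic number from the original post, given a name in one place.
    static final int RECORD_TYPE_ALPHA = 23;

    // Builds a FILTER statement comparing record_type with the given constant.
    static String filterByRecordType(String outAlias, String inAlias, int recordType) {
        return outAlias + " = FILTER " + inAlias + " BY record_type == " + recordType + ";";
    }

    public static void main(String[] args) {
        String query = filterByRecordType("filtered_stuff", "stuff", RECORD_TYPE_ALPHA);
        // With Pig available, this string would be passed to server.registerQuery(query).
        System.out.println(query);
    }
}
```

The trade-off is that the scripts stop being standalone .pig files, since every statement has to be generated by the Java driver.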
-
Re: Magic numbers in my pig scripts
Eric Wadsworth 2010-09-29, 22:32
Piggers,
Parameter substitution isn't really what I'm needing. After some discussion with my co-workers, it looks like the best feature would really be sort of a pre-processor. Basically, insert a line in your pig script that would "include" another pig script, right there. Then that other pig script could contain defines, code, whatever. This would allow us to build a hierarchy of scripts, where we could tweak some defines at the top level, and the results would be consumed by the lower levels.
--- Eric Wadsworth
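The "include" pre-processor Eric describes could be prototyped outside Pig as a small text-expansion step run before the script is submitted. The sketch below is a guess at the minimum such a tool would need. The "%include" directive is invented for this example and is not Pig syntax as discussed in this thread, and scripts are held in an in-memory map rather than read from disk so the example stays self-contained.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

// Illustrative sketch only: expands a made-up "%include <name>" directive by
// splicing in the named script's text, recursively. Not a Pig feature.
class IncludePreprocessorSketch {

    // Hypothetical directive chosen for this sketch.
    static final String DIRECTIVE = "%include ";

    // Returns the fully expanded text of the named script.
    static String expand(String name, Map<String, String> sources) {
        StringBuilder out = new StringBuilder();
        expandInto(name, sources, new ArrayDeque<String>(), out);
        return out.toString();
    }

    private static void expandInto(String name, Map<String, String> sources,
                                   Deque<String> stack, StringBuilder out) {
        if (stack.contains(name)) {
            throw new IllegalStateException("Circular include: " + name);
        }
        String text = sources.get(name);
        if (text == null) {
            throw new IllegalArgumentException("Unknown script: " + name);
        }
        stack.push(name);
        for (String line : text.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.startsWith(DIRECTIVE)) {
                // Splice the included script in place of the directive line.
                expandInto(trimmed.substring(DIRECTIVE.length()).trim(), sources, stack, out);
            } else {
                out.append(line).append('\n');
            }
        }
        stack.pop();
    }

    public static void main(String[] args) {
        Map<String, String> sources = new java.util.HashMap<>();
        sources.put("common.pig", "DEFINE foo com.example.udfs.foo();");
        sources.put("main.pig", "%include common.pig\nfiltered_stuff = FILTER stuff BY record_type == 23;");
        System.out.print(expand("main.pig", sources));
    }
}
```

Tracking the include stack lets the sketch reject circular includes, which matter as soon as scripts are arranged in the hierarchy Eric mentions.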
-
Re: Magic numbers in my pig scripts
Thejas M Nair 2010-09-29, 22:59
Support for functions as part of the Turing-complete Pig effort should help (it is in the early design stages): http://wiki.apache.org/pig/TuringCompletePig
-Thejas
-
Re: Magic numbers in my pig scripts
Dmitriy Ryaboy 2010-09-30, 20:30
Eric, check out piglet: http://github.com/iconara/piglet