Pig, mail # user - RE: reading input parameters in a pig script

RE: reading input parameters in a pig script
Siddhi Borkar 2013-02-20, 12:13
I won't be able to use a %default statement in my Pig script, because the parameters passed to it are not fixed. I would need a conditional check in the script to determine, for each input parameter, whether it was passed.
Also, Pig has no conditional statements (if/else).

The following is pseudocode for the functionality I want to achieve.

Consider the following Pig files:
1) xmlparser.pig
2) excelexporter.pig
3) htmlexporter.pig

1) xmlparser.pig
REGISTER /opt/apache_pig/pig-0.10.1/contrib/piggybank/java/piggybank.jar;
REGISTER /tmp/custudf.jar;

DEFINE XMLProcessor org.sdc.map.processor.XMLProcessor();
PRODUCTS = load 'product.xml' using org.apache.pig.piggybank.storage.XMLLoader('product') as (line:chararray);
PRODUCT = FOREACH PRODUCTS GENERATE FLATTEN(XMLProcessor(line)) as (id:chararray, name:chararray, description:chararray);

Note that XMLProcessor is a custom Java-based UDF that parses the XML.

2) excelexporter.pig
-- PRODUCT is the relation defined in xmlparser.pig
STORE PRODUCT INTO '/tmp/prod.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'UNIX');

3) htmlexporter.pig
-- logic for this is not yet implemented

Now the requirement is to write a wrapper Pig script that invokes these scripts and generates the output. The parameters passed to it are the input fields and the output file format.

For example: pig -param param1=name -param param2=description -param outfileformat=csv wrapper.pig
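Pig's `-param` option takes a single `key=value` pair, so every parameter needs its own `-param` flag. A launcher that receives an arbitrary set of parameters could build the argument list itself. The following is a minimal Python sketch. `build_pig_command` is a hypothetical helper, and it only builds the argument list; it does not run Pig.

```python
from typing import Dict, List


def build_pig_command(script: str, params: Dict[str, str]) -> List[str]:
    """Return an argv list for `pig` with one -param flag per key=value pair.

    Hypothetical helper for illustration; it does not execute Pig.
    """
    args = ["pig"]
    for key, value in params.items():
        # Each parameter gets its own -param flag.
        args += ["-param", f"{key}={value}"]
    args.append(script)
    return args


if __name__ == "__main__":
    cmd = build_pig_command(
        "wrapper.pig",
        {"param1": "name", "param2": "description", "outfileformat": "csv"},
    )
    print(" ".join(cmd))
    # To launch it for real: subprocess.run(cmd, check=True)
```

Passing an argv list rather than a single shell string avoids quoting problems when a value contains spaces.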

Based on the parameters passed to the wrapper script, I need to send the requested fields to the XML parser. In the example above, since name and description are passed as parameters, the XML should be parsed for only these two fields.
Any idea how this can be achieved in a Pig script?
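Pig Latin cannot enumerate the parameters it was given. One workable approach is to have the program that receives the parameters generate the statement that keeps only the requested fields. The following is a minimal Python sketch. It assumes the requested names must match the `(id, name, description)` schema declared in xmlparser.pig, and both `build_projection` and the `SELECTED` alias are hypothetical. It projects columns after XMLProcessor has parsed every field; skipping fields during parsing would require changing the UDF itself.

```python
from typing import Iterable

# Schema declared for PRODUCT in xmlparser.pig.
PRODUCT_FIELDS = ("id", "name", "description")


def build_projection(fields: Iterable[str], source: str = "PRODUCT",
                     target: str = "SELECTED") -> str:
    """Return a Pig Latin FOREACH statement keeping only the requested fields.

    `target` (default SELECTED) is a hypothetical alias for illustration.
    """
    requested = list(fields)
    if not requested:
        raise ValueError("at least one field must be requested")
    unknown = [f for f in requested if f not in PRODUCT_FIELDS]
    if unknown:
        raise ValueError(f"unknown field(s): {', '.join(unknown)}")
    return f"{target} = FOREACH {source} GENERATE {', '.join(requested)};"


if __name__ == "__main__":
    # Values of param1=name and param2=description from the example command.
    print(build_projection(["name", "description"]))
```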

Also, depending on the output file format, I need to invoke the corresponding exporter script (HTML or CSV) from my wrapper script. I don't see any conditional statements (if/else) in Pig. Any idea how this can be achieved?
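Pig Latin has no statement-level if/else, so the choice of exporter could be made by whatever program launches Pig. That program would then emit the matching STORE statement. The following is a minimal Python sketch. `build_store` is hypothetical, and the csv branch reuses the CSVExcelStorage call from excelexporter.pig (assuming piggybank.jar is registered). html is rejected because that exporter is not implemented yet.

```python
# Fully qualified piggybank storage function, with the arguments from excelexporter.pig.
CSV_STORAGE = ("org.apache.pig.piggybank.storage.CSVExcelStorage"
               "(',', 'NO_MULTILINE', 'UNIX')")


def build_store(alias: str, fmt: str, path: str) -> str:
    """Return the STORE statement for the requested output format (hypothetical helper)."""
    fmt = fmt.lower()
    if fmt == "csv":
        return f"STORE {alias} INTO '{path}' USING {CSV_STORAGE};"
    if fmt == "html":
        # htmlexporter.pig is not implemented yet.
        raise NotImplementedError("html export is not implemented")
    raise ValueError(f"unsupported output format: {fmt}")


if __name__ == "__main__":
    print(build_store("PRODUCT", "csv", "/tmp/prod.csv"))
```

Keeping the branch in the launcher means each generated script stays plain Pig Latin, with no conditional logic inside it.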

-----Original Message-----
From: Jonathan Coveney [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, February 20, 2013 2:38 PM
Subject: Re: reading input parameters in a pig script

Reiterating Prashant's comments.

In the script, though, you can have a %default statement, which defines a default value for a parameter; that value can be overridden with -param on the command line. My guess is this might let you do what you want?
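The following is a simplified Python model of the %default behaviour, for illustration only. A value passed with -param wins; otherwise the %default value is used; and a `$name` with neither is an error. `resolve` and `substitute` are hypothetical helpers, and Pig's real preprocessor handles more cases (for example %declare and -param_file). The model also shows the limitation raised in this thread: %default only helps for parameter names the script already knows.

```python
import re
from typing import Dict

# Matches $name references (simplified: letters, digits and underscore).
PARAM_REF = re.compile(r"\$(\w+)")


def resolve(defaults: Dict[str, str], overrides: Dict[str, str]) -> Dict[str, str]:
    """Values passed on the command line override %default values."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


def substitute(script: str, params: Dict[str, str]) -> str:
    """Replace each $name with its value; an undefined name raises KeyError."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"Undefined parameter: {name}")
        return params[name]
    return PARAM_REF.sub(lookup, script)


if __name__ == "__main__":
    defaults = {"outfileformat": "csv"}              # like: %default outfileformat csv
    params = resolve(defaults, {"param1": "name"})   # like: pig -param param1=name ...
    print(substitute("-- fields: $param1, format: $outfileformat", params))
```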
2013/2/20 Prashant Kommireddi <[EMAIL PROTECTED]>

> Hi Siddhi,
> "Is there any way to access these params in the script without
> referring to the param name?" -- how would you associate a param value with a Pig statement?
> I am guessing in this case your Pig script is also dynamically generated?
> You could use the PigServer API
> http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/PigServer.html
> to generate the params in a Java program and embed them into a script.
> -Prashant
> On Tue, Feb 19, 2013 at 3:44 PM, Siddhi Borkar <
> >
> > Consider the following command
> > pig -param param1=test param2=test1 param3=test2 myscript.pig
> >
> > In my case the parameters are dynamic, as in I could either pass
> > param1 only or I could pass all three params or some extra params.
> >
> > Since the parameters are dynamic, in my Pig script I will not be
> > able to reference the parameters as '$param1'. Is there any way to
> > access these params in the script without referring to the param name?
> >
> > ________________________________________
> > From: Jonathan Coveney [[EMAIL PROTECTED]]
> > Sent: Tuesday, February 19, 2013 6:42 PM
> > Subject: Re: reading input parameters in a pig script
> >
> > Can you give an example of what you'd like this to look like?
> >
> >
> > 2013/2/19 Siddhi Borkar <[EMAIL PROTECTED]>
> >
> > > Hi ,
> > >
> > > I need to pass parameters dynamically to a pig script. Is there
> > > any way to read the parameters passed and their corresponding values without
> > > giving
