Pig, mail # user - RE: reading input parameters in a pig script


Re: reading input parameters in a pig script
Jonathan Coveney 2013-02-20, 17:17
What is going to be generating the "pig -param param1=..." command line and so on?
Couldn't these be made into arguments to the UDF? i.e.

REGISTER /opt/apache_pig/pig-0.10.1/contrib/piggybank/java/piggybank.jar;
REGISTER /tmp/custudf.jar;

DEFINE XMLProcessor org.sdc.map.processor.XMLProcessor('$fields');
PRODUCTS = load 'product.xml' using
org.apache.pig.piggybank.storage.XMLLoader('product') as (line:chararray);
PRODUCT = FOREACH PRODUCTS GENERATE FLATTEN(XMLProcessor(line)) as
(id:chararray, name:chararray, description:chararray);
and you call it with pig -param fields=name,description
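
Whatever generates the command line could fold a variable-length list of fields into that single `fields` value. A minimal Python sketch, assuming a hypothetical helper named `build_pig_command` and the script name `xmlparser.pig` from the quoted message below (neither is part of Pig itself):

```python
def build_pig_command(fields, script="xmlparser.pig"):
    """Return the argument list for one Pig run that passes every field in a single parameter.

    `fields` is a list of field names; `script` is an assumed script name.
    """
    if not fields:
        raise ValueError("at least one field is required")
    # Join the variable-length list into one comma-separated value, so the
    # script only ever has to reference a single $fields parameter.
    joined = ",".join(fields)
    return ["pig", "-param", "fields=" + joined, script]


if __name__ == "__main__":
    # Show the command line that would be run for two fields.
    print(" ".join(build_pig_command(["name", "description"])))
```

The list form can be handed to something like `subprocess.call` without shell quoting concerns. Splitting the comma-separated string back into field names would then be up to the UDF constructor.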

And there has to be an output format in every case, so for that one a %default would work?
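
Since Pig Latin has no if/else, the choice of exporter could also live in whatever launches Pig. A rough Python sketch, assuming the script names from the quoted message, a hypothetical `select_exporter` helper, and `csv` as the fallback format (the Python-side counterpart of a %default):

```python
# Assumed mapping from output format to exporter script; the script names
# come from the quoted message, the mapping itself is illustrative.
EXPORTERS = {
    "csv": "excelexporter.pig",
    "html": "htmlexporter.pig",
}

# Assumed default, used when the caller passes no format.
DEFAULT_FORMAT = "csv"


def select_exporter(outfileformat=DEFAULT_FORMAT):
    """Return the exporter script for the requested format, or raise if unknown."""
    key = outfileformat.lower()
    if key not in EXPORTERS:
        raise ValueError("unsupported output format: %r" % outfileformat)
    return EXPORTERS[key]


if __name__ == "__main__":
    print(select_exporter())        # no format given, falls back to the default
    print(select_exporter("html"))
```

This only picks a script name. How the exporter script gets at the PRODUCT relation (for instance by combining it with the parser script into one run) is a separate question that the sketch does not address.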
2013/2/20 Siddhi Borkar <[EMAIL PROTECTED]>

> I will not be able to use %default statement in my pig script, as the
> parameters being passed to my pig script are not fixed. I would need a
> conditional check to be done in my pig script to check for each and every
> input parameter if it is passed or not.
> Also, there are no conditional operators (if/else) available in Pig.
>
> Following is the pseudocode of the functionality I want to achieve
>
> Consider the following pig files:
> 1) xmlparser.pig
> 2) excelexporter.pig
> 3) htmlexporter.pig
>
> 1) xmlparser.pig
> REGISTER /opt/apache_pig/pig-0.10.1/contrib/piggybank/java/piggybank.jar;
> REGISTER /tmp/custudf.jar;
>
> DEFINE XMLProcessor org.sdc.map.processor.XMLProcessor();
> PRODUCTS = load 'product.xml' using
> org.apache.pig.piggybank.storage.XMLLoader('product') as (line:chararray);
> PRODUCT = FOREACH PRODUCTS GENERATE FLATTEN(XMLProcessor(line)) as
> (id:chararray, name:chararray, description:chararray);
>
> Please note, XMLProcessor is a custom Java-based UDF which parses the XML.
>
> 2) excelexporter.pig
> STORE PRODUCT INTO '/tmp/prod.csv' USING
> CSVExcelStorage(',','NO_MULTILINE','UNIX');
>
> 3) htmlexporter.pig
> -- logic for this is not yet implemented
>
> Now the requirement is that I need to write a wrapper pig script which
> invokes the following script and generates an output. The parameters that
> will be passed are the input params and the out file format
>
> For ex pig -param param1=name -param param2=description
> -param outfileformat=csv wrapper.pig
>
> Now what I need to do is based on the params passed to the wrapper pig
> script, I need to send inputs to the xml parser and parse the input params.
> In the above case since name and description are passed as params the xml
> should be parsed only for these 2 fields.
> Any idea how this can be achieved in a pig script?
>
> Also depending on the output file format, I need to invoke the
> corresponding exporter script (html or csv) from my wrapper script. I don’t
> see any conditional operators available (if/else) in pig. Any idea how this
> can be achieved?
>
> -----Original Message-----
> From: Jonathan Coveney [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, February 20, 2013 2:38 PM
> To: [EMAIL PROTECTED]
> Subject: Re: reading input parameters in a pig script
>
> Reiterating Prashant's comments.
>
> In the script though you can have a %default statement which will define
> the default value for a parameter, which can also be overridden. My guess is
> this might let you do what you want?
>
>
> 2013/2/20 Prashant Kommireddi <[EMAIL PROTECTED]>
>
> > Hi Siddhi,
> >
> > "Is there any way to access these params in the script without
> > referring to the param name?" -- how would you associate a param value
> > to pig statement?
> >
> > I am guessing in this case your pig script is also dynamically generated?
> > You could use PigServer API
> > http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/PigServer.html
> > to generate params in a Java program and embed them into a script.
> >
> > -Prashant
> >
> >
> > On Tue, Feb 19, 2013 at 3:44 PM, Siddhi Borkar <
> > [EMAIL PROTECTED]> wrote:
> >
> > >
> > > Consider the following command
> > > pig -param param1=test -param param2=test1 -param param3=test2 myscript.pig
> > >
> > > In my case the parameters are dynamic, as in I could either pass