Pig >> mail # user >> RE: reading input parameters in a pig script


Re: reading input parameters in a pig script
What is going to be generating the "pig -param param1=..." and so on?
Couldn't these be made into arguments? i.e.:

REGISTER /opt/apache_pig/pig-0.10.1/contrib/piggybank/java/piggybank.jar;
REGISTER /tmp/custudf.jar;

DEFINE XMLProcessor org.sdc.map.processor.XMLProcessor('$fields');
PRODUCTS = load 'product.xml' using
org.apache.pig.piggybank.storage.XMLLoader('product') as (line:chararray);
PRODUCT = FOREACH PRODUCTS GENERATE FLATTEN(XMLProcessor(line)) as
(id:chararray, name:chararray, description:chararray);
and you call it with pig -param fields=name,description
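
For illustration, here is a minimal sketch of how a UDF constructor might turn the substituted '$fields' string into a list of field names. The `FieldList` class and its behaviour (trimming, de-duplication, falling back to all fields when none are given) are assumptions made for this example; they are not part of Pig or of the XMLProcessor UDF discussed in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: not part of Pig or of the original XMLProcessor UDF.
final class FieldList {
    // Columns that the example script declares for XMLProcessor's output.
    static final List<String> KNOWN = Arrays.asList("id", "name", "description");

    // Turns "name, description" into [name, description], rejecting unknown names.
    static List<String> parse(String csv) {
        Set<String> fields = new LinkedHashSet<>(); // keeps order, drops duplicates
        if (csv != null) {
            for (String raw : csv.split(",")) {
                String field = raw.trim();
                if (field.isEmpty()) {
                    continue; // ignore empty entries such as "name,,description"
                }
                if (!KNOWN.contains(field)) {
                    throw new IllegalArgumentException("Unknown field: " + field);
                }
                fields.add(field);
            }
        }
        // Assumption: no fields requested means "parse every known field".
        return fields.isEmpty() ? new ArrayList<>(KNOWN) : new ArrayList<>(fields);
    }
}
```

A constructor such as XMLProcessor(String fields) could call FieldList.parse(fields) once and keep the result, so the set of parameters no longer has to be fixed in the script itself.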

There also has to be an output format, so in that case a %default would work?
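
Since Pig has no if/else, one option is to make the choice outside the script. The sketch below builds a pig command line in Java, falls back to a default format when none is given, and picks the exporter script to match. The `PigCommand` class and the `exporter` parameter name are hypothetical; the script names come from the quoted message below. Note that each parameter is passed with its own -param flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher helper; only the script names come from the thread.
final class PigCommand {
    // Assumption: csv is the format used when the caller does not pick one.
    static final String DEFAULT_FORMAT = "csv";

    // Maps an output format to the exporter script that handles it.
    static String exporterFor(String format) {
        switch (format) {
            case "csv":
                return "excelexporter.pig";
            case "html":
                return "htmlexporter.pig";
            default:
                throw new IllegalArgumentException("Unsupported format: " + format);
        }
    }

    // Builds: pig -param fields=<fields> -param exporter=<script> wrapper.pig
    static List<String> build(String fields, String format) {
        String chosen = (format == null || format.isEmpty()) ? DEFAULT_FORMAT : format;
        List<String> cmd = new ArrayList<>();
        cmd.add("pig");
        cmd.add("-param");
        cmd.add("fields=" + fields);
        cmd.add("-param");
        cmd.add("exporter=" + exporterFor(chosen)); // hypothetical parameter name
        cmd.add("wrapper.pig");
        return cmd;
    }
}
```

The resulting list could be handed to java.lang.ProcessBuilder to launch Pig. How wrapper.pig consumes the hypothetical exporter parameter is left open here; a %default in the script could equally supply the fallback format instead of the launcher.
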
2013/2/20 Siddhi Borkar <[EMAIL PROTECTED]>

> I will not be able to use %default statement in my pig script, as the
> parameters being passed to my pig script are not fixed. I would need a
> conditional check to be done in my pig script to check for each and every
> input parameter if it is passed or not.
> Also, there are no conditional operators (if/else) available in Pig.
>
> Following is the pseudocode of the functionality I want to achieve:
>
> Consider the following pig files:
> 1) xmlparser.pig
> 2) excelexporter.pig
> 3) htmlexporter.pig
>
> 1) xmlparser.pig
> REGISTER /opt/apache_pig/pig-0.10.1/contrib/piggybank/java/piggybank.jar;
> REGISTER /tmp/custudf.jar;
>
> DEFINE XMLProcessor org.sdc.map.processor.XMLProcessor();
> PRODUCTS = load 'product.xml' using
> org.apache.pig.piggybank.storage.XMLLoader('product') as (line:chararray);
> PRODUCT = FOREACH PRODUCTS GENERATE FLATTEN(XMLProcessor(line)) as
> (id:chararray, name:chararray, description:chararray);
>
> Please note, XMLProcessor is a custom Java-based UDF which parses the XML.
>
> 2) excelexporter.pig
> STORE PRODUCT INTO '/tmp/prod.csv' USING
> CSVExcelStorage(',','NO_MULTILINE','UNIX');
>
> 3) htmlexporter.pig
> -- logic for this is not yet implemented
>
> Now the requirement is that I need to write a wrapper pig script which
> invokes the following script and generates an output. The parameters that
> will be passed are the input params and the out file format
>
> For example: pig -param param1=name -param param2=description
> -param outfileformat=csv wrapper.pig
>
> Now what I need to do is based on the params passed to the wrapper pig
> script, I need to send inputs to the xml parser and parse the input params.
> In the above case since name and description are passed as params the xml
> should be parsed only for these 2 fields.
> Any idea how this can be achieved in a pig script?
>
> Also depending on the output file format, I need to invoke the
> corresponding exporter script (html or csv) from my wrapper script. I don’t
> see any conditional operators available (if/else) in pig. Any idea how this
> can be achieved?
>
> -----Original Message-----
> From: Jonathan Coveney [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, February 20, 2013 2:38 PM
> To: [EMAIL PROTECTED]
> Subject: Re: reading input parameters in a pig script
>
> Reiterating Prashant's comments.
>
> In the script though you can have a %default statement which will define
> the default value for a parameter, which can also be overridden. My guess is
> this might let you do what you want?
>
>
> 2013/2/20 Prashant Kommireddi <[EMAIL PROTECTED]>
>
> > Hi Siddhi,
> >
> > "Is there any way to access these params in the script without
> > referring to the param name?" -- how would you associate a param value
> > to a pig statement?
> >
> > I am guessing in this case your pig script is also dynamically generated?
> > You could use PigServer API
> > http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/PigServer.html
> > to generate params in a Java program and embed them into a script.
> >
> > -Prashant
> >
> >
> > On Tue, Feb 19, 2013 at 3:44 PM, Siddhi Borkar <
> > [EMAIL PROTECTED]> wrote:
> >
> > >
> > > Consider the following command
> > > pig -param param1=test -param param2=test1 -param param3=test2 myscript.pig
> > >
> > > In my case the parameters are dynamic, as in I could either pass