Re: reading input parameters in a pig script
Ah, OK, I see what you are doing. You could do it with some very hacky
native Pig, but what I would suggest is to do what Prashant mentioned and use
either Java or Python for control flow (we support both). All of the
logic about which scripts to run and so on would be done outside of Pig, and
then you build and submit the Pig job via that API.
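
As a rough sketch of that idea, here is plain Python driving the pig command
line (not Pig's embedded scripting API). The script names xmlparser.pig,
excelexporter.pig and htmlexporter.pig and the "fields" parameter come from
this thread; countDistributers.pig, the two lookup tables and all function
names are assumptions made up for illustration. It also assumes the scripts
are fragments that can be concatenated into one job, so that aliases such as
PRODUCT are visible to the exporter.

```python
import subprocess
import tempfile

# Hypothetical: requested field -> extra Pig fragment that computes it.
DERIVED_FIELD_SCRIPTS = {"countDistributers": "countDistributers.pig"}

# Hypothetical: output format -> exporter fragment.
EXPORTER_SCRIPTS = {"csv": "excelexporter.pig", "html": "htmlexporter.pig"}


def split_fields(fields):
    """Turn 'name, description' into ['name', 'description']."""
    return [name.strip() for name in fields.split(",") if name.strip()]


def plan_scripts(fields, out_format):
    """Choose which Pig fragments to run, in order."""
    if out_format not in EXPORTER_SCRIPTS:
        raise ValueError("unsupported output format: %s" % out_format)
    scripts = ["xmlparser.pig"]
    # The "if fields contain X, invoke X.pig" decision lives here, not in Pig.
    for name in split_fields(fields):
        if name in DERIVED_FIELD_SCRIPTS:
            scripts.append(DERIVED_FIELD_SCRIPTS[name])
    scripts.append(EXPORTER_SCRIPTS[out_format])
    return scripts


def xml_fields(fields):
    """Fields the XML UDF can produce directly (everything not derived)."""
    return [n for n in split_fields(fields) if n not in DERIVED_FIELD_SCRIPTS]


def build_command(fields, script_path):
    """Build the argument list for one pig invocation."""
    return ["pig", "-param", "fields=" + ",".join(xml_fields(fields)),
            "-f", script_path]


def run(fields, out_format):
    """Concatenate the chosen fragments into one script and submit it."""
    with tempfile.NamedTemporaryFile("w", suffix=".pig", delete=False) as combined:
        for path in plan_scripts(fields, out_format):
            with open(path) as fragment:
                combined.write(fragment.read() + "\n")
    return subprocess.call(build_command(fields, combined.name))
```

Calling run("name,description,countDistributers", "csv") would then submit a
single job built from the parser, the count script and the CSV exporter. If
the scripts are really independent jobs, the same plan could instead be turned
into one pig invocation per script.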
2013/2/21 Siddhi Borkar <[EMAIL PROTECTED]>

> Thanks Jonathan for your prompt reply.
> The parameters that will be supplied as input to the Pig script may or
> may not be present in the XML. For example, if I pass fields as
> 'name,description,countDistributers', name and description can be
> obtained easily by parsing the XML; for countDistributers, however, I have
> a separate Pig script which needs to be invoked.
>
> I have read that it is not possible to have control statements (if/else)
> in Pig. Any idea how control flow can be defined?
> For example:
> If fields contains countDistributers
>         Invoke countDistributers.pig
>
> Also, in PRODUCT = FOREACH PRODUCTS GENERATE FLATTEN(XMLProcessor(line)) as
> (id:chararray, name:chararray, description:chararray)
>
> the schema id:chararray, name:chararray, description:chararray has to be
> created dynamically based on the parameters passed.
> Is there any way of getting this done?
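
One way to approach the dynamic schema, sketched under assumptions: build the
schema text outside of Pig and pass it in as a parameter. The parameter name
"schema" and both function names below are hypothetical, and the sketch
assumes the script is written as "... as ($schema);" and that every field is
a chararray, as in the example above. Since Pig parameter substitution is
textual, this should work, but treat it as untested.

```python
def build_schema(fields, pig_type="chararray"):
    """Turn 'id,name' into 'id:chararray, name:chararray'."""
    names = [name.strip() for name in fields.split(",") if name.strip()]
    if not names:
        raise ValueError("no fields given")
    return ", ".join("%s:%s" % (name, pig_type) for name in names)


def schema_param(fields):
    """Arguments to add to the pig command line (hypothetical 'schema' param)."""
    return ["-param", "schema=" + build_schema(fields)]
```

For example, build_schema("id,name,description") gives
"id:chararray, name:chararray, description:chararray", which matches the
hard-coded schema in the statement above.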
>
> -----Original Message-----
> From: Jonathan Coveney [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, February 20, 2013 10:48 PM
> To: [EMAIL PROTECTED]
> Subject: Re: reading input parameters in a pig script
>
> What is going to be generating the "pig -param param1=..." and so on?
> Couldn't these be made into arguments? i.e.
>
> REGISTER /opt/apache_pig/pig-0.10.1/contrib/piggybank/java/piggybank.jar;
> REGISTER /tmp/custudf.jar;
>
> DEFINE XMLProcessor org.sdc.map.processor.XMLProcessor('$fields');
> PRODUCTS = load 'product.xml' using
> org.apache.pig.piggybank.storage.XMLLoader('product') as (line:chararray);
> PRODUCT = FOREACH PRODUCTS GENERATE FLATTEN(XMLProcessor(line)) as
> (id:chararray, name:chararray, description:chararray);
>
>
> and you call it with pig -param fields=name,description
>
> and there has to be an output format, so in that case a %default would
> work?
>
>
> 2013/2/20 Siddhi Borkar <[EMAIL PROTECTED]>
>
> > I will not be able to use %default statement in my pig script, as the
> > parameters being passed to my pig script are not fixed. I would need a
> > conditional check to be done in my pig script to check for each and
> > every input parameter if it is passed or not.
> > Also, there are no conditional operators (if/else) available in Pig.
> >
> > Following is the pseudocode of the functionality I want to achieve.
> >
> > Consider the following Pig files:
> > 1) xmlparser.pig
> > 2) excelexporter.pig
> > 3) htmlexporter.pig
> >
> > 1) xmlparser.pig
> > REGISTER /opt/apache_pig/pig-0.10.1/contrib/piggybank/java/piggybank.jar;
> > REGISTER /tmp/custudf.jar;
> >
> > DEFINE XMLProcessor org.sdc.map.processor.XMLProcessor();
> > PRODUCTS = load 'product.xml' using
> > org.apache.pig.piggybank.storage.XMLLoader('product') as (line:chararray);
> > PRODUCT = FOREACH PRODUCTS GENERATE FLATTEN(XMLProcessor(line)) as
> > (id:chararray, name:chararray, description:chararray);
> >
> > Please note, XMLProcessor is a custom Java-based UDF which parses the
> > XML.
> >
> > 2) excelexporter.pig
> > STORE PRODUCT INTO '/tmp/prod.csv' USING
> > CSVExcelStorage(',','NO_MULTILINE','UNIX');
> >
> > 3) htmlexporter.pig
> > -- logic for this is not yet implemented
> >
> > Now the requirement is that I need to write a wrapper Pig script which
> > invokes the above scripts and generates an output. The parameters
> > that will be passed are the input params and the output file format.
> >
> > For example: pig -param param1=name -param param2=description
> > -param outfileformat=csv wrapper.pig
> >
> > Now what I need to do is, based on the params passed to the wrapper Pig
> > script, send inputs to the XML parser and parse the input
> > params.
> > In the above case since name and description are passed as params the