Pig, mail # user - RE: reading input parameters in a pig script


RE: reading input parameters in a pig script
Siddhi Borkar 2013-02-21, 03:49
Thanks Jonathan for your prompt reply,
The parameters supplied as input to the pig script may or may not be present in the XML. For example, say I pass fields as 'name,description,countDistributers': name and description can be obtained easily by parsing the XML, but for countDistributers I have a separate pig script which needs to be invoked.

I have read that it is not possible to have control statements (if/else) in Pig. Any idea how control flow can be defined?
For example:
If fields contains countDistributers
Invoke countDistributers.pig

Also, in the statement

PRODUCT = FOREACH PRODUCTS GENERATE FLATTEN(XMLProcessor(line)) as (id:chararray, name:chararray, description:chararray);

the schema (id:chararray, name:chararray, description:chararray) has to be created dynamically based on the parameters passed.
Is there any way of getting this done?
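
One way to picture the two requests above (a schema built from the fields parameter, and a separate script for countDistributers) is to make those decisions in a small driver outside Pig. The following is only a sketch under assumptions: the helper name plan, the SEPARATE_SCRIPTS table, and the idea of handing the schema text to the script as a parameter (for example referenced as "as ($schema)") are illustrative and have not been tried against Pig 0.10.1.

```python
# Hypothetical mapping: fields that are NOT parsed from the XML and instead
# need their own pig script. Only countDistributers is named in the thread.
SEPARATE_SCRIPTS = {"countDistributers": "countDistributers.pig"}


def plan(fields_param):
    """Split a comma-separated fields value into XML fields, a schema string,
    and the list of extra scripts that would have to be invoked."""
    fields = [f.strip() for f in fields_param.split(",") if f.strip()]
    # Fields the XML UDF can produce directly.
    xml_fields = [f for f in fields if f not in SEPARATE_SCRIPTS]
    # Fields that trigger a separate script (the "if fields contains ..." step).
    extra_scripts = [SEPARATE_SCRIPTS[f] for f in fields if f in SEPARATE_SCRIPTS]
    # Schema text for the AS clause; every field is assumed to be a chararray.
    schema = ",".join(f"{f}:chararray" for f in xml_fields)
    return xml_fields, schema, extra_scripts


if __name__ == "__main__":
    print(plan("name,description,countDistributers"))
```

Whether the schema string survives Pig's parameter parsing unchanged would need to be checked; the point is only that the conditional logic can live outside the script.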

-----Original Message-----
From: Jonathan Coveney [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, February 20, 2013 10:48 PM
To: [EMAIL PROTECTED]
Subject: Re: reading input parameters in a pig script

What is going to be generating the "pig -param param1=..." and so on?
Couldn't these be made into arguments? i.e.

REGISTER /opt/apache_pig/pig-0.10.1/contrib/piggybank/java/piggybank.jar;
REGISTER /tmp/custudf.jar;

DEFINE XMLProcessor org.sdc.map.processor.XMLProcessor('$fields');
PRODUCTS = load 'product.xml' using org.apache.pig.piggybank.storage.XMLLoader('product') as (line:chararray);
PRODUCT = FOREACH PRODUCTS GENERATE FLATTEN(XMLProcessor(line)) as (id:chararray, name:chararray, description:chararray);

and you call it with pig -param fields=name,description

And there has to be an output format, so in that case a %default would work?
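
If whatever generates the "pig -param ..." call is itself a program, the command line could be assembled along these lines. This is a rough sketch, not something from the thread: build_pig_command, the EXPORTERS table, and the extra exporter parameter are hypothetical, and wrapper.pig would still have to consume that parameter somehow. Only the repeated -param key=value form is standard Pig usage.

```python
# Hypothetical mapping from output format to the exporter scripts named in
# the quoted message below (htmlexporter.pig is not implemented yet there).
EXPORTERS = {"csv": "excelexporter.pig", "html": "htmlexporter.pig"}


def build_pig_command(fields, outfileformat="csv", script="wrapper.pig"):
    """Return the argv list for one pig invocation.

    The default of "csv" plays the role that %default would play inside
    the script when no output format is supplied.
    """
    if outfileformat not in EXPORTERS:
        raise ValueError(f"unsupported output format: {outfileformat}")
    return [
        "pig",
        "-param", "fields=" + ",".join(fields),
        "-param", "outfileformat=" + outfileformat,
        # Assumed extra parameter so the script can pick the exporter.
        "-param", "exporter=" + EXPORTERS[outfileformat],
        script,
    ]


if __name__ == "__main__":
    # Print the command instead of running it; pig may not be installed.
    print(" ".join(build_pig_command(["name", "description"])))
```

The list could then be handed to subprocess.run once the script side is in place.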
2013/2/20 Siddhi Borkar <[EMAIL PROTECTED]>

> I will not be able to use %default statement in my pig script, as the
> parameters being passed to my pig script are not fixed. I would need a
> conditional check to be done in my pig script to check for each and
> every input parameter if it is passed or not.
> Also, there are no conditional operators (if/else) available in Pig.
>
> Following is the pseudocode of the functionality I want to achieve.
>
> Consider the following pig files:
> 1) xmlparser.pig
> 2) excelexporter.pig
> 3) htmlexporter.pig
>
> 1) xmlparser.pig
> REGISTER /opt/apache_pig/pig-0.10.1/contrib/piggybank/java/piggybank.jar;
> REGISTER /tmp/custudf.jar;
>
> DEFINE XMLProcessor org.sdc.map.processor.XMLProcessor();
> PRODUCTS = load 'product.xml' using org.apache.pig.piggybank.storage.XMLLoader('product') as (line:chararray);
> PRODUCT = FOREACH PRODUCTS GENERATE FLATTEN(XMLProcessor(line)) as (id:chararray, name:chararray, description:chararray);
>
> Please note, XMLProcessor is a custom Java-based UDF which parses the XML.
>
> 2) excelexporter.pig
> STORE PRODUCT INTO '/tmp/prod.csv' USING
> CSVExcelStorage(',','NO_MULTILINE','UNIX');
>
> 3) htmlexporter.pig
> -- logic for this is not yet implemented
>
> Now the requirement is that I need to write a wrapper pig script which
> invokes the above scripts and generates an output. The parameters
> that will be passed are the input params and the output file format.
>
> For example: pig -param param1=name -param param2=description
> -param outfileformat=csv wrapper.pig
>
> Now what I need to do is based on the params passed to the wrapper pig
> script, I need to send inputs to the xml parser and parse the input params.
> In the above case since name and description are passed as params the
> xml should be parsed only for these 2 fields.
> Any idea how this can be achieved in a pig script?
>
> Also depending on the output file format, I need to invoke the
> corresponding exporter script (html or csv) from my wrapper script. I
> don’t see any conditional operators available (if/else) in pig. Any
> idea how this can be achieved?
>
> -----Original Message-----
> From: Jonathan Coveney [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, February 20, 2013 2:38 PM
> To: [EMAIL PROTECTED]
> Subject: Re: reading input parameters in a pig script
