Pig user mailing list
Parsing xml for user defined attributes
I need to parse an XML document and generate columns based on parameters that the user passes to the Pig script.

For example, consider the following XML:
<school>
    <students>
        <student>
            <name>test</name>
            <rno>1</rno>
            <rank>3</rank>
        </student>
        <student>
            <name>xyz</name>
            <rno>3</rno>
            <rank>2</rank>
        </student>
    </students>
</school>

The parser should return only the fields whose names the user specifies.
For example, if the user specifies the field names as 'name|rno', the parser should parse the XML and return a tuple containing name and rno for each student.
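
To make the expected behavior concrete, here is a minimal plain-Java sketch of the field selection for a single student fragment. It deliberately leaves out all Pig wiring (no EvalFunc, no Tuple, no schema) and uses only the JDK's DOM parser. The class name StudentFieldParser and the method parseFields are hypothetical names chosen for illustration, and returning null for a missing element is an assumption rather than a stated requirement.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of Pig): extracts user-chosen child
// elements from one <student> fragment.
class StudentFieldParser {

    // fieldSpec is a pipe-separated list of element names, e.g. "name|rno".
    static List<String> parseFields(String studentXml, String fieldSpec) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(studentXml)));

        List<String> values = new ArrayList<>();
        // The pipe is escaped because String.split takes a regular expression.
        for (String field : fieldSpec.split("\\|")) {
            NodeList nodes = doc.getElementsByTagName(field.trim());
            // Assumption: a missing element yields null, so positions stay
            // aligned with the order of names in fieldSpec.
            values.add(nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : null);
        }
        return values;
    }
}
```

For the first student in the sample above, parseFields(studentXml, "name|rno") would return a list holding "test" and "1", in the order the names were given.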

I am using XMLLoader to split the XML into student elements, and I have written a Java UDF to parse each student element.
In the UDF class I defined a parameterized constructor that receives the columns/attributes to be parsed.
I have also overridden the outputSchema(Schema input) method, in which I read the column names and add a new field schema for each one.

However, this does not work as expected. Is there any way to get this done?