
Baraa Mohamad 2011-09-14, 14:41
RE: reading xml file within a UDF
I do this:
 define analyze_unif `analyze_unif_recs.py`
    input  (stdin)
    output (stdout USING PigStreaming(','))
    ship   ('$scriptDir/analyze_unif_recs.py');

 UnifLines  = load '$unif_xml'
    using org.apache.pig.piggybank.storage.XMLLoader('REC')
    as (doc:chararray);
 UnifXmlByDocId = stream UnifLines through analyze_unif
          as (docid   : int,
              xml_comp: chararray);

where analyze_unif_recs.py is a Python script I wrote that does the XML parsing, and org.apache.pig.piggybank.storage.XMLLoader('REC') finds the <REC> elements in the XML input and passes each one to my script.
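
For anyone who wants a starting point, here is a rough sketch of what such a streaming script could look like. It is not the actual analyze_unif_recs.py; the `docid` attribute and the `<comp>` child element are made-up names for illustration, and the real record layout will differ. It assumes each <REC> element arrives on stdin as text (possibly spanning several lines) and writes one `docid,xml_comp` line per record to match the PigStreaming(',') output spec.

```python
#!/usr/bin/env python
import re
import sys
import xml.etree.ElementTree as ET

# Matches one complete <REC>...</REC> element, even if it spans several lines.
REC_PATTERN = re.compile(r"<REC\b.*?</REC>", re.DOTALL)


def parse_records(text):
    """Yield (docid, xml_comp) for every <REC> element found in text.

    ASSUMPTION: each record has an integer `docid` attribute and a
    <comp> child element; both names are hypothetical.
    """
    for match in REC_PATTERN.finditer(text):
        rec = ET.fromstring(match.group(0))
        docid = int(rec.get("docid"))
        comp = rec.findtext("comp", default="").strip()
        yield docid, comp


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Reads the whole input at once for simplicity; a production script
    # would likely process records incrementally.
    for docid, comp in parse_records(stdin.read()):
        # Replace commas so the value does not split into extra fields.
        stdout.write("%d,%s\n" % (docid, comp.replace(",", " ")))
```

To use it as the streaming command, end the file with a call to `main()` and make it executable, since the define statement invokes it directly.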
William F Dowling
Sr Technical Specialist, Software Engineering
Thomson Reuters
0 +1 215 823 3853
-----Original Message-----
From: Baraa Mohamad [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 14, 2011 10:41 AM
Subject: reading xml file within a UDF

I have a question, please.

How can I read a file in a UDF in Pig?

e.g.:  A = load 'xmlFiles' using myXMLParser(xmlfile);

Can I do something like that, so that I can parse the XML file using some
Java library?

Thanks for your help

Baraa Mohamad 2011-09-14, 15:26