Pig >> mail # user >> reading xml file within a UDF


RE: reading xml file within a UDF
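I do this:
 -- Streaming command: each input tuple is written to the script's stdin,
 -- and its comma-separated stdout lines are read back as tuples.
 define analyze_unif `analyze_unif_recs.py`
    input  (stdin)
    output (stdout USING PigStreaming(','))
    ship   ('$scriptDir/analyze_unif_recs.py');

 -- XMLLoader lives in piggybank, so piggybank.jar must be registered first.
 -- It returns each <REC>...</REC> element as a single chararray.
 UnifLines  = load '$unif_xml'
    using org.apache.pig.piggybank.storage.XMLLoader('REC')
    as (doc:chararray);

 -- Pass every record through the Python script and name the output fields.
 UnifXmlByDocId = stream UnifLines through analyze_unif
          as (docid   : int,
              xml_comp: chararray
              );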

Here analyze_unif_recs.py is a Python script I wrote that does the XML parsing. org.apache.pig.piggybank.storage.XMLLoader('REC') finds the <REC> elements in the XML input, and each one is passed to my script.
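
The script itself is not shown in this message, so the following is only a rough sketch of what a streaming script of this kind might look like, not the actual analyze_unif_recs.py. It assumes that each record ends with </REC> at the end of an input line (a record may span several lines), that the doc id sits in a child element called DOCID (a hypothetical name), and that the second output field is just the child tag names joined with '|' (also made up for illustration). The only constraints taken from the Pig script above are that the output is one line per record, with two comma-separated fields, the first an int.

```python
#!/usr/bin/env python
"""Hypothetical sketch of a Pig streaming script; not the original analyze_unif_recs.py."""
import sys
import xml.etree.ElementTree as ET

REC_END = "</REC>"


def iter_records(lines):
    """Yield one '<REC>...</REC>' string at a time.

    A record may arrive spread over several input lines, so lines are
    buffered until one containing the closing tag is seen.
    """
    buf = []
    for line in lines:
        buf.append(line)
        if REC_END in line:
            yield "".join(buf)
            buf = []


def analyze(rec_xml):
    """Return (docid, xml_comp) for one record, or None if it cannot be used."""
    try:
        rec = ET.fromstring(rec_xml.strip())
    except ET.ParseError:
        return None  # skip malformed records instead of failing the whole job
    docid = rec.findtext("DOCID")  # assumption: element name is hypothetical
    if docid is None or not docid.strip().isdigit():
        return None
    # Assumption: a comma-free summary field, since ',' is the output delimiter.
    xml_comp = "|".join(child.tag for child in rec)
    return int(docid), xml_comp


def main(stdin=sys.stdin, stdout=sys.stdout):
    # One output line per usable record: "<docid>,<xml_comp>"
    for rec_xml in iter_records(stdin):
        result = analyze(rec_xml)
        if result is not None:
            stdout.write("%d,%s\n" % result)


if __name__ == "__main__":
    main()
```

Whatever goes into the second field must not contain a comma or a newline, otherwise PigStreaming(',') would split it into extra fields or extra tuples.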
William F Dowling
Sr Technical Specialist, Software Engineering
Thomson Reuters
0 +1 215 823 3853
-----Original Message-----
From: Baraa Mohamad [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 14, 2011 10:41 AM
To: [EMAIL PROTECTED]
Subject: reading xml file within a UDF

Hello,
I have a question, please.

How can I read a file in a UDF in Pig?

For example:  A = load 'xmlFiles' using myXMLParser(xmlfile);

Can I do something like that, so that I can parse the XML file using some
Java library?

Thanks for your help,

Baraa
--