Pig >> mail # user >> xml parsing issue


xml parsing issue
Hi all,

I have an XML file like this:

<CATALOG>
<CD>
<TITLE>hadoop developer</TITLE>
<ARTIST>ajay</ARTIST>
<COUNTRY>india</COUNTRY>
<COMPANY>ITC</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>2013</YEAR>
</CD>
</CATALOG>
This is my Pig script:

register /usr/lib/pig/piggybank.jar;

A = load '/home/sudeep/Desktop/CATALOG.xml'
    using org.apache.pig.piggybank.storage.XMLLoader('CATALOG') as (x:chararray);

B = foreach A GENERATE
FLATTEN(REGEX_EXTRACT_ALL(x,'<CATALOG>\n<CD>\\n<TITLE>(.*)</TITLE>\n<ARTIST>(.*)</ARTIST>\n<COUNTRY>(.*)</COUNTRY>\n<COMPANY>(.*)<COMPANY>\n<PRICE>(.*)</PRICE>\n<YEAR>(.*)</YEAR>\n</CD>\n</CATALOG>'))
as (id: int, name:chararray);
Expected output:
hadoop developer|ajay|india|ITC|10.90|2013
But I am getting output like this:

()

()

()
What is wrong?
--
*Thanks & Regards,*
*S. Ajay Kumar
+91-9966159106*
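
One thing that stands out in the script above is the COMPANY part of the pattern: the capture group is followed by <COMPANY> rather than the closing </COMPANY>, so the pattern cannot match the record as a whole. As far as I know, REGEX_EXTRACT_ALL yields nothing unless the entire string matches, which would be consistent with the empty () rows. The Python sketch below only illustrates that regex behaviour; it is not a Pig script. It assumes the loader hands over the record exactly as shown in the post with plain \n line breaks (setting aside the \\n that appears after <CD> in the posted pattern), and the names record, posted, and fixed are made up for the example.

```python
import re

# The record from the post, assumed to arrive as one multi-line string.
record = "\n".join([
    "<CATALOG>",
    "<CD>",
    "<TITLE>hadoop developer</TITLE>",
    "<ARTIST>ajay</ARTIST>",
    "<COUNTRY>india</COUNTRY>",
    "<COMPANY>ITC</COMPANY>",
    "<PRICE>10.90</PRICE>",
    "<YEAR>2013</YEAR>",
    "</CD>",
    "</CATALOG>",
])

# Pattern as posted: note "<COMPANY>(.*)<COMPANY>" with no slash in the closing tag.
posted = (
    r"<CATALOG>\n<CD>\n<TITLE>(.*)</TITLE>\n<ARTIST>(.*)</ARTIST>\n"
    r"<COUNTRY>(.*)</COUNTRY>\n<COMPANY>(.*)<COMPANY>\n"
    r"<PRICE>(.*)</PRICE>\n<YEAR>(.*)</YEAR>\n</CD>\n</CATALOG>"
)

# Same pattern with the closing COMPANY tag corrected.
fixed = posted.replace("(.*)<COMPANY>", "(.*)</COMPANY>")

# fullmatch requires the whole string to match, like a whole-input regex match.
posted_match = re.fullmatch(posted, record)
fixed_match = re.fullmatch(fixed, record)

print(posted_match)  # None: the posted pattern never matches the record

# Join the six captured groups with "|" to mirror the expected output.
row = "|".join(fixed_match.groups())
print(row)
```

Separately, the as (id: int, name:chararray) clause names two fields while the pattern captures six, so the schema would probably need six fields as well. It may also be worth checking whether XMLLoader('CD') is closer to the intent, since each CD element looks like one record.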