Pig >> mail # user >> can't parse the values using XML loader
RE: can't parse the values using XML loader
Part of the problem might be that the regexp has


but you need

Using regexps to parse XML is awfully brittle. An alternative is to use a UDF that calls out to an XML parser. I use ElementTree from python UDFs.
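For anyone who wants to try the UDF route, here is a minimal sketch of what such a Python UDF could look like. It is only an illustration under some assumptions: the function name `extract_title` is made up, it handles only the `TITLE` tag that appears in the sample input, and it assumes each record arrives as one chararray holding a complete XML fragment. Pig's Jython engine normally supplies the `outputSchema` decorator, so the fallback definition is there only to let the file import outside Pig.

```python
import xml.etree.ElementTree as ET

try:
    outputSchema  # normally supplied by Pig when the script is registered
except NameError:
    # Fallback (assumption: running outside Pig, e.g. for a local test)
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


@outputSchema("title:chararray")
def extract_title(xml_record):
    """Return the text of the first TITLE element in one XML record, or None.

    `extract_title` is a hypothetical name; TITLE is the only tag visible
    in the sample input, so it is the only one handled here.
    """
    if xml_record is None:
        return None
    try:
        root = ET.fromstring(xml_record)
    except Exception:
        # The parse-error class differs between ElementTree versions,
        # so malformed records are simply mapped to None.
        return None
    # The record may be the TITLE element itself or may contain it.
    if root.tag == "TITLE":
        node = root
    else:
        node = root.find(".//TITLE")
    if node is None or not node.text:
        return None
    return node.text.strip()
```

Assuming the file is saved as `xml_udfs.py` (a hypothetical name), it could be registered with `register 'xml_udfs.py' using jython as xmludf;` and called as `xmludf.extract_title(x)` inside a `foreach ... generate`. Other fields would need their own tag names, which are not shown in the original post.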

Will Dowling

From: Muni mahesh [[EMAIL PROTECTED]]
Sent: Wednesday, August 21, 2013 6:58 AM
Subject: can't parse the values using XML loader

*Input file :*

<TITLE>hadoop developer</TITLE>
====================================================================

*Pig Script:*

register /usr/lib/pig/piggybank.jar;

A = load '/home/sudeep/Desktop/CATALOG.xml' using
org.apache.pig.piggybank.storage.XMLLoader('CATALOG') as (x:
B = foreach A GENERATE
as (id: int, name:chararray);
*Output Expected :*

(hadoop, ajay, india, ITC, 10.90, 2013)

*Issue: the output I am getting is:*



*I suspect it is not able to parse the values between the tags.*