Pig, mail # user - can't parse the values using XML loader

RE: can't parse the values using XML loader
william.dowling@... 2013-08-21, 16:19
Part of the problem might be that the regexp has


but you need

Using regexps to parse XML is awfully brittle. An alternative is to use a UDF that calls out to an XML parser. I use ElementTree from python UDFs.
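As a rough illustration of the UDF approach, here is a minimal sketch of a Python (Jython) UDF that uses ElementTree to pull the TITLE text out of one record. The function name `extract_title`, the output schema, and the assumption that each record arrives as a single string wrapped in a CATALOG element are all hypothetical, not taken from the original script. The `outputSchema` decorator is normally supplied by Pig at runtime; the fallback below only exists so the file can also be run outside Pig.

```python
import xml.etree.ElementTree as ET

try:
    # Pig makes outputSchema available when the file is registered as a Jython UDF.
    outputSchema
except NameError:
    # Fallback no-op decorator so the module can be tested outside Pig.
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


@outputSchema("title:chararray")  # hypothetical schema for this sketch
def extract_title(record):
    """Return the text of the first TITLE element in one XML record, or None."""
    if record is None:
        return None
    try:
        root = ET.fromstring(record)
    except Exception:
        # Malformed or truncated XML: return null instead of failing the job.
        return None
    # The record itself may be the TITLE element, or TITLE may be nested in it.
    node = root if root.tag == "TITLE" else root.find(".//TITLE")
    if node is None or node.text is None:
        return None
    return node.text.strip()


if __name__ == "__main__":
    # Assumed record shape: a CATALOG element containing the TITLE from the input file.
    print(extract_title("<CATALOG><TITLE>hadoop developer</TITLE></CATALOG>"))
```

If this were saved as, say, `xml_udfs.py` (a made-up file name), it could be registered in the Pig script with `register 'xml_udfs.py' using jython as xml_udfs;` and called as `xml_udfs.extract_title(x)` on the field produced by XMLLoader. The other fields would need similar functions, or one function that returns a tuple.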

Will Dowling

From: Muni mahesh [[EMAIL PROTECTED]]
Sent: Wednesday, August 21, 2013 6:58 AM
Subject: can't parse the values using XML loader

*Input file :*

<TITLE>hadoop developer</TITLE>
===========================================================================================================================================

*Pig Script:*

register /usr/lib/pig/piggybank.jar;

A = load '/home/sudeep/Desktop/CATALOG.xml' using
org.apache.pig.piggybank.storage.XMLLoader('CATALOG') as (x:
B = foreach A GENERATE
as (id: int, name:chararray);
*Output Expected :*

(hadoop, ajay, india, ITC, 10.90, 2013)

*Issue :*

But the output I am getting is:



I suspect it is not able to parse the values between the tags.