Pig, mail # user - XML -> Pig UDF


Russell Jurney 2012-12-24, 07:24
Re: XML -> Pig UDF
Vitalii Tymchyshyn 2012-12-24, 08:09
I did something similar in a previous project, but I parsed on demand.
What I mean is that I created a set of XML-processing functions, each of
which can take either a string or a DOM as input, plus an explicit parse
function.
I did this because I usually ran concatenation/grouping on the input files
first, and processing happened only after that. Processing can also be done
in another MR step, and serializing a string is much easier than serializing
a DOM.
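
A minimal sketch of the parse-on-demand idea described above, in plain Python rather than as actual Pig UDFs. The names parse_xml, as_dom and tag_text are hypothetical and only illustrate the pattern: every processing function accepts either a raw string or an already parsed DOM, so the data can travel between steps as a string and be parsed only where it is used.

```python
from xml.dom import minidom
from xml.dom.minidom import Node


def parse_xml(text):
    """Explicit parse step: XML string -> DOM document."""
    return minidom.parseString(text)


def as_dom(value):
    """Return a DOM, parsing only when the input is still a string."""
    if isinstance(value, (str, bytes)):
        return parse_xml(value)
    return value


def tag_text(value, tag):
    """Collect the direct text of every <tag> element.

    Accepts either an XML string or a DOM produced by parse_xml.
    """
    dom = as_dom(value)
    result = []
    for element in dom.getElementsByTagName(tag):
        parts = [child.data for child in element.childNodes
                 if child.nodeType == Node.TEXT_NODE]
        result.append("".join(parts))
    return result


raw = "<doc><title>Pig</title><title>XML</title></doc>"
titles_from_string = tag_text(raw, "title")            # parsed on demand
titles_from_dom = tag_text(parse_xml(raw), "title")    # parsed explicitly
```

Both calls give the same result. The string form is what would be grouped, concatenated, or passed to a later MR step.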
On 24 Dec 2012 09:24, "Russell Jurney" <[EMAIL PROTECTED]> wrote:

> I want to extend the existing XMLLoader to go beyond capturing the text
> inside a tag and to actually create a Pig mapping of the Document Object
> Model the XML represents. This would be similar to elephant-bird's
> JsonLoader.
>
> For instance, check this example: https://gist.github.com/4368194
>
> Semi-structured data can vary, so this behavior can be risky, but I want
> people to be able to load JSON and XML data easily in their first session
> with Pig.
>
> Russell Jurney http://datasyndrome.com
>
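
As a rough illustration of the proposal quoted above, here is a sketch that turns an XML document into nested maps, with plain Python dicts standing in for Pig maps. It is not the existing XMLLoader and not the format shown in the linked gist. The function name element_to_map and the "@" and "#text" key conventions are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def element_to_map(element):
    """Convert an element into a nested dict (a stand-in for a Pig map).

    Assumed conventions: attributes become "@name" keys, repeated child
    tags become lists, and text next to attributes or children is stored
    under "#text". An element with no attributes or children is its text.
    """
    node = {}
    for name, value in element.attrib.items():
        node["@" + name] = value
    for child in element:
        converted = element_to_map(child)
        if child.tag in node:
            existing = node[child.tag]
            if not isinstance(existing, list):
                node[child.tag] = [existing]
            node[child.tag].append(converted)
        else:
            node[child.tag] = converted
    text = (element.text or "").strip()
    if not node:
        return text
    if text:
        node["#text"] = text
    return node


xml = '<book id="1"><title>Pig</title><author>A</author><author>B</author></book>'
root = ET.fromstring(xml)
record = {root.tag: element_to_map(root)}
```

Note that "author" becomes a list only because it repeats; a book with a single author would yield a plain string. That shape change is one concrete form of the risk with semi-structured data mentioned in the quoted message.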
Russell Jurney 2012-12-24, 08:13
Vitalii Tymchyshyn 2012-12-29, 23:00