Pig, mail # user - Pig and XML parsing


Sameer Tilak 2013-10-17, 23:08
Re: Pig and XML parsing
ajay kumar 2013-10-18, 03:54
How about this:
REGISTER '/path/to/piggybank.jar';
A = load 'input'
    using org.apache.pig.piggybank.storage.XMLLoader('property')
    as (doc: chararray);
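To make the suggestion concrete: piggybank's XMLLoader is given a tag name and, roughly speaking, emits the raw text of each matching element (start tag through end tag) as a single chararray field per record. The Python function below is a minimal, hypothetical sketch of that record-splitting idea on an in-memory string. It is not piggybank's implementation, which reads streamed input splits. The name `split_records` is made up for illustration, and the sketch assumes elements with the given tag are not nested and ignores self-closing `<tag/>` forms.

```python
import re


def split_records(xml_text, tag):
    """Return the raw text of each <tag ...>...</tag> element, in order.

    Hypothetical sketch of the record-splitting idea behind piggybank's
    XMLLoader(tag); assumes elements with this tag are not nested and
    ignores self-closing <tag/> forms.
    """
    # Match the opening tag (bare or with attributes, but not a longer tag
    # name such as <propertyX>), then lazily consume everything up to the
    # first matching closing tag. DOTALL lets '.' span newlines.
    pattern = re.compile(
        r"<%s(?:\s[^>]*)?>.*?</%s>" % (re.escape(tag), re.escape(tag)),
        re.DOTALL,
    )
    return pattern.findall(xml_text)


if __name__ == "__main__":
    doc = """<configuration>
  <property><name>a</name><value>1</value></property>
  <property><name>b</name><value>2</value></property>
</configuration>"""
    for record in split_records(doc, "property"):
        print(record)
```

Each returned string corresponds to one tuple of `A` in the script above; the fields inside it (`name`, `value`, and so on) still have to be extracted separately, for example with a UDF.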
On Fri, Oct 18, 2013 at 4:38 AM, Sameer Tilak <[EMAIL PROTECTED]> wrote:

> Hi All,
> I have a lot of small (~2 to 3 MB) XML files that I would like to process.
> I was thinking along the following lines, please let me know if you have
> any thoughts on this.
>
> 1. Create SequenceFiles such that each SequenceFile is between 60 and 64 MB
> and no XML file is split across two SequenceFiles.
> 2. Write a Pig script that loads the SequenceFiles, then iterates over the
> individual XML files and analyzes them.
> I was planning to use Elephant-Bird to read the SequenceFiles. Here is what
> its documentation says:
> Hadoop SequenceFiles and Pig
>
> Reading and writing Hadoop SequenceFiles with Pig is supported via classes
> SequenceFileLoader and SequenceFileStorage. These classes make use of a
> WritableConverter interface, allowing pluggable conversion of key and value
> instances to and from Pig data types.
>
>
> Here's a short example: suppose you have SequenceFile<Text, LongWritable>
> data sitting beneath path input. We can load that data with the following
> Pig script:
>
>
> REGISTER '/path/to/elephant-bird.jar';
>
> %declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
> %declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
> %declare LONG_CONVERTER 'com.twitter.elephantbird.pig.util.LongWritableConverter';
>
> pairs = LOAD 'input' USING $SEQFILE_LOADER (
>   '-c $TEXT_CONVERTER', '-c $LONG_CONVERTER'
> ) AS (key: chararray, value: long);
>
>
> I was looking at XMLLoader from piggybank. Has anyone used XPath queries
> in their Pig scripts?
>
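On the XPath question: whatever loader is used, each record ends up as one XML string, and the per-record work is evaluating a path expression against it. The sketch below shows that step in plain Python using the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath (child and descendant `.//` steps, attribute predicates such as `[@type='y']`, and a few others), not full XPath 1.0. The helper name `xpath_values` is hypothetical. Pig can run Python UDFs through Jython, but this sketch is not wrapped as a UDF and has not been checked under Jython.

```python
import xml.etree.ElementTree as ET


def xpath_values(xml_record, path):
    """Evaluate a limited XPath expression against one XML record.

    Hypothetical helper: ElementTree's findall() understands only a
    subset of XPath (child steps, './/' descendant steps, [@attr],
    [@attr='v'], [tag='text'], positional [n]), not full XPath 1.0.
    Returns the text of every matching element, in document order.
    """
    # Parse the single record; the root element is the context node.
    root = ET.fromstring(xml_record)
    return [elem.text for elem in root.findall(path)]


if __name__ == "__main__":
    record = "<property><name>a</name><value>1</value></property>"
    print(xpath_values(record, "value"))   # child step
    print(xpath_values(record, ".//name"))  # descendant step
```

Paths are evaluated relative to the record's root element, so with XMLLoader('property') a path like `value` addresses the `<value>` child of each `<property>` record.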
--
*Thanks & Regards,*
*S. Ajay Kumar
+91-9966159106*
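Step 1 of the original plan (SequenceFiles of 60 to 64 MB, with no XML file split across two of them) amounts to grouping files by size before writing. A minimal sketch, assuming "MB" means MiB and that files are packed in the order given: keep adding files to the current batch and start a new one only when the next file would push it past 64 MB. If every file is at most 3 MB, a batch is closed only when its size already exceeds 64 − 3 = 61 MB, so every batch except the last lands inside the 60 to 64 MB target. The function name `pack_files` is hypothetical, and actually writing each batch as a SequenceFile (file name as key, XML contents as value) would be done with Hadoop's Java `SequenceFile.Writer`, which this sketch leaves out.

```python
MB = 1024 * 1024  # assumption: "MB" in the plan means MiB


def pack_files(sizes, cap=64 * MB):
    """Group (name, size_in_bytes) pairs into batches whose total size
    stays at or below `cap`, never splitting a file across batches.

    Hypothetical helper: each batch would become one SequenceFile.
    Files are taken in the order given; a file larger than `cap`
    ends up in a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for name, size in sizes:
        # Close the current batch if adding this file would exceed the cap.
        if current and current_size + size > cap:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    files = [("doc%03d.xml" % i, 3 * MB) for i in range(50)]
    for batch in pack_files(files):
        print(len(batch), "files")
```

With the key and value both stored as Text, the resulting files could presumably be read with the Elephant-Bird loader quoted above by passing `'-c $TEXT_CONVERTER'` for both key and value, declaring the schema as something like `(name: chararray, xml: chararray)`.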