Re: XML files and Sequencefile
Have you looked at SequenceFileLoader for Pig?

On Wed, Oct 23, 2013 at 3:30 PM, Sameer Tilak <[EMAIL PROTECTED]> wrote:

> Hi There,
> I have a lot of small (~0.5 MB to 3 MB) XML files that I would like to
> process using Apache Pig. Since dealing with a lot of small files is
> problematic, I was thinking of creating SequenceFiles such that each
> sequence file is between 60 and 64 MB and no XML file is split across two
> sequence files. Is there any utility that handles storing and loading
> these files from Pig? For example, I could create a Pig job that reads
> these XML files and generates a few large sequence files such that no XML
> file is split across two sequence files. I would then write another Pig
> job that loads these sequence files and analyzes them. Each of these XML
> files contains a lot of information for a given entity, and the nesting
> can be quite deep. Any help with this would be great.
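
For the packing step described above, here is a rough sketch of how the XML files might be grouped before they are written out, so that each group stays under the 64 MB cap and no file is ever split. It is plain Python and only plans the grouping; it does not write SequenceFiles. The function name plan_batches and the (name, size) input format are made up for illustration, and the greedy approach does not guarantee the 60 MB lower bound for every group (the last group in particular can be small). It also ignores the per-record overhead that a SequenceFile adds.

```python
from typing import Iterable, List, Tuple

# Upper bound per sequence file, taken from the 64 MB figure in the question.
MAX_BATCH_BYTES = 64 * 1024 * 1024


def plan_batches(
    files: Iterable[Tuple[str, int]],
    max_bytes: int = MAX_BATCH_BYTES,
) -> List[List[str]]:
    """Greedily group (name, size_in_bytes) pairs into batches.

    Each batch is meant to become one sequence file. A file is never
    split across batches; a single file larger than max_bytes ends up
    alone in its own batch.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0

    for name, size in files:
        # Close the current batch if adding this file would exceed the cap.
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size

    # Flush whatever is left over.
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    mb = 1024 * 1024
    # Hypothetical input: fifty XML files of 3 MB each.
    sample = [(f"doc{i}.xml", 3 * mb) for i in range(50)]
    for i, batch in enumerate(plan_batches(sample)):
        print(i, len(batch))
```

Each resulting group could then be written as one sequence file, for example with the file name as the key and the full XML document as the value, which keeps every document intact as a single record.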