NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Pig user mailing list: XML files and Sequencefile


Re: XML files and Sequencefile
Have you looked at SequenceFileLoader for Pig?
http://pig.apache.org/docs/r0.11.0/api/org/apache/pig/piggybank/storage/SequenceFileLoader.html
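SequenceFileLoader hands each record to Pig as a key/value pair. If the packing job stores the file name as the key and the raw XML text as the value (an assumption; the thread does not fix a layout), the analysis job still has to make sense of each deeply nested value. A minimal plain-Python sketch of one way to do that is to flatten a document into (path, text) rows, which are easy to filter and group downstream. The function flatten_xml and the sample record are hypothetical; they are not part of Pig or piggybank, and wiring this into a Pig UDF is not shown.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def flatten_xml(xml_text: str) -> List[Tuple[str, str]]:
    """Flatten one XML document into (path, text) rows in document order.

    Hypothetical helper, not a Pig or piggybank API. An explicit stack is
    used instead of recursion so deeply nested documents do not run into
    Python's recursion limit. Attributes and tail text (text that follows
    a child's closing tag) are ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    rows: List[Tuple[str, str]] = []
    stack = [(root, root.tag)]
    while stack:
        elem, path = stack.pop()
        if elem.text and elem.text.strip():
            rows.append((path, elem.text.strip()))
        # Push children in reverse so they are popped in document order.
        for child in reversed(list(elem)):
            stack.append((child, f"{path}/{child.tag}"))
    return rows


# Illustrative record value, standing in for one XML file loaded from a sequence file.
record_value = (
    "<entity><id>42</id>"
    "<address><city>Paris</city><geo><lat>48.8</lat></geo></address>"
    "</entity>"
)
for path, text in flatten_xml(record_value):
    print(path, text)
```

Each row keeps the full element path, so nesting depth never changes the shape of the output: one entity becomes a flat list of (path, text) pairs.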

Regards,
Shahab
On Wed, Oct 23, 2013 at 3:30 PM, Sameer Tilak <[EMAIL PROTECTED]> wrote:

> Hi There,
>
> I have a lot of small (~0.5 MB to 3 MB) XML files that I would like to
> process using Apache Pig. Since dealing with a lot of small files is
> problematic, I was thinking of creating SequenceFiles such that each
> sequence file is between 60 and 64 MB and no XML file is split across two
> sequence files. Is there any utility that does the storing and loading of
> these files from Pig? I could, for example, create a Pig job that reads
> these XML files and generates a few large sequence files such that no XML
> file is split across two sequence files. I will then write another Pig job
> that loads these sequence files and analyzes them. Each of these XML files
> contains a lot of information for a given entity, and the nesting can be
> quite deep. Any help with this would be great.
>
>
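The packing step described above is a bin-packing problem: assign each XML file to a sequence file so that no sequence file exceeds the size target and no XML file is split. A minimal Python sketch of the planning half, assuming the file sizes are already known (for example, from a directory listing), is shown here. It uses first-fit decreasing, a standard bin-packing heuristic that the thread itself does not mention; plan_batches and TARGET_BYTES are hypothetical names. Actually writing each batch out (for instance with Hadoop's SequenceFile.Writer, using the file name as key and the XML text as value) is not shown.

```python
from typing import List, Tuple

# Upper bound per packed sequence file, taken from the 60 to 64 MB range in the thread.
TARGET_BYTES = 64 * 1024 * 1024


def plan_batches(files: List[Tuple[str, int]],
                 target: int = TARGET_BYTES) -> List[List[str]]:
    """Assign (name, size_in_bytes) pairs to batches without splitting any file.

    First-fit decreasing: visit files largest first and put each one into the
    first batch that still has room; open a new batch when none does.
    Hypothetical helper: it only plans the grouping and writes nothing.
    """
    batches: List[List[str]] = []
    totals: List[int] = []
    for name, size in sorted(files, key=lambda f: f[1], reverse=True):
        if size > target:
            # Never split a file: an oversized one simply gets its own batch.
            batches.append([name])
            totals.append(size)
            continue
        for i, total in enumerate(totals):
            if total + size <= target:
                batches[i].append(name)
                totals[i] += size
                break
        else:
            # No existing batch has room, so start a new one.
            batches.append([name])
            totals.append(size)
    return batches


MB = 1024 * 1024
print(plan_batches([("a.xml", 3 * MB), ("b.xml", 2 * MB), ("c.xml", 1 * MB)],
                   target=4 * MB))
# → [['a.xml', 'c.xml'], ['b.xml']]
```

Largest-first ordering usually fills batches more tightly than taking files in arrival order. With every file at most 3 MB, as in the thread, a new batch is only opened when each existing batch already holds more than 61 MB, so every batch except the last lands inside the 60 to 64 MB range.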