Re: Writing small files to one big file in hdfs
Hi Mohit,
      AFAIK the XMLLoader in Pig is not suited to sequence files. Please
post the same question to the Pig user group to see if there is a workaround.
      A SequenceFile is the preferred option when you want to store small
files in HDFS and process them with MapReduce, because it stores data in
key/value format. Since SequenceFileInputFormat is already available, you
don't need a custom input format to process such a file with MapReduce.
It is a cleaner and better approach than just appending the contents of
the small XML files to one big file.
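
To illustrate the key/value idea, here is a minimal Java sketch that packs
small files into one container as length-prefixed records (key = file name,
value = file contents). It uses only the JDK and is a simplified stand-in,
not the actual SequenceFile on-disk format; the class name SmallFilePacker
and its methods are hypothetical. With Hadoop on the classpath,
SequenceFile.Writer's append(key, value) would take the place of the
hand-written record writing in pack().

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: a simplified key/value container,
// NOT the real Hadoop SequenceFile format.
public class SmallFilePacker {

    // Write each small file as one record: key = file name, value = file bytes.
    public static void pack(List<Path> smallFiles, Path container) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(container))) {
            for (Path file : smallFiles) {
                byte[] key = file.getFileName().toString().getBytes(StandardCharsets.UTF_8);
                byte[] value = Files.readAllBytes(file);
                out.writeInt(key.length);   // length prefix for the key
                out.write(key);
                out.writeInt(value.length); // length prefix for the value
                out.write(value);
            }
        }
    }

    // Read the records back in write order, keyed by original file name.
    public static Map<String, byte[]> unpack(Path container) throws IOException {
        Map<String, byte[]> records = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(Files.newInputStream(container))) {
            while (true) {
                int keyLength;
                try {
                    keyLength = in.readInt();
                } catch (EOFException endOfContainer) {
                    break; // no more records
                }
                byte[] key = new byte[keyLength];
                in.readFully(key);
                byte[] value = new byte[in.readInt()];
                in.readFully(value);
                records.put(new String(key, StandardCharsets.UTF_8), value);
            }
        }
        return records;
    }
}
```

Keying each record by file name preserves the original file boundaries,
which plain concatenation of the XML contents would lose.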

On Tue, Feb 21, 2012 at 11:00 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> On Tue, Feb 21, 2012 at 9:25 AM, Bejoy Ks <[EMAIL PROTECTED]> wrote:
>
> > Mohit
> >       Rather than just appending the content into a normal text file or
> > so, you can create a sequence file with the individual smaller file
> > content as values.
> >
> Thanks. I was planning to use Pig's
> org.apache.pig.piggybank.storage.XMLLoader for processing. Would it work
> with a sequence file?
>
> The text file that I was referring to would be in HDFS itself. Is that
> still different from using a sequence file?
>
> > Regards
> > Bejoy.K.S
> >
> > On Tue, Feb 21, 2012 at 10:45 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> >
> > > We have small XML files. Currently I am planning to append these small
> > > files to one file in HDFS so that I can take advantage of splits, larger
> > > blocks and sequential IO. What I am unsure about is whether it's OK to
> > > append one file at a time to this HDFS file.
> > >
> > > Could someone suggest if this is OK? I would like to know how others do
> > > it.
> > >
> >
>