Even for a single machine (and there may be reasons to use a single machine
if the original data is not splittable), our experience suggests it should
take about an hour to process 32 GB on a single machine, which makes me wonder
whether writing the SequenceFile is your limiting step. Try a very simple job
that writes 32 GB of random data (say, a Long count and a random double) to a
SequenceFile, run it on one box (you might also try the same steps without the
write), and see whether you are really being limited by the write itself.
You might also consider compression while writing the SequenceFile.
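For reference, a rough (untested) sketch of that test job; the class name and
output path are just placeholders, and BLOCK compression is one option to try:

import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;

// Writes (long counter, random double) records to a SequenceFile so you can
// time raw SequenceFile write throughput on one box. Bump numRecords until
// the output is roughly 32 GB on your setup.
public class SequenceFileWriteBenchmark {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path(args.length > 0 ? args[0] : "/tmp/seqfile-benchmark");
    long numRecords = args.length > 1 ? Long.parseLong(args[1]) : 100000000L;

    LongWritable key = new LongWritable();
    DoubleWritable value = new DoubleWritable();
    Random random = new Random();

    long start = System.currentTimeMillis();
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, out, LongWritable.class, DoubleWritable.class,
        SequenceFile.CompressionType.BLOCK);   // try NONE as well for comparison
    try {
      for (long i = 0; i < numRecords; i++) {
        key.set(i);                      // the Long count
        value.set(random.nextDouble());  // the random double
        writer.append(key, value);
      }
    } finally {
      writer.close();
    }
    System.out.println("Wrote " + numRecords + " records in "
        + (System.currentTimeMillis() - start) + " ms");
  }
}

Timing the same loop with CompressionType.NONE, or with the append call
commented out, should separate the cost of the write from everything else.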
2011/5/12 丛林 <[EMAIL PROTECTED]>
> Dear Harsh,
> Will you please explain how to create a sequence file the way you suggested
> (as a MapReduce job)? Suppose that all 32 GB of little files are stored on one PC.
> Thanks for your suggestion.
> BTW: I notice that you have replied to most of the SequenceFile topics on
> this mailing list :-)
> Best Wishes,
> 2011/5/12 Harsh J <[EMAIL PROTECTED]>:
> > Are you doing this as a MapReduce job or is it a simple linear
> > program? MapReduce could be much faster (Combined-files input format,
> > with a few Reducers for merging if you need that as well).
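One rough way to parallelize the packing along those lines (this untested
sketch uses NLineInputFormat over a text file listing the small-file paths,
rather than CombineFileInputFormat; all class names here are just placeholders):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Packs many small HDFS files into SequenceFiles in parallel.
// Input: a text file with one small-file path per line.
// Output: SequenceFile records keyed by original path, valued by file bytes.
public class SmallFilesToSequenceFile {

  public static class PackMapper
      extends Mapper<LongWritable, Text, Text, BytesWritable> {
    @Override
    protected void map(LongWritable offset, Text pathLine, Context context)
        throws IOException, InterruptedException {
      Path file = new Path(pathLine.toString().trim());
      FileSystem fs = file.getFileSystem(context.getConfiguration());
      byte[] contents = new byte[(int) fs.getFileStatus(file).getLen()];
      FSDataInputStream in = fs.open(file);
      try {
        in.readFully(0, contents);   // files are small, so read each one whole
      } finally {
        IOUtils.closeStream(in);
      }
      context.write(new Text(file.toString()), new BytesWritable(contents));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "pack small files into SequenceFile");
    job.setJarByClass(SmallFilesToSequenceFile.class);
    job.setMapperClass(PackMapper.class);
    job.setNumReduceTasks(0);   // map-only; add reducers for fewer, merged outputs
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(BytesWritable.class);
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.setNumLinesPerSplit(job, 10000);      // ~10k small files per map task
    NLineInputFormat.addInputPath(job, new Path(args[0])); // the path-list file
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    SequenceFileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Each mapper reads a batch of the small files and writes (path, bytes) records
through SequenceFileOutputFormat; a reducer stage, as Harsh notes, would merge
the per-mapper outputs into fewer SequenceFiles if you need that.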
> > On Thu, May 12, 2011 at 5:18 AM, 丛林 <[EMAIL PROTECTED]> wrote:
> >> Hi, all.
> >> I want to write lots of little files (32GB) to HDFS as
> >> org.apache.hadoop.io.SequenceFile.
> >> But now it is too slow: it has taken us about 8 hours to create this
> >> SequenceFile (6.7 GB).
> >> So I wonder how to create this SequenceFile faster?
> >> Thanks for your suggestion.
> >> -Best Wishes,
> >> -Lin
> > --
> > Harsh J
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033