MapReduce >> mail # user >> How to create a SequenceFile more faster?


Re: How to create a SequenceFile more faster?
Even on a single machine (and there may be reasons to use a single machine,
for example if the original data is not splittable), our experience suggests
it should take about an hour to process 32 GB. That leads me to wonder
whether writing the sequence file is really your limiting step. Consider a
very simple job that writes 32 GB of random data, say a long count and a
random double per record, to a sequence file, and run it on one box. You
might also try the same steps without the write, to see whether you are
really being limited by the write.
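
A minimal sketch of that experiment, under some loud assumptions: it uses
plain java.io on the local disk as a stand-in for Hadoop's SequenceFile
writer, and the class name WriteBenchmark and its methods are hypothetical.
It only separates "generate the records" from "generate and write them", so
the two timings can be compared.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

// Stand-in benchmark (assumption): plain java.io, not Hadoop's SequenceFile.
class WriteBenchmark {

    // Each record is a long counter plus a random double: 8 + 8 = 16 bytes.
    static final int RECORD_BYTES = Long.BYTES + Double.BYTES;

    // An output stream that throws every byte away (the "no write" case).
    static OutputStream discard() {
        return new OutputStream() {
            @Override public void write(int b) { }
            @Override public void write(byte[] b, int off, int len) { }
        };
    }

    // Generates `records` records into `out`; returns elapsed nanoseconds.
    static long run(long records, OutputStream out) throws IOException {
        Random random = new Random(42); // fixed seed so runs are comparable
        long start = System.nanoTime();
        try (DataOutputStream data =
                 new DataOutputStream(new BufferedOutputStream(out, 1 << 16))) {
            for (long count = 0; count < records; count++) {
                data.writeLong(count);
                data.writeDouble(random.nextDouble());
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        // Small default; 2_000_000_000 records would be roughly 32 GB.
        long records = args.length > 0 ? Long.parseLong(args[0]) : 1_000_000L;
        Path file = Files.createTempFile("write-benchmark", ".bin");
        try {
            long noWrite = run(records, discard());
            long withWrite = run(records, Files.newOutputStream(file));
            System.out.printf("records=%d bytes=%d%n", records, Files.size(file));
            System.out.printf("generate only:    %d ms%n", noWrite / 1_000_000);
            System.out.printf("generate + write: %d ms%n", withWrite / 1_000_000);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

If the two timings are close, the write is probably not the bottleneck and
the time is going somewhere else, such as reading the many small input files.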
You might also consider compression while writing the sequence file.
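
To get a rough feel for what compression could save, here is a small sketch
that packs some made-up small "files" into one stream, once raw and once
through gzip, and compares the sizes. Everything in it is an assumption for
illustration: the class name CompressionCheck, the record layout (name,
length, bytes) and the sample text are hypothetical, and java.util.zip stands
in for the record and block compression that SequenceFile itself offers. The
sample text is very repetitive, so the ratio it prints is optimistic.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Stand-in for comparing raw and compressed output size (assumption:
// gzip via java.util.zip, not Hadoop's own compression codecs).
class CompressionCheck {

    // Builds the body of one hypothetical small text file.
    static byte[] sampleBody(int index) {
        StringBuilder text = new StringBuilder();
        for (int line = 0; line < 20; line++) {
            text.append("sample line of text for file ").append(index).append('\n');
        }
        return text.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Packs `files` small files as (name, length, bytes) records and
    // returns how many bytes ended up in the output.
    static int packedSize(int files, boolean compress) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        OutputStream target = compress ? new GZIPOutputStream(sink) : sink;
        try (DataOutputStream out = new DataOutputStream(target)) {
            for (int i = 0; i < files; i++) {
                byte[] body = sampleBody(i);
                out.writeUTF("file-" + i + ".txt"); // key: the file name
                out.writeInt(body.length);          // value length
                out.write(body);                    // value: the file bytes
            }
        } // closing the stream also finishes the gzip trailer
        return sink.size();
    }

    public static void main(String[] args) throws IOException {
        int raw = packedSize(1000, false);
        int gzip = packedSize(1000, true);
        System.out.printf("raw bytes:  %d%n", raw);
        System.out.printf("gzip bytes: %d%n", gzip);
    }
}
```

Compression trades CPU time for fewer bytes written, so it is worth timing
both variants on real data before deciding.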

2011/5/12 丛林 <[EMAIL PROTECTED]>

> Dear Harsh,
>
> Could you please explain how to create a sequence file using
> MapReduce?
>
> Suppose that all 32 GB of small files are stored on one PC.
>
> Thanks for your suggestion.
>
> BTW: I notice that you have replied to most of the sequence file topics
> on this mailing list :-)
>
> Best Wishes,
>
> -Lin
>
>
> 2011/5/12 Harsh J <[EMAIL PROTECTED]>:
> > Are you doing this as a MapReduce job, or is it a simple linear
> > program? MapReduce could be much faster (CombineFileInputFormat,
> > with a few reducers for merging if you need that as well).
> >
> > On Thu, May 12, 2011 at 5:18 AM, 丛林 <[EMAIL PROTECTED]> wrote:
> >> Hi, all.
> >>
> >> I want to write lots of small files (32 GB) to HDFS as an
> >> org.apache.hadoop.io.SequenceFile.
> >>
> >> But right now it is too slow: it takes about 8 hours to create this
> >> SequenceFile (6.7 GB).
> >>
> >> So I wonder how to create this SequenceFile faster?
> >>
> >> Thanks for your suggestion.
> >>
> >> -Best Wishes,
> >>
> >> -Lin
> >>
> >
> >
> >
> > --
> > Harsh J
> >
>

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com