Hadoop, mail # user - Proper blocksize and io.sort.mb setting when using compressed LZO files


Re: Proper blocksize and io.sort.mb setting when using compressed LZO files
Ted Yu 2010-09-27, 16:40
The default size of the InMemory File System is 100MB, as set in
./src/core/org/apache/hadoop/fs/InMemoryFileSystem.java:
      int size = Integer.parseInt(conf.get("fs.inmemory.size.mb", "100"));

If you want to change its value, you can put it in core-site.xml.
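
The lookup-with-default pattern in that line can be tried outside Hadoop. What follows is a minimal sketch, assuming java.util.Properties as a stand-in for Hadoop's Configuration object; the class name InMemorySizeLookup and the method inMemorySizeMb are hypothetical. It shows that the property falls back to "100" when nothing sets it, which would explain why the setting has a value even if no config file lists it.

```java
import java.util.Properties;

public class InMemorySizeLookup {
    // Stand-in for conf.get(name, default): Properties.getProperty takes the
    // same (key, defaultValue) pair. This is NOT Hadoop's Configuration class.
    static int inMemorySizeMb(Properties conf) {
        return Integer.parseInt(conf.getProperty("fs.inmemory.size.mb", "100"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Property not set anywhere: the default string "100" is parsed.
        System.out.println(inMemorySizeMb(conf));

        // As if the value had been overridden in core-site.xml.
        conf.setProperty("fs.inmemory.size.mb", "200");
        System.out.println(inMemorySizeMb(conf));
    }
}
```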

On Mon, Sep 27, 2010 at 9:29 AM, ed <[EMAIL PROTECTED]> wrote:

> Ah okay,
>
> I did not see the fs.inmemory.size.mb setting in any of the default config
> files located here:
>
> http://hadoop.apache.org/common/docs/r0.20.2/mapred-default.html
> http://hadoop.apache.org/common/docs/r0.20.2/core-default.html
> http://hadoop.apache.org/common/docs/r0.20.2/hdfs-default.html
>
> Should this be something that needs to be added?
>
> Thank you for the help!
>
> ~Ed
>
> On Mon, Sep 27, 2010 at 11:18 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > The setting should be fs.inmemory.size.mb
> >
> > On Mon, Sep 27, 2010 at 7:15 AM, pig <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Sriguru,
> > >
> > > Thank you for the tips.  Just to clarify a few things.
> > >
> > > Our machines have 32 GB of RAM.
> > >
> > > I'm planning on setting each machine to run 12 mappers and 2 reducers
> > > with the heap size set to 2048MB, so total memory usage for the heap
> > > would be 28GB.
> > >
> > > If this is the case should io.sort.mb be set to 70% of 2048MB (so ~1400
> > > MB)?
> > >
> > > Also, I did not see a fs.inmemorysize.mb setting in any of the hadoop
> > > configuration files.  Is that the correct setting I should be looking
> > > for?  Should this also be set to 70% of the heap size, or does it need
> > > to share with the io.sort.mb setting?
> > >
> > > I assume if I'm bumping up io.sort.mb that much I also need to increase
> > > io.sort.factor from the default of 10.  Is there a recommended relation
> > > between these two?
> > >
> > > Thank you for your help!
> > >
> > > ~Ed
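
The arithmetic in the message above can be checked with a short sketch. The class and method names are hypothetical, and it assumes every task slot uses its full 2048MB heap at the same time; JVM overhead beyond the heap and the memory of the TT and DN daemons are not counted.

```java
public class HeapBudget {
    // Total heap in MB if every task slot uses its full heap at once.
    static long totalHeapMb(int mappers, int reducers, long heapPerTaskMb) {
        return (long) (mappers + reducers) * heapPerTaskMb;
    }

    // A percentage of the per-task heap, in whole MB (rounded down).
    static long percentOfHeapMb(long heapPerTaskMb, int percent) {
        return heapPerTaskMb * percent / 100;
    }

    public static void main(String[] args) {
        long ramMb = 32L * 1024;                // 32 GB machine
        long total = totalHeapMb(12, 2, 2048);  // 14 slots at 2048MB each

        System.out.println("total heap MB:   " + total);
        System.out.println("left for others: " + (ramMb - total));
        System.out.println("70% of one heap: " + percentOfHeapMb(2048, 70));
    }
}
```

The total comes to 28672MB (28GB), which leaves 4096MB of the 32GB for everything else on the machine, and 70% of a 2048MB heap is 1433MB, in line with the "~1400 MB" figure.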
> > >
> > > On Sun, Sep 26, 2010 at 3:05 AM, Srigurunath Chakravarthi <
> > > [EMAIL PROTECTED]> wrote:
> > >
> > > > Ed,
> > > > >  Tuning io.sort.mb will certainly be worthwhile if you have enough
> > > > > RAM to allow for a higher Java heap per map task without risking
> > > > > swapping.
> > > >
> > > >  Similarly, you can decrease spills on the reduce side using
> > > > fs.inmemorysize.mb.
> > > >
> > > > > You can use the following rules of thumb for tuning those two:
> > > > >
> > > > > - Set these to ~70% of Java heap size. Pick heap sizes to utilize
> > > > > ~80% of RAM across all processes (maps, reducers, TT, DN, other)
> > > > > - Set it small enough to avoid swap activity, but
> > > > > - Set it large enough to minimize disk spills.
> > > > > - Ensure that io.sort.factor is set large enough to allow full use of
> > > > > buffer space.
> > > > > - Balance space for output records (default 95%) & record meta-data
> > > > > (5%). Use io.sort.spill.percent and io.sort.record.percent
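
The last rule in the list above can be made concrete with a small calculation. This is only a sketch under stated assumptions: it assumes io.sort.record.percent is the share of the io.sort.mb buffer reserved for record meta-data, as the 95%/5% split in the list suggests, and it takes the percentage as a whole number to avoid floating-point rounding. The class SortBufferSplit and its methods are hypothetical names, not Hadoop code.

```java
public class SortBufferSplit {
    // Bytes of the sort buffer reserved for record meta-data.
    // recordPercent is a whole-number percentage (5 means 5%).
    static long metadataBytes(int ioSortMb, int recordPercent) {
        long totalBytes = (long) ioSortMb << 20;  // MB to bytes
        return totalBytes * recordPercent / 100;
    }

    // Bytes left for the serialized output records themselves.
    static long dataBytes(int ioSortMb, int recordPercent) {
        long totalBytes = (long) ioSortMb << 20;
        return totalBytes - metadataBytes(ioSortMb, recordPercent);
    }

    public static void main(String[] args) {
        int ioSortMb = 100;     // example buffer size, an assumption
        int recordPercent = 5;  // the 5% meta-data share from the list

        System.out.println("meta-data bytes: " + metadataBytes(ioSortMb, recordPercent));
        System.out.println("record bytes:    " + dataBytes(ioSortMb, recordPercent));
    }
}
```

With a 100MB buffer, this split leaves 5MB for meta-data and 95MB for records. Jobs that emit many small records would be the ones to fill the meta-data share first.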
> > > >
> > > > >  Your mileage may vary. We've seen job exec time improvements worth
> > > > > 1-3% via spill-avoidance for miscellaneous applications.
> > > >
> > > > >  Your other option of running a map per 32MB or 64MB of input should
> > > > > give you better performance if your map task execution time is
> > > > > significant (i.e., much larger than a few seconds) compared to the
> > > > > overhead of launching map tasks and reading input.
> > > >
> > > > Regards,
> > > > Sriguru
> > > >
> > > > >-----Original Message-----
> > > > >From: pig [mailto:[EMAIL PROTECTED]]
> > > > >Sent: Saturday, September 25, 2010 2:36 AM
> > > > >To: [EMAIL PROTECTED]
> > > > > >Subject: Proper blocksize and io.sort.mb setting when using
> > > > > >compressed LZO files
> > > > >
> > > > >Hello,
> > > > >
> > > > > >We just recently switched to using lzo compressed file input for our
> > > > > >hadoop cluster using Kevin Weil's lzo library.  The files are pretty
> > > > > >uniform in size at around 200MB compressed.  Our block size is 256MB.
> > > > > >Decompressed, the average LZO input file is around 1.0GB.  I noticed
> > > > > >lots of our jobs are