Hadoop user mailing list: Proper blocksize and io.sort.mb setting when using compressed LZO files


pig 2010-09-24, 21:06
Srigurunath Chakravarthi 2010-09-26, 07:05
pig 2010-09-27, 14:15
Ted Yu 2010-09-27, 15:18
ed 2010-09-27, 16:29
Re: Proper blocksize and io.sort.mb setting when using compressed LZO files
The default is 100MB for the in-memory file system:
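      // reads fs.inmemory.size.mb from the configuration, defaulting to 100 (MB) when unset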
      int size = Integer.parseInt(conf.get("fs.inmemory.size.mb", "100"));
./src/core/org/apache/hadoop/fs/InMemoryFileSystem.java

If you want to change its value, you can put it in core-site.xml
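For example, a minimal core-site.xml entry along these lines should do it (the
200 value below is only an illustration, not a recommendation):

  <property>
    <name>fs.inmemory.size.mb</name>
    <value>200</value>
    <description>Size, in MB, of the in-memory file system buffer</description>
  </property>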

On Mon, Sep 27, 2010 at 9:29 AM, ed <[EMAIL PROTECTED]> wrote:

> Ah okay,
>
> I did not see the fs.inmemory.size.mb setting in any of the default config
> files located here:
>
> http://hadoop.apache.org/common/docs/r0.20.2/mapred-default.html
> http://hadoop.apache.org/common/docs/r0.20.2/core-default.html
> http://hadoop.apache.org/common/docs/r0.20.2/hdfs-default.html
>
> Should this be something that needs to be added?
>
> Thank you for the help!
>
> ~Ed
>
> On Mon, Sep 27, 2010 at 11:18 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > The setting should be fs.inmemory.size.mb
> >
> > On Mon, Sep 27, 2010 at 7:15 AM, pig <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Sriguru,
> > >
> > > Thank you for the tips.  Just to clarify a few things.
> > >
> > > Our machines have 32 GB of RAM.
> > >
> > > I'm planning on setting each machine to run 12 mappers and 2 reducers
> > > with the heap size set to 2048MB, putting total heap memory usage at 28GB.
> > >
> > > If this is the case should io.sort.mb be set to 70% of 2048MB (so ~1400
> > > MB)?
> > >
> > > Also, I did not see a fs.inmemorysize.mb setting in any of the hadoop
> > > configuration files.  Is that the correct setting I should be looking
> > > for?  Should this also be set to 70% of the heap size or does it need
> > > to share with the io.sort.mb setting?
> > >
> > > I assume if I'm bumping up io.sort.mb that much I also need to increase
> > > io.sort.factor from the default of 10.  Is there a recommended relation
> > > between these two?
> > >
> > > Thank you for your help!
> > >
> > > ~Ed
> > >
> > > On Sun, Sep 26, 2010 at 3:05 AM, Srigurunath Chakravarthi <
> > > [EMAIL PROTECTED]> wrote:
> > >
> > > > Ed,
> > > >  Tuning io.sort.mb will certainly be worthwhile if you have enough
> > > > RAM to allow for a higher Java heap per map task without risking
> > > > swapping.
> > > >
> > > >  Similarly, you can decrease spills on the reduce side using
> > > > fs.inmemorysize.mb.
> > > >
> > > > You can use the following thumb rules for tuning those two:
> > > >
> > > > - Set these to ~70% of Java heap size. Pick heap sizes to utilize
> > > > ~80% RAM across all processes (maps, reducers, TT, DN, other)
> > > > - Set it small enough to avoid swap activity, but
> > > > - Set it large enough to minimize disk spills.
> > > > - Ensure that io.sort.factor is set large enough to allow full use of
> > > > buffer space.
> > > > - Balance space for output records (default 95%) & record meta-data
> > > > (5%). Use io.sort.spill.percent and io.sort.record.percent
> > > >
> > > >  Your mileage may vary. We've seen job exec time improvements worth
> > > > 1-3% via spill-avoidance for miscellaneous applications.
> > > >
> > > >  Your other option of running a map per 32MB or 64MB of input should
> > > > give you better performance if your map task execution time is
> > > > significant (i.e., much larger than a few seconds) compared to the
> > > > overhead of launching map tasks and reading input.
> > > >
> > > > Regards,
> > > > Sriguru
> > > >
> > > > >-----Original Message-----
> > > > >From: pig [mailto:[EMAIL PROTECTED]]
> > > > >Sent: Saturday, September 25, 2010 2:36 AM
> > > > >To: [EMAIL PROTECTED]
> > > > >Subject: Proper blocksize and io.sort.mb setting when using
> > > > >compressed LZO files
> > > > >
> > > > >Hello,
> > > > >
> > > > >We just recently switched to using lzo compressed file input for our
> > > > >hadoop cluster using Kevin Weil's lzo library.  The files are pretty
> > > > >uniform in size at around 200MB compressed.  Our block size is 256MB.
> > > > >Decompressed, the average LZO input file is around 1.0GB.  I noticed
> > > > >lots of our jobs are
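
For a concrete reading of the thumb rules quoted above: the 32GB machine with
12 mappers and 2 reducers discussed in this thread works out to 14 task slots
x 2048MB = 28GB of task heap.  A sketch of the corresponding mapred-site.xml
settings (standard 0.20-era property names; every value here is an assumption
for illustration, not a tested recommendation) might look like:

  <property><name>mapred.tasktracker.map.tasks.maximum</name><value>12</value></property>
  <property><name>mapred.tasktracker.reduce.tasks.maximum</name><value>2</value></property>
  <property><name>mapred.child.java.opts</name><value>-Xmx2048m</value></property>
  <!-- ~70% of the 2048MB task heap -->
  <property><name>io.sort.mb</name><value>1400</value></property>
  <!-- raised from the default of 10 so merges can take advantage of the larger buffer -->
  <property><name>io.sort.factor</name><value>100</value></property>

fs.inmemory.size.mb (set in core-site.xml, as noted earlier in the thread)
would follow the same ~70%-of-heap rule of thumb on the reduce side.
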
Srigurunath Chakravarthi 2010-09-27, 16:56
ed 2010-09-28, 18:51