Re: Question about dfs.block.size setting
Hi Harsh,

Thanks for your comments. I found "Increasing the number of tasks increases
the framework overhead, but increases load balancing and lowers the cost of
failures." quite useful. But I'm still confused about why increasing the block
size for large jobs improves performance. According to my test results, when
sorting 2TB of data on a 30-node cluster, increasing the block size from 64MB
to 256MB degraded performance instead of improving it. Could anybody tell me
why this happened?
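To make the task-count arithmetic concrete (a back-of-envelope sketch based on the numbers in this thread, not on the actual job): with one map task per block, the number of splits scales inversely with the block size.

```java
// Rough map-task counts for a 2 TB input at the two block sizes
// discussed in this thread (illustration only; real split counts
// also depend on the number and size of the input files).
public class MapTaskCount {
    public static void main(String[] args) {
        long inputBytes = 2L * 1024 * 1024 * 1024 * 1024; // 2 TB
        long smallBlock = 64L * 1024 * 1024;              // 64 MB
        long largeBlock = 256L * 1024 * 1024;             // 256 MB
        System.out.println("64 MB blocks  -> " + inputBytes / smallBlock + " map tasks");
        System.out.println("256 MB blocks -> " + inputBytes / largeBlock + " map tasks");
    }
}
```

So the 256MB setting cuts the map-task count from 32768 to 8192, trading per-task scheduling overhead against coarser load balancing and a higher cost per failed task.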

Any comments on this? Thanks.

Best Regards,

2010/7/22 Harsh J <[EMAIL PROTECTED]>

> This article has a few good lines that should clear that doubt of yours:
> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
> On Thu, Jul 22, 2010 at 9:17 AM, Yu Li <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > There are lots of materials on the internet suggesting to set
> > dfs.block.size larger, e.g. from 64M to 256M, when the job is large,
> > claiming that performance will improve. But I'm not clear on why
> > increasing the block size helps. I know that a larger block size reduces
> > the number of map tasks for the same input, but why would fewer map
> > tasks improve overall performance?
> >
> > Any comments would be highly valued, and thanks in advance.
> >
> > Best Regards,
> > Carp
> >
> --
> Harsh J
> www.harshj.com
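For readers landing on this thread: the property under discussion goes in hdfs-site.xml with the value in bytes (dfs.block.size is the name used in Hadoop releases of this era; later releases renamed it dfs.blocksize). A minimal sketch of the 256MB setting tested above:

```xml
<!-- hdfs-site.xml: default block size for newly created files, in bytes -->
<property>
  <name>dfs.block.size</name>
  <value>268435456</value> <!-- 256 * 1024 * 1024 -->
</property>
```

Note this only affects files written after the change; existing files keep the block size they were written with.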