Thanks for your comments, I found "Increasing the number of tasks increases
the framework overhead, but increases load balancing and lowers the cost of
failures." quite useful. But I'm still confused about why increasing the block
size for large jobs would improve performance. According to my test results,
while sorting 2TB of data on a 30-node cluster, increasing the block size from
64M to 256M degraded performance instead of improving it. Could anybody tell
me why this happened?
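For reference, here is a rough calculation of the map task counts implied by the two block sizes in my test, assuming one map task per HDFS block (the default FileInputFormat split behavior, with no custom split size configured):

```python
# Rough map-task counts for a 2 TB input at two HDFS block sizes,
# assuming one map task per block (default FileInputFormat splitting).
MB_PER_TB = 1024 * 1024
input_mb = 2 * MB_PER_TB  # 2 TB input, in megabytes

tasks_64m = input_mb // 64    # 64 MB blocks
tasks_256m = input_mb // 256  # 256 MB blocks

print(tasks_64m, tasks_256m)  # 32768 vs 8192 map tasks
```

So going from 64M to 256M cuts the number of map tasks by 4x, which should reduce per-task startup and scheduling overhead, yet in my test it was slower.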
Any comments on this? Thanks.
2010/7/22 Harsh J <[EMAIL PROTECTED]>
> This article has a few good lines that should clear that doubt of yours:
> On Thu, Jul 22, 2010 at 9:17 AM, Yu Li <[EMAIL PROTECTED]> wrote:
> > Hi all,
> > There are lots of materials on the internet suggesting to set dfs.block.size
> > larger, e.g. from 64M to 256M, when the job is large. And they said the
> > performance would improve. But I'm not clear on why increasing the block size
> > improves performance. I know that increasing the block size will reduce the
> > number of map tasks for the same input, but why would fewer map tasks improve
> > overall performance?
> > Any comments would be highly valued, and thanks in advance.
> > Best Regards,
> > Carp
> Harsh J