Re: Is it possible to skip the HTTP map output fetch by feeding the map output file and index file directly to the reducer?
After searching the Hadoop mailing list again, I found this link, which
tries to optimize Hadoop on Lustre by using hard links instead of HTTP (
Any other suggestions?
On Thu, Feb 28, 2013 at 4:57 PM, Ling Kun <[EMAIL PROTECTED]> wrote:
> Dear Arun C Murthy, Pavan Kulkarni and all.
> I am currently working on optimizing a Hadoop cluster based on the Lustre FS.
> According to the TeraSort benchmark, the remote map output copy seems to
> take a large part of the total runtime.
> After searching, I found your discussion from half a year ago (
> http://search-hadoop.com/m/jj3y46KUwC1 ).
> I am writing to ask whether we can make each reducer directly read
> its part of each map output file based on the index file, and merge them
> together, instead of making each map task generate output for each reduce task.
> In this way, it seems that not too many inodes would be needed.
> @Pavan Kulkarni: no email was sent by you after Sep. 2012. Could you please
> share some experience on how to optimize for a file system
> like Lustre?
> Does anyone have similar work experience?
> Any comments and replies are welcome and appreciated!
> Ling Kun.
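
The idea in the quoted message (a reducer reading only its own segment of each map output file, located through the index file, on a shared file system) can be sketched roughly as below. This is only an illustration under stated assumptions, not Hadoop's actual code path: the index layout used here (one 24-byte record per reduce partition, holding three big-endian 64-bit integers for start offset, raw length and stored length) is assumed for the sketch, and the function names and the toy file names are hypothetical. Real Hadoop index files and map output segments may carry extra data such as checksums, and segments may be compressed, none of which is handled here.

```python
import os
import struct
import tempfile

# Assumed (hypothetical) index layout: one 24-byte record per reduce partition,
# holding three big-endian 64-bit integers: start offset, raw length, stored length.
INDEX_RECORD = struct.Struct(">qqq")


def read_index_record(index_path, partition):
    """Return (start_offset, raw_length, part_length) for one reduce partition."""
    with open(index_path, "rb") as f:
        f.seek(partition * INDEX_RECORD.size)
        data = f.read(INDEX_RECORD.size)
    if len(data) != INDEX_RECORD.size:
        raise ValueError(f"no index record for partition {partition}")
    return INDEX_RECORD.unpack(data)


def read_partition(map_output_path, index_path, partition):
    """Read only this reducer's byte range from a map output file on the shared FS."""
    start, _raw_len, part_len = read_index_record(index_path, partition)
    with open(map_output_path, "rb") as f:
        f.seek(start)
        return f.read(part_len)


def demo():
    """Write a toy map output with three partitions, then read partition 1 back."""
    segments = [b"aaa", b"bbbbb", b"cc"]
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "file.out")        # toy map output file
        idx_path = os.path.join(tmp, "file.out.index")  # toy index file
        offset = 0
        with open(out_path, "wb") as out, open(idx_path, "wb") as idx:
            for seg in segments:
                out.write(seg)
                idx.write(INDEX_RECORD.pack(offset, len(seg), len(seg)))
                offset += len(seg)
        return read_partition(out_path, idx_path, 1)
```

With this approach each map task still writes a single output file plus one index file, so the inode count grows with the number of map tasks rather than with maps times reducers; the reducer would call something like `read_partition` once per map output and then merge the segments it read.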