Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
MapReduce >> mail # dev >> Why In-memory Mapoutput is necessary in ReduceCopier

Ling Kun 2013-03-11, 10:27
Ravi Prakash 2013-03-11, 16:58
Copy link to this message
Re: Why In-memory Mapoutput is necessary in ReduceCopier
Dear Ravi and all,

   Thanks very much for your kindly reply.
   I am currently concern about whether it is possible to eliminate the
HTTP GET method using some other ways. And currently have not got a better

   Thanks agin.

Ling Kun
On Tue, Mar 12, 2013 at 12:58 AM, Ravi Prakash <[EMAIL PROTECTED]> wrote:

> Hi Ling,
> Yes! It is because of performance concerns. We want to keep and merge map
> outputs in memory as much as we can. The amount of memory reserved for this
> purpose is configurable. Obviously storing fetched map outputs on disk,
> then reading them back from disk to merge them and then write out back to
> disk, is a lot more expensive than if it were done in memory.
> Please let us know if you find there was an opportunity to keep the map
> output in memory but we did not, and instead shuffled to disk.
> Thanks
> Ravi
> ________________________________
>  From: Ling Kun <[EMAIL PROTECTED]>
> Sent: Monday, March 11, 2013 5:27 AM
> Subject: Why In-memory Mapoutput is necessary in ReduceCopier
> Dear all,
>      I am focusing on the Mapoutput copier implementation. This part of
> code will try to get mapoutputs, and merge them into a file that can feed
> to reduce functions. I have the following questions.
> 1. All the local file mapoutput data will be merged together by the
> LocalFSMerge, and the in-memory mapout will be merged by
> InMemFSMergeThread. For the InMemFSMergeThread, there is also a writer
> object   which write the result to outputPath ( ReduceTask.java Line 2843).
> It seems after merging, in-memory mapoutput and local file mapoutput data
> will all be stored in local file system. Why not just using the local file
> for all mapoutput data.
> 2. After using http to get  some fragment of a map output file, some of the
> mapoutput data will be selected and keep in memory, while others are
> directly write to local disk of reducers. Which mapoutput wil be kept in
> memory is determined in MapOutputCopier.getMapOutput(), this method will
> call ramManager.canFitInMemory().  why not store all the data to disk?
> 3. According to the comment, Hadoop will put a file in memory if it meets:
> a, the size of the (decompressed) file should be less than 25% of the total
> inmem fs; b, there is space available in the inmem fs. Why ? Is it because
> of the performance?
> Thanks
> yours,
> Ling Kun
> --
> http://www.lingcc.com

Harsh J 2013-03-18, 05:40