MapReduce >> mail # user >> intermediate results files

John Lilley 2013-07-01, 22:46
Mohammad Tariq 2013-07-01, 23:02
RE: intermediate results files
I've seen some benchmarks where replication=1 runs at about 50MB/sec and replication=3 runs at about 33MB/sec, but I can't seem to find that now.
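
To put those two figures in perspective, here is a rough back-of-the-envelope sketch. It assumes the rates I quoted from memory (unverified) and a hypothetical 10 GB of intermediate output per writer; the function name and the size are made up for illustration only.

```python
# Write rates quoted from memory in this message; treat them as unverified.
MB_PER_SEC = {1: 50.0, 3: 33.0}  # replication factor -> write rate in MB/sec


def write_seconds(size_mb: float, replication: int) -> float:
    """Seconds needed to write size_mb at the quoted rate for this replication factor."""
    return size_mb / MB_PER_SEC[replication]


size_mb = 10 * 1024  # hypothetical: 10 GB of intermediate output per writer
t1 = write_seconds(size_mb, 1)
t3 = write_seconds(size_mb, 3)

print(f"replication=1: {t1:.0f} s")
print(f"replication=3: {t3:.0f} s")
print(f"slowdown: {t3 / t1:.2f}x")
```

If those rates hold, writing with replication=3 would take roughly 1.5 times as long as with replication=1, whatever the output size.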

From: Mohammad Tariq [mailto:[EMAIL PROTECTED]]
Sent: Monday, July 01, 2013 5:03 PM
Subject: Re: intermediate results files

Hello John,

IMHO, it doesn't matter. Your job will write the result just once. Replica creation is handled at the HDFS layer, so it has nothing to do with your job. Your job will still be writing at the same speed.

Warm Regards,

On Tue, Jul 2, 2013 at 4:16 AM, John Lilley <[EMAIL PROTECTED]> wrote:
If my reducers are going to create results that are temporary in nature (consumed by the next processing stage) is it recommended to use a replication factor <3 to improve performance?
Mohammad Tariq 2013-07-02, 00:34
John Lilley 2013-07-02, 15:39
Devaraj k 2013-07-02, 06:00