Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
MapReduce >> mail # user >> Single Reducer Times Out During Shuffle


Copy link to this message
-
Re: Single Reducer Times Out During Shuffle
On 06/17/2011 08:50 AM, Shaun Martinec wrote:
> I have a MapReduce job that is failing occasionally at the reduce
> phase. I believe it's during the shuffle phase, but am not positive. I
> have copied the end of the job log below. As you can see, I have a
> very large number of maps (2910) and only 1 reducer that is used to
Why are you using a single reducer for this?
> cat the results together (/bin/cat) into a single output file. I have
> tried increasing "mapred.reduce.parallel.copies" from 5 to 10, but it
> still fails in the same manner. I thought of building a reducer (to
> replace /bin/cat) that outputs a status every 10,000 rows, but wasn't
> sure if that would have any effect on the shuffle phase.
You can test this option, don't asume anything yet
>   The short
> task timeout (25 seconds) is necessary for the mappers and cannot be
> changed, unfortunately. I've run out of knowledge at this point and
> would appreciate any additional insight or solutions. Thanks.
>
> -Shaun
>
> We are running Hadoop 0.20 using Amazon Elastic MapReduce.
>
> ...
> MapAttempt TASK_TYPE="MAP" TASKID="task_201106170304_0001_m_002847"
> TASK_ATTEMPT_ID="attempt_201106170304_0001_m_002847_1"
> START_TIME="1308295418235"
> TRACKER_NAME="tracker_ip-10-245-135-148\.ec2\.internal:localhost\.localdomain/127\.0\.0\.1:50801"
> HTTP_PORT="9103" .
> MapAttempt TASK_TYPE="MAP" TASKID="task_201106170304_0001_m_002847"
> TASK_ATTEMPT_ID="attempt_201106170304_0001_m_002847_1"
> TASK_STATUS="SUCCESS" FINISH_TIME="1308295436505"
> HOSTNAME="/default-rack/ip-10-245-135-148\.ec2\.internal"
> STATE_STRING="OK: garage parking aid"
> COUNTERS="{(SkippingTaskCounters)(SkippingTaskCounters)[(MapProcessedRecords)(MapProcessedRecords)(10)]}{(FileSystemCounters)(FileSystemCounters)[(S3N_BYTES_READ)(S3N_BYTES_READ)(6821)][(FILE_BYTES_WRITTEN)(FILE_BYTES_WRITTEN)(49594)]}{(Custom)(Custom)[(Terms
> crawled \\(Google\\))(Terms crawled
> \\(Google\\))(10)]}{(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce
> Framework)[(COMBINE_OUTPUT_RECORDS)(Combine output
> records)(0)][(MAP_INPUT_RECORDS)(Map input
> records)(10)][(SPILLED_RECORDS)(Spilled
> Records)(370)][(MAP_OUTPUT_BYTES)(Map output
> bytes)(127806)][(MAP_INPUT_BYTES)(Map input
> bytes)(1339)][(COMBINE_INPUT_RECORDS)(Combine input
> records)(0)][(MAP_OUTPUT_RECORDS)(Map output records)(370)]}" .
> Task TASKID="task_201106170304_0001_m_002847" TASK_TYPE="MAP"
> TASK_STATUS="SUCCESS" FINISH_TIME="1308295439373"
> COUNTERS="{(SkippingTaskCounters)(SkippingTaskCounters)[(MapProcessedRecords)(MapProcessedRecords)(10)]}{(FileSystemCounters)(FileSystemCounters)[(S3N_BYTES_READ)(S3N_BYTES_READ)(6821)][(FILE_BYTES_WRITTEN)(FILE_BYTES_WRITTEN)(49594)]}{(Custom)(Custom)[(Terms
> crawled \\(Google\\))(Terms crawled
> \\(Google\\))(10)]}{(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce
> Framework)[(COMBINE_OUTPUT_RECORDS)(Combine output
> records)(0)][(MAP_INPUT_RECORDS)(Map input
> records)(10)][(SPILLED_RECORDS)(Spilled
> Records)(370)][(MAP_OUTPUT_BYTES)(Map output
> bytes)(127806)][(MAP_INPUT_BYTES)(Map input
> bytes)(1339)][(COMBINE_INPUT_RECORDS)(Combine input
> records)(0)][(MAP_OUTPUT_RECORDS)(Map output records)(370)]}" .
> ReduceAttempt TASK_TYPE="REDUCE"
> TASKID="task_201106170304_0001_r_000000"
> TASK_ATTEMPT_ID="attempt_201106170304_0001_r_000000_0"
> START_TIME="1308294529811"
> TRACKER_NAME="tracker_ip-10-112-62-154\.ec2\.internal:localhost\.localdomain/127\.0\.0\.1:34540"
> HTTP_PORT="9103" .
> ReduceAttempt TASK_TYPE="REDUCE"
> TASKID="task_201106170304_0001_r_000000"
> TASK_ATTEMPT_ID="attempt_201106170304_0001_r_000000_0"
> TASK_STATUS="FAILED" FINISH_TIME="1308295482933"
> HOSTNAME="ip-10-112-62-154\.ec2\.internal" ERROR="Task
> attempt_201106170304_0001_r_000000_0 failed to report status for 27
> seconds\. Killing!" .
> ReduceAttempt TASK_TYPE="REDUCE"
> TASKID="task_201106170304_0001_r_000000"
> TASK_ATTEMPT_ID="attempt_201106170304_0001_r_000000_1"
> START_TIME="1308295486373"
> TRACKER_NAME="tracker_ip-10-85-67-66\.ec2\.internal:localhost\.localdomain/127\.0\.0\.1:36502"
It this the JobTracker's log?
Regards

Marcos Lu�s Ort�z Valmaseda
  Software Engineer (UCI)
  http://marcosluis2186.posterous.com
  http://twitter.com/marcosluis2186