Re: Single Reducer Times Out During Shuffle
Marcos Ortiz 2011-06-17, 14:29
On 06/17/2011 08:50 AM, Shaun Martinec wrote:
> I have a MapReduce job that is failing occasionally at the reduce
> phase. I believe it's during the shuffle phase, but am not positive. I
> have copied the end of the job log below. As you can see, I have a
> very large number of maps (2910) and only 1 reducer that is used to

Why are you using a single reducer for this?

> cat the results together (/bin/cat) into a single output file. I have
> tried increasing "mapred.reduce.parallel.copies" from 5 to 10, but it
> still fails in the same manner. I thought of building a reducer (to
> replace /bin/cat) that outputs a status every 10,000 rows, but wasn't
> sure if that would have any effect on the shuffle phase.

You can test this option; don't assume anything yet.

> The short
> task timeout (25 seconds) is necessary for the mappers and cannot be
> changed, unfortunately. I've run out of knowledge at this point and
> would appreciate any additional insight or solutions. Thanks.
>
> -Shaun
>
> We are running Hadoop 0.20 using Amazon Elastic MapReduce.
>
> ...
> MapAttempt TASK_TYPE="MAP" TASKID="task_201106170304_0001_m_002847"
> TASK_ATTEMPT_ID="attempt_201106170304_0001_m_002847_1"
> START_TIME="1308295418235"
> TRACKER_NAME="tracker_ip-10-245-135-148\.ec2\.internal:localhost\.localdomain/127\.0\.0\.1:50801"
> HTTP_PORT="9103" .
> MapAttempt TASK_TYPE="MAP" TASKID="task_201106170304_0001_m_002847"
> TASK_ATTEMPT_ID="attempt_201106170304_0001_m_002847_1"
> TASK_STATUS="SUCCESS" FINISH_TIME="1308295436505"
> HOSTNAME="/default-rack/ip-10-245-135-148\.ec2\.internal"
> STATE_STRING="OK: garage parking aid"
> COUNTERS="{(SkippingTaskCounters)(SkippingTaskCounters)[(MapProcessedRecords)(MapProcessedRecords)(10)]}{(FileSystemCounters)(FileSystemCounters)[(S3N_BYTES_READ)(S3N_BYTES_READ)(6821)][(FILE_BYTES_WRITTEN)(FILE_BYTES_WRITTEN)(49594)]}{(Custom)(Custom)[(Terms
> crawled \\(Google\\))(Terms crawled
> \\(Google\\))(10)]}{(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce
> Framework)[(COMBINE_OUTPUT_RECORDS)(Combine output
> records)(0)][(MAP_INPUT_RECORDS)(Map input
> records)(10)][(SPILLED_RECORDS)(Spilled
> Records)(370)][(MAP_OUTPUT_BYTES)(Map output
> bytes)(127806)][(MAP_INPUT_BYTES)(Map input
> bytes)(1339)][(COMBINE_INPUT_RECORDS)(Combine input
> records)(0)][(MAP_OUTPUT_RECORDS)(Map output records)(370)]}" .
> Task TASKID="task_201106170304_0001_m_002847" TASK_TYPE="MAP"
> TASK_STATUS="SUCCESS" FINISH_TIME="1308295439373"
> COUNTERS="{(SkippingTaskCounters)(SkippingTaskCounters)[(MapProcessedRecords)(MapProcessedRecords)(10)]}{(FileSystemCounters)(FileSystemCounters)[(S3N_BYTES_READ)(S3N_BYTES_READ)(6821)][(FILE_BYTES_WRITTEN)(FILE_BYTES_WRITTEN)(49594)]}{(Custom)(Custom)[(Terms
> crawled \\(Google\\))(Terms crawled
> \\(Google\\))(10)]}{(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce
> Framework)[(COMBINE_OUTPUT_RECORDS)(Combine output
> records)(0)][(MAP_INPUT_RECORDS)(Map input
> records)(10)][(SPILLED_RECORDS)(Spilled
> Records)(370)][(MAP_OUTPUT_BYTES)(Map output
> bytes)(127806)][(MAP_INPUT_BYTES)(Map input
> bytes)(1339)][(COMBINE_INPUT_RECORDS)(Combine input
> records)(0)][(MAP_OUTPUT_RECORDS)(Map output records)(370)]}" .
> ReduceAttempt TASK_TYPE="REDUCE"
> TASKID="task_201106170304_0001_r_000000"
> TASK_ATTEMPT_ID="attempt_201106170304_0001_r_000000_0"
> START_TIME="1308294529811"
> TRACKER_NAME="tracker_ip-10-112-62-154\.ec2\.internal:localhost\.localdomain/127\.0\.0\.1:34540"
> HTTP_PORT="9103" .
> ReduceAttempt TASK_TYPE="REDUCE"
> TASKID="task_201106170304_0001_r_000000"
> TASK_ATTEMPT_ID="attempt_201106170304_0001_r_000000_0"
> TASK_STATUS="FAILED" FINISH_TIME="1308295482933"
> HOSTNAME="ip-10-112-62-154\.ec2\.internal" ERROR="Task
> attempt_201106170304_0001_r_000000_0 failed to report status for 27
> seconds\. Killing!" .
> ReduceAttempt TASK_TYPE="REDUCE"
> TASKID="task_201106170304_0001_r_000000"
> TASK_ATTEMPT_ID="attempt_201106170304_0001_r_000000_1"
> START_TIME="1308295486373"
> TRACKER_NAME="tracker_ip-10-85-67-66\.ec2\.internal:localhost\.localdomain/127\.0\.0\.1:36502"

Is this the JobTracker's log?

Regards

Marcos Luís Ortíz Valmaseda
Software Engineer (UCI)
http://marcosluis2186.posterous.com
http://twitter.com/marcosluis2186
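Shaun's idea of replacing /bin/cat with a reducer that reports status every 10,000 rows could look roughly like the Python streaming reducer below. This is a minimal sketch, not a tested fix: the file name cat_status.py, the constant STATUS_EVERY and the function cat_with_status are hypothetical. It relies on the Hadoop Streaming convention that a line of the form reporter:status:<message> written to stderr updates the task's status. One caveat, as far as we understand the framework: a streaming reducer only starts receiving input after the shuffle and merge are done, so it cannot keep the attempt alive while map outputs are still being copied. It would only help if the 27-second silence in the log happens during the reduce phase itself.

```python
#!/usr/bin/env python
"""Hypothetical cat_status.py: a drop-in replacement for /bin/cat as a
Hadoop Streaming reducer.

Copies every input line to stdout unchanged and, every STATUS_EVERY lines,
writes a status line to stderr in the Hadoop Streaming
"reporter:status:<message>" format.
"""
import sys

STATUS_EVERY = 10000  # rows between status updates (figure from the post)


def cat_with_status(lines, out, err, every=STATUS_EVERY):
    """Pass `lines` through to `out` unchanged, reporting on `err`.

    Returns the total number of lines copied.
    """
    count = 0
    for line in lines:
        out.write(line)  # identical output to /bin/cat
        count += 1
        if count % every == 0:
            # Hadoop Streaming treats stderr lines starting with
            # "reporter:status:" as task status updates.
            err.write("reporter:status:copied %d rows\n" % count)
            err.flush()
    return count


if __name__ == "__main__":
    total = cat_with_status(sys.stdin, sys.stdout, sys.stderr)
    sys.stderr.write("reporter:status:done, %d rows\n" % total)
```

The script would be given to the streaming job as the -reducer in place of /bin/cat (and shipped to the task nodes, for example with -file); the exact step arguments on Elastic MapReduce may differ.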