MapReduce, mail # user - Map succeeds but reduce hangs


Re: Map succeeds but reduce hangs
Vinod Kumar Vavilapalli 2014-01-02, 18:28
Check the TaskTracker configuration in mapred-site.xml: mapred.task.tracker.report.address. You may be setting it to 127.0.0.1:0 or localhost:0. Change it to 0.0.0.0:0 and restart the daemons.

Thanks,
+Vinod
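
Before restarting the daemons, you can check this advice by scanning mapred-site.xml for the property and testing whether its host part is a loopback name. Below is a minimal Python sketch. It assumes the standard Hadoop `<configuration>/<property>/<name>/<value>` layout. The function name `loopback_properties` is hypothetical. The check only sees values set explicitly in the file you pass it, not any default Hadoop applies when the property is absent.

```python
import xml.etree.ElementTree as ET

# Host parts that only accept connections from the local machine.
LOOPBACK_HOSTS = {"127.0.0.1", "localhost"}


def loopback_properties(xml_text, names=("mapred.task.tracker.report.address",)):
    """Return {property: value} for the named properties bound to a loopback host.

    Hypothetical helper; assumes the usual Hadoop layout:
    <configuration><property><name>..</name><value>..</value></property></configuration>
    """
    root = ET.fromstring(xml_text)
    flagged = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name in names:
            host = value.rsplit(":", 1)[0]  # "host:port" -> "host"
            if host in LOOPBACK_HOSTS:
                flagged[name] = value
    return flagged
```

If the file sets the value to `localhost:0` or `127.0.0.1:0`, the property is reported. After the change to `0.0.0.0:0`, the function returns an empty dict.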

On Jan 1, 2014, at 2:14 PM, navaz <[EMAIL PROTECTED]> wrote:

> I don't know why it is running on localhost. I have commented it out.
> =================================================================
> slave1:
> Hostname: pc321
>
> hduser@pc321:/etc$ vi hosts
> #127.0.0.1      localhost loghost localhost.myslice.ch-geni-net.emulab.net
> 155.98.39.28    pc228
> 155.98.39.121   pc321
> 155.98.39.27    dn3.myslice.ch-geni-net.emulab.net
> =======================================================================
> slave2:
> Hostname: dn3.myslice.ch-geni-net.emulab.net
> hduser@dn3:/etc$ vi hosts
> #127.0.0.1      localhost loghost localhost.myslice.ch-geni-net.emulab.net
> 155.98.39.28    pc228
> 155.98.39.121   pc321
> 155.98.39.27    dn3.myslice.ch-geni-net.emulab.net
> =======================================================================
> Master:
> Hostname: pc228
> hduser@pc228:/etc$ vi hosts
> #127.0.0.1      localhost loghost localhost.myslice.ch-geni-net.emulab.net
> 155.98.39.28   pc228
> 155.98.39.121  pc321
> #155.98.39.19   slave2
> 155.98.39.27   dn3.myslice.ch-geni-net.emulab.net
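All three hosts files above should describe the same cluster. One way to check this is to compare, hostname by hostname, the address each node would use. The Python sketch below is illustrative only, and `parse_hosts` and `inconsistencies` are hypothetical helpers. It treats everything after `#` as a comment, so the commented-out 127.0.0.1 and slave2 lines are ignored. It flags names that differ across nodes, are missing on some node, or map to a loopback address.

```python
import ipaddress


def parse_hosts(text):
    """Map each hostname/alias in /etc/hosts-style text to its IP address."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments, incl. commented-out entries
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name] = ip
    return mapping


def inconsistencies(hosts_by_node):
    """Return {hostname: {node: ip or None}} for names that need attention.

    A name is reported if nodes disagree on its address, if some node
    has no entry for it (None), or if any node maps it to loopback.
    """
    parsed = {node: parse_hosts(text) for node, text in hosts_by_node.items()}
    all_names = set().union(*(m.keys() for m in parsed.values()))
    problems = {}
    for name in sorted(all_names):
        seen = {node: m.get(name) for node, m in parsed.items()}
        ips = set(seen.values())
        loopback = any(ip and ipaddress.ip_address(ip).is_loopback for ip in ips)
        if len(ips) > 1 or loopback:
            problems[name] = seen
    return problems
```

Run on the three files exactly as posted, this sketch returns an empty dict, because they agree on every uncommented entry.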
> ===========================================================================
> I have replaced localhost with pc228 in core-site.xml and mapred-site.xml and set the replication factor to 3.
>
> I am able to ssh to pc321 and dn3.myslice.ch-geni-net.emulab.net from the master.
>
>
> hduser@pc228:/usr/local/hadoop/conf$ more slaves
> pc228
> pc321
> dn3.myslice.ch-geni-net.emulab.net
>
> hduser@pc228:/usr/local/hadoop/conf$ more masters
> pc228
> hduser@pc228:/usr/local/hadoop/conf$
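The hosts files only help if the names listed in conf/slaves and conf/masters actually resolve through them on every node. Below is a small Python sketch you could run on each node. It resolves every entry with the system resolver (`socket.gethostbyname`) and reports names that fail to resolve or that come back as a loopback address. The helper name `check_hosts_list` is hypothetical. The resolver is a parameter only so the function can be tested without DNS.

```python
import ipaddress
import socket


def check_hosts_list(text, resolve=socket.gethostbyname):
    """Resolve each host named in a conf/slaves-style file (one host per line).

    Hypothetical helper. Returns {host: problem} for entries that fail to
    resolve or that resolve to a loopback address.
    """
    problems = {}
    for line in text.splitlines():
        host = line.strip()
        if not host or host.startswith("#"):
            continue
        try:
            ip = resolve(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems[host] = f"unresolvable: {exc}"
            continue
        if ipaddress.ip_address(ip).is_loopback:
            problems[host] = f"resolves to loopback {ip}"
    return problems
```

For example, `check_hosts_list(open("/usr/local/hadoop/conf/slaves").read())` on pc321 would show whether that node can reach pc228 and dn3.myslice.ch-geni-net.emulab.net by name.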
>
>
>
> Am I doing anything wrong here?
>
>
> On Wed, Jan 1, 2014 at 4:54 PM, Hardik Pandya <[EMAIL PROTECTED]> wrote:
> Do you have your hostnames properly configured in /etc/hosts? Have you tried a 192.168.?.? address instead of localhost (127.0.0.1)?
>
>
>
> On Wed, Jan 1, 2014 at 11:33 AM, navaz <[EMAIL PROTECTED]> wrote:
> Thanks. But I wonder why the map phase succeeds 100%. How does it resolve the hostname?
>
> Now reduce reaches 100%, but it is bailing out on slave2 and slave3 (even though the map tasks succeeded on these nodes).
>
> Does it look up the hostname only for reduce?
>
>
> 14/01/01 09:09:38 INFO mapred.JobClient: Running job: job_201401010908_0001
> 14/01/01 09:09:39 INFO mapred.JobClient:  map 0% reduce 0%
> 14/01/01 09:10:00 INFO mapred.JobClient:  map 33% reduce 0%
> 14/01/01 09:10:01 INFO mapred.JobClient:  map 66% reduce 0%
> 14/01/01 09:10:05 INFO mapred.JobClient:  map 100% reduce 0%
> 14/01/01 09:10:14 INFO mapred.JobClient:  map 100% reduce 22%
> 14/01/01 09:17:32 INFO mapred.JobClient:  map 100% reduce 0%
> 14/01/01 09:17:35 INFO mapred.JobClient: Task Id : attempt_201401010908_0001_r_000000_0, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 14/01/01 09:17:46 INFO mapred.JobClient:  map 100% reduce 11%
> 14/01/01 09:17:50 INFO mapred.JobClient:  map 100% reduce 22%
> 14/01/01 09:25:06 INFO mapred.JobClient:  map 100% reduce 0%
> 14/01/01 09:25:10 INFO mapred.JobClient: Task Id : attempt_201401010908_0001_r_000000_1, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 14/01/01 09:25:34 INFO mapred.JobClient:  map 100% reduce 100%
> 14/01/01 09:25:42 INFO mapred.JobClient: Job complete: job_201401010908_0001
> 14/01/01 09:25:42 INFO mapred.JobClient: Counters: 29
>
>
>
> Job Tracker logs:
> 2014-01-01 09:09:59,874 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201401010908_0001_m_000002_0' has completed task_201401010908_0001_m_000002 successfully.
> 2014-01-01 09:10:04,231 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201401010908_0001_m_000001_0' has completed task_201401010908_0001_m_000001 successfully.
> 2014-01-01 09:17:30,527 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201401010908_0001_r_000000_0: Shuffle Error: Exc