MapReduce >> mail # user >> 600s timeout during copy phase of job


600s timeout during copy phase of job
I have a job that's hitting the 600s task timeout during the copy phase of the
reduce step. I see a lot of copy tasks, each moving at about 2.5 MB/sec, and
the copy is taking longer than 10 minutes.

The process starts copying when the reduce step is 80% complete. This is a
very I/O-bound task, as I'm just joining 1.5 TB of data via 2 map/reduce steps
on 6 nodes (each node has one 4 TB disk and 24 GB of RAM).

What should I be thinking in terms of fixing this?

- Increase the timeout? (It seems odd that it would time out on the internal
  copy.)
- Reduce the number of tasks? (I've got 8 reducers, 1 per core, with
  io.sort.factor = 25 and io.sort.mb = 256.)
    - Can I do that per job?
- Increase the number of copy threads?
- Not start the reducers until the mappers are 100% complete?
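
For concreteness, here is roughly how I understand these options would map onto per-job settings. This is only a sketch: it assumes the Hadoop 1.x-era property names (newer releases renamed them), and it assumes the job's driver goes through ToolRunner so that -D generic options are picked up. The helper name and the example values are made up for illustration, not recommendations.

```python
# Hypothetical helper: builds "-D key=value" generic options for a
# `hadoop jar` command line. Property names are assumed to be the
# Hadoop 1.x-era names; check the mapred-default.xml of your release.

def per_job_options(timeout_ms=None, reducers=None,
                    copy_threads=None, slowstart=None):
    """Return a list of command-line arguments for the options given."""
    props = []
    if timeout_ms is not None:
        # Task timeout in milliseconds (600000 ms is the 600s in question).
        props.append(("mapred.task.timeout", str(int(timeout_ms))))
    if reducers is not None:
        # Number of reduce tasks for this job.
        props.append(("mapred.reduce.tasks", str(int(reducers))))
    if copy_threads is not None:
        # Parallel fetch threads per reducer during the copy phase.
        props.append(("mapred.reduce.parallel.copies", str(int(copy_threads))))
    if slowstart is not None:
        # Fraction of maps that must finish before reducers are scheduled.
        if not 0.0 <= slowstart <= 1.0:
            raise ValueError("slowstart must be between 0.0 and 1.0")
        props.append(("mapred.reduce.slowstart.completed.maps", str(slowstart)))

    args = []
    for key, value in props:
        args += ["-D", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Illustrative values only.
    print(" ".join(per_job_options(timeout_ms=1200000, reducers=4,
                                   copy_threads=10, slowstart=1.0)))
```

The resulting arguments would go between the main class and the job's own arguments on the `hadoop jar` command line, if the driver parses generic options.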