MapReduce >> mail # user >> 600s timeout during copy phase of job


600s timeout during copy phase of job
I have a job that's hitting 600s task timeouts during the copy phase of the
reduce step. I see a lot of copy tasks, all moving at about 2.5 MB/sec, and
the copy is taking longer than 10 minutes.

 

The process starts copying when the reduce step is 80% complete. This is a
very I/O-bound task, as I'm just joining 1.5 TB of data via 2 map/reduce steps
on 6 nodes (each node has one 4 TB disk and 24 GB of RAM).
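To put the figures above in perspective, here is a rough back-of-envelope sketch in Python. It uses only the numbers quoted in the post (2.5 MB/sec, 600 s, 1.5 TB, 8 reducers). It assumes, purely for illustration, that the full 1.5 TB is shuffled, that it is split evenly across 8 reducers in total, and that every copy stream sustains the observed rate. None of that is confirmed by the post, and the stream count is a hypothetical parameter.

```python
# Back-of-envelope numbers for the copy phase, using figures from the post.
# Assumptions (not confirmed by the post): all 1.5 TB is shuffled, it is
# split evenly over 8 reducers in total, and each copy stream sustains the
# observed 2.5 MB/sec. Decimal units are used (1 TB = 1,000,000 MB).

COPY_RATE_MB_S = 2.5          # observed per-copy rate
TIMEOUT_S = 600               # task timeout from the post
TOTAL_DATA_MB = 1.5 * 1_000_000
REDUCERS = 8                  # assumed to be the total reducer count


def mb_moved_in_window(rate_mb_s: float, window_s: float) -> float:
    """Data one stream moves in the given time window."""
    return rate_mb_s * window_s


def hours_to_copy(data_mb: float, rate_mb_s: float, streams: int) -> float:
    """Hours to copy data_mb with `streams` parallel copies at rate_mb_s each."""
    return data_mb / (rate_mb_s * streams) / 3600


window_mb = mb_moved_in_window(COPY_RATE_MB_S, TIMEOUT_S)
share_mb = TOTAL_DATA_MB / REDUCERS

print(f"One stream moves {window_mb:.0f} MB in {TIMEOUT_S} s")
print(f"Per-reducer share: {share_mb:.0f} MB")
for streams in (1, 5, 10):    # hypothetical parallel-copy counts
    hours = hours_to_copy(share_mb, COPY_RATE_MB_S, streams)
    print(f"{streams:>2} stream(s): {hours:.1f} h")
```

Under these assumptions a single stream moves about 1.5 GB in the 600 s window, so any single transfer larger than that would run past the 10-minute mark at the observed rate.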

 

What should I be thinking in terms of fixing this?

- Increase the timeout? (It seems odd that it would time out on the internal
  copy.)

- Reduce the number of tasks? (I've got 8 reducers, 1 per core, with
  io.sort.factor = 25 and io.sort.mb = 256.)

    - Can I do that per job?

- Increase the number of copy threads?

- Don't start the reducers until the mappers are 100% complete?
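For reference, the options in the list above can be expressed as per-job overrides. The sketch below, in Python, only assembles a `hadoop jar` command line with `-D` options and does not run anything. It assumes Hadoop 1.x-era property names (later releases renamed them) and a job driver that parses generic options, for example via `ToolRunner`. The jar name, driver class, paths, and all values are hypothetical placeholders, not recommendations.

```python
import shlex

# Per-job overrides matching the options listed in the post.
# Assumption: Hadoop 1.x-era property names; newer releases use different
# names. All values are illustrative only.
overrides = {
    "mapred.task.timeout": 1800000,                 # milliseconds (post hits 600000)
    "mapred.reduce.tasks": 4,                       # fewer reducers for this job
    "mapred.reduce.parallel.copies": 10,            # copy threads per reducer
    "mapred.reduce.slowstart.completed.maps": 1.0,  # wait for all maps to finish
}


def build_command(jar, main_class, overrides, args):
    """Build the argument list for `hadoop jar` with -D per-job overrides."""
    cmd = ["hadoop", "jar", jar, main_class]
    for key, value in overrides.items():
        cmd += ["-D", f"{key}={value}"]
    cmd += list(args)
    return cmd


# Hypothetical jar, driver class, and paths.
command = build_command("join-job.jar", "JoinDriver", overrides,
                        ["/input", "/output"])
print(" ".join(shlex.quote(part) for part in command))
```

The `-D` options are placed before the job's own arguments because the generic options parser expects them there. Whether the overrides take effect depends on the driver actually using that parser.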

 

 

 

 

 
